Some cluster components cannot start properly with redundant ring configured in RHEL 6
Environment
- Red Hat Enterprise Linux (RHEL) 6 with the High Availability or Resilient Storage Add On
- Redundant Ring Protocol (RRP) is configured in /etc/cluster/cluster.conf via the <altname> tag.
Issue
- After a node starts its cluster services and joins the cluster successfully, why can DLM not be configured on it when a redundant ring configuration is in use?
Oct 18 08:49:58 node1 dlm_controld[43574]: dlm_join_lockspace no fence domain
Oct 18 08:49:58 node1 dlm_controld[43574]: process_uevent online@ error -1 errno 17
Oct 18 08:49:59 node1 kernel: dlm: clvmd: group join failed -1 -1
Oct 18 08:49:59 node1 clvmd: Unable to create DLM lockspace for CLVM: Operation not permitted
Oct 18 08:49:59 node1 kernel: dlm: Using SCTP for communications
Oct 18 08:49:59 node1 dlm_controld[43574]: dlm_join_lockspace no fence domain
Oct 18 08:49:59 node1 dlm_controld[43574]: process_uevent online@ error -1 errno 17
Oct 18 08:49:59 node1 kernel: dlm: rgmanager: group join failed -1 -1
- fenced, rgmanager, GFS2, and/or clvmd do not start or function properly when using dlm with SCTP.
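To check whether a node shows the same symptoms, the log messages above can be searched for mechanically. The following is a minimal sketch in Python 3; the patterns are taken from the excerpt above, while the function name find_symptoms and the symptom labels are hypothetical names chosen for illustration. In practice the lines would come from the system log rather than the embedded sample.

```python
import re

# Patterns derived from the log excerpt in this article.
# The dictionary keys are illustrative labels, not official names.
SYMPTOMS = {
    "no_fence_domain": re.compile(
        r"dlm_controld\[\d+\]: dlm_join_lockspace no fence domain"),
    "sctp_in_use": re.compile(
        r"kernel: dlm: Using SCTP for communications"),
    "group_join_failed": re.compile(
        r"kernel: dlm: (\S+): group join failed"),
}

# Sample lines copied from the article's log excerpt.
SAMPLE_LOG = """\
Oct 18 08:49:58 node1 dlm_controld[43574]: dlm_join_lockspace no fence domain
Oct 18 08:49:59 node1 kernel: dlm: clvmd: group join failed -1 -1
Oct 18 08:49:59 node1 kernel: dlm: Using SCTP for communications
Oct 18 08:49:59 node1 dlm_controld[43574]: dlm_join_lockspace no fence domain
Oct 18 08:49:59 node1 kernel: dlm: rgmanager: group join failed -1 -1
"""


def find_symptoms(lines):
    """Return a dict mapping each symptom label to its matching log lines."""
    hits = {name: [] for name in SYMPTOMS}
    for line in lines:
        for name, pattern in SYMPTOMS.items():
            if pattern.search(line):
                hits[name].append(line.rstrip("\n"))
    return hits


if __name__ == "__main__":
    for name, matched in find_symptoms(SAMPLE_LOG.splitlines()).items():
        print("%s: %d matching line(s)" % (name, len(matched)))
```

A match on the SCTP line together with the "no fence domain" or "group join failed" lines suggests the situation described in this article.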
Resolution
There are two options:
- If DLM is required for a component besides rgmanager, such as clvmd, cmirror, or GFS2, then disable redundant ring functionality by removing the applicable RRP configuration entries, including the <altmulticast> tags, <altname> tags, and rrp_mode="active", from /etc/cluster/cluster.conf. The following components require DLM and thus are not supported with Redundant Ring Protocol (RRP): clvmd, cmirror, GFS2.
- If the component having issues is rgmanager, then use cpglockd with an appropriate release of rgmanager.
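Before editing the file, it can help to list every RRP-related entry that is present. The sketch below, written for Python 3, parses a cluster.conf document and reports <altname> tags, <altmulticast> tags, and any rrp_mode attribute. The sample configuration is an assumption made for illustration: the node names, addresses, and the placement of elements are not taken from this article, and the function name find_rrp_entries is hypothetical. On a real node the text would be read from /etc/cluster/cluster.conf instead.

```python
import xml.etree.ElementTree as ET

# Illustrative configuration only: names, addresses, and element placement
# are assumptions, not copied from a real cluster.
SAMPLE_CONF = """\
<cluster name="example" config_version="1">
  <totem rrp_mode="active"/>
  <cman>
    <multicast addr="239.192.0.1"/>
    <altmulticast addr="239.192.0.2"/>
  </cman>
  <clusternodes>
    <clusternode name="node1" nodeid="1">
      <altname name="node1-alt"/>
    </clusternode>
    <clusternode name="node2" nodeid="2">
      <altname name="node2-alt"/>
    </clusternode>
  </clusternodes>
</cluster>
"""

RRP_TAGS = ("altname", "altmulticast")


def find_rrp_entries(conf_xml):
    """Return (tag, detail) tuples for every RRP-related entry found."""
    root = ET.fromstring(conf_xml)
    found = []
    for elem in root.iter():  # walks the tree in document order
        if elem.tag in RRP_TAGS:
            found.append((elem.tag, dict(elem.attrib)))
        if "rrp_mode" in elem.attrib:
            found.append((elem.tag, {"rrp_mode": elem.attrib["rrp_mode"]}))
    return found


if __name__ == "__main__":
    for tag, detail in find_rrp_entries(SAMPLE_CONF):
        print("<%s> %s" % (tag, detail))
```

An empty result would indicate that none of the RRP entries named above remain in the file. The script only reports entries; it does not modify the configuration.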
For more information on the support policy for RRP with dlm, clvm, rgmanager, gfs2, and controld, see the following article: Support Policies for RHEL Resilient Storage - dlm General Policies
Root Cause
Redundant ring functionality in corosync is considered a Technology Preview prior to RHEL 6 Update 4, and is not recommended or supported for production usage on these earlier releases.
Even though DLM offers the SCTP protocol as an option, it is not entirely functional and is not supported by Red Hat. DLM cannot function with the multi-homing provided by RRP when using its normal TCP protocol, so SCTP support was added to DLM as a way to handle this type of setup. However, that implementation is incomplete and is known to have issues, which means that DLM is effectively unusable with RRP.
The cpglockd daemon is able to provide an alternative lock manager for rgmanager in RHEL 6 Update 4 or later, but other daemons or components that require DLM will not function with RRP, and thus must have it disabled.
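The reasoning in this article can be summarized as a small decision helper. This is only a sketch of the rules stated above, written for Python 3, and not an official support tool; the function name recommend, its parameters, and the wording of the returned strings are hypothetical.

```python
# Components that, per this article, require DLM and have no alternative
# lock manager, so they cannot be used together with RRP.
DLM_ONLY_COMPONENTS = {"clvmd", "cmirror", "gfs2"}


def recommend(components, rhel6_update):
    """Suggest an action for a RHEL 6 cluster that has RRP configured.

    components   -- iterable of component names in use (case-insensitive)
    rhel6_update -- RHEL 6 minor release as an integer, e.g. 4 for Update 4
    """
    used = set(c.lower() for c in components)

    # RRP is a Technology Preview prior to RHEL 6 Update 4.
    if rhel6_update < 4:
        return "disable RRP: Technology Preview before RHEL 6 Update 4"

    # Any DLM-only component rules out RRP.
    blocked = sorted(used & DLM_ONLY_COMPONENTS)
    if blocked:
        return "disable RRP: DLM required by " + ", ".join(blocked)

    # rgmanager alone can use cpglockd instead of DLM.
    if "rgmanager" in used:
        return "keep RRP: use cpglockd with rgmanager"

    # Assumption: nothing listed depends on DLM.
    return "keep RRP: no DLM-dependent component listed"


if __name__ == "__main__":
    print(recommend(["rgmanager", "clvmd", "GFS2"], 5))
    print(recommend(["rgmanager"], 4))
```

The last branch is an inference rather than a statement from this article, and the exact rgmanager release needed for cpglockd is not encoded here.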