How can I activate cmirrord to carry out a pvmove in a RHEL 7 Resilient Storage cluster without having to restart my resources or applications?
Environment
- Red Hat Enterprise Linux (RHEL) 7 with the Resilient Storage Add-On
- lvm2-cluster
- resource-agents
- An ocf:heartbeat:clvm resource configured in the CIB
- Action is taken, or needs to be taken, to update the configuration attributes of that clvm resource while it is active and in use
Issue
- We need to pvmove some clustered volumes onto new storage, and therefore need to enable cmirrord. How can we do this without restarting clvm, GFS2 Filesystem resources, and other dependents?
- If I update my clvm resource to set with_cmirrord, I get errors citing "unimplemented feature" and my resources are stopped. clvm seems to fail if its configuration is updated on the fly, reporting error 3:
Jul 22 12:05:05 node1 pengine[19033]: notice: Reload clvmd:0#011(Started node2)
Jul 22 12:05:05 node1 pengine[19033]: notice: Reload clvmd:1#011(Started node1)
Jul 22 12:05:05 node1 crmd[19034]: notice: Initiating action 10: reload clvmd_reload_0 on node2
Jul 22 12:05:05 node1 crmd[19034]: notice: Initiating action 11: reload clvmd_reload_0 on node1 (local)
Jul 22 12:05:05 node1 pengine[19033]: notice: Calculated Transition 3278: /var/lib/pacemaker/pengine/pe-input-628.bz2
Jul 22 12:05:05 node1 crmd[19034]: warning: Action 10 (clvmd_reload_0) on node2 failed (target: 0 vs. rc: 3): Error
Jul 22 12:05:05 node1 crmd[19034]: notice: Transition aborted by clvmd_start_0 'modify' on node2: Event failed (magic=0:3;10:3278:0:f10202f4-55f3-48bb-8de6-dee0706281f6, cib=0.290.1, source=match_graph_event:381, 0)
Jul 22 12:05:05 node1 crmd[19034]: warning: Action 10 (clvmd_reload_0) on node2 failed (target: 0 vs. rc: 3): Error
Jul 22 12:05:05 node1 crmd[19034]: notice: Transition aborted by status-2-fail-count-clvmd, fail-count-clvmd=INFINITY: Transient attribute change (create cib=0.290.2, source=abort_unless_down:319, path=/cib/status/node_state[@id='2']/transient_attributes[@id='2']/instance_attributes[@id='status-2'], 0)
Jul 22 12:05:05 node1 crmd[19034]: notice: Operation clvmd_reload_0: unimplemented feature (node=node1, call=72, rc=3, cib-update=3351, confirmed=true)
Jul 22 12:05:05 node1 crmd[19034]: notice: node1-clvmd_reload_0:72 [ usage: /usr/lib/ocf/resource.d/heartbeat/clvm {start|stop|monitor|validate-all|meta-data}\n\nExpects to have a fully populated OCF RA-compliant environment set.\n ]
Jul 22 12:05:05 node1 crmd[19034]: warning: Action 11 (clvmd_reload_0) on node1 failed (target: 0 vs. rc: 3): Error
Jul 22 12:05:05 node1 crmd[19034]: warning: Action 11 (clvmd_reload_0) on node1 failed (target: 0 vs. rc: 3): Error
Jul 22 12:05:05 node1 crmd[19034]: notice: Transition 3278 (Complete=2, Pending=0, Fired=0, Skipped=0, Incomplete=2, Source=/var/lib/pacemaker/pengine/pe-input-628.bz2): Complete
Jul 22 12:05:05 node1 pengine[19033]: warning: Processing failed op start for clvmd:0 on node2: unimplemented feature (3)
Jul 22 12:05:05 node1 pengine[19033]: error: Preventing clvmd-clone from re-starting on node2: operation start failed 'unimplemented feature' (3)
Jul 22 12:05:05 node1 pengine[19033]: warning: Processing failed op start for clvmd:0 on node2: unimplemented feature (3)
Jul 22 12:05:05 node1 pengine[19033]: error: Preventing clvmd-clone from re-starting on node2: operation start failed 'unimplemented feature' (3)
Jul 22 12:05:05 node1 pengine[19033]: warning: Processing failed op start for clvmd:1 on node1: unimplemented feature (3)
Jul 22 12:05:05 node1 pengine[19033]: error: Preventing clvmd-clone from re-starting on node1: operation start failed 'unimplemented feature' (3)
Jul 22 12:05:05 node1 pengine[19033]: warning: Processing failed op start for clvmd:1 on node1: unimplemented feature (3)
Jul 22 12:05:05 node1 pengine[19033]: error: Preventing clvmd-clone from re-starting on node1: operation start failed 'unimplemented feature' (3)
Jul 22 12:05:05 node1 pengine[19033]: warning: Forcing clvmd-clone away from node1 after 1000000 failures (max=1000000)
Jul 22 12:05:05 node1 pengine[19033]: warning: Forcing clvmd-clone away from node1 after 1000000 failures (max=1000000)
Jul 22 12:05:05 node1 pengine[19033]: warning: Forcing clvmd-clone away from node2 after 1000000 failures (max=1000000)
Jul 22 12:05:05 node1 pengine[19033]: warning: Forcing clvmd-clone away from node2 after 1000000 failures (max=1000000)
Jul 22 12:05:05 node1 pengine[19033]: notice: Stop clvmd:0#011(node2)
Jul 22 12:05:05 node1 pengine[19033]: notice: Stop clvmd:1#011(node1)
# pcs status
[...]
Clone Set: clvmd-clone [clvmd]
Stopped: [ node1 node2 ]
Failed Actions:
* clvmd_start_0 on node2 'unimplemented feature' (3): call=72, status=complete, exitreason='none',
last-rc-change='Fri Jul 22 12:05:06 2016', queued=0ms, exec=11ms
* clvmd_start_0 on node1 'unimplemented feature' (3): call=72, status=complete, exitreason='none',
last-rc-change='Fri Jul 22 12:05:05 2016', queued=0ms, exec=37ms
Resolution
- If this issue has occurred and the resource has stopped as a result, clean up the resource to start it again:
# pcs resource cleanup <resource-name>
- Workaround: To prevent this issue from occurring, do not update the configuration attributes of a clvm resource while it is in use and other resources depend on it.
- Workaround: To activate cmirrord, plan an outage for all resources that depend on clvm, update the configuration to include with_cmirrord="true", and then clean up the resource as described above. That is:
# pcs resource update clvmd with_cmirrord="true"
# pcs resource cleanup clvmd
NOTE: This will restart clvmd and any dependent resources, so be prepared for an outage.
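After the update and cleanup, it may be worth confirming that the new attribute was actually stored in the resource configuration. The following is a minimal sketch, assuming the resource is named clvmd as in the commands above and that `pcs resource show clvmd` on RHEL 7 prints the instance attributes in the form `with_cmirrord=true` (quoted or unquoted); the helper name `has_cmirrord_attr` is hypothetical.

```shell
# Hypothetical helper: succeeds if the pcs resource configuration read on
# stdin sets with_cmirrord to true (with or without surrounding quotes).
has_cmirrord_attr() {
    grep -Eq 'with_cmirrord="?true"?'
}

# Usage on a cluster node (left as a comment so the sketch is safe to source):
#   pcs resource show clvmd | has_cmirrord_attr && echo "with_cmirrord is set"
```

If the check fails after the cleanup, the update most likely did not take effect and should be reviewed before attempting any pvmove.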
- Workaround: To activate cmirrord without causing an outage to any resources, consider starting /usr/sbin/cmirrord manually from the command line.
  - NOTE: Red Hat does not test configurations in which cmirrord is managed outside of the clvm resource, and cannot state with certainty that issues will not arise. For example, if the clvm resource restarts or moves for unrelated reasons while cmirrord is running outside the resource, unexpected behavior may occur. Other scenarios may cause further problems. Use this workaround at your own risk.
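If the manual workaround is chosen despite the caveats above, a small guard avoids launching a second cmirrord on a node where one is already running. This is a sketch only, assuming pgrep (procps-ng) is available; the function name `ensure_cmirrord` and the `CMIRRORD_BIN` override are hypothetical and exist only to make the sketch testable. It acts on the local node only, so it would need to be run on every node where clvmd is running.

```shell
# Hypothetical helper for the unsupported manual workaround: start cmirrord
# outside the clvm resource, but only if it is not already running here.
# CMIRRORD_BIN is a hypothetical override; it defaults to the real binary.
ensure_cmirrord() {
    cmirrord_bin="${CMIRRORD_BIN:-/usr/sbin/cmirrord}"
    if pgrep -x cmirrord >/dev/null 2>&1; then
        echo "cmirrord is already running on $(uname -n)"
        return 0
    fi
    # Not running: launch it and return its exit status.
    "$cmirrord_bin"
}
```

Because cmirrord started this way is not tracked by the cluster, it will not be stopped or restarted along with the clvm resource, which is exactly the kind of situation the NOTE above warns about.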
Root Cause
This issue has been resolved by errata RHBA-2017:1844 (Bug Fix Advisory).
The ocf:heartbeat:clvm resource agent contained a defect: it advertised support for a reload action that it does not actually implement. When a resource agent advertises a reload action, pacemaker uses that action when the resource's configuration is updated. Because clvm advertises reload, any change to a clvm resource's attributes causes pacemaker to try to reload it. Since the agent does not implement that action, the reload attempt returns an "unimplemented feature" error (rc=3), which causes the resource to fail on all nodes.
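The mismatch can be seen by comparing the agent's meta-data with the usage string in the log above, which lists only start, stop, monitor, validate-all and meta-data. A minimal sketch for checking whether the installed agent still advertises reload follows. It assumes the standard OCF meta-data layout, where each action appears as `<action name="..." .../>` with `name` as the first attribute, and that OCF_ROOT is /usr/lib/ocf as on RHEL 7; the helper name `advertises_reload` is hypothetical.

```shell
# Hypothetical helper: succeeds if OCF agent meta-data read on stdin
# advertises a "reload" action in its <actions> section.
advertises_reload() {
    grep -Eq '<action[[:space:]]+name="reload"'
}

# Usage on a cluster node (the agent needs OCF_ROOT to find its shell functions):
#   rpm -q resource-agents
#   OCF_ROOT=/usr/lib/ocf /usr/lib/ocf/resource.d/heartbeat/clvm meta-data \
#       | advertises_reload && echo "this clvm agent advertises reload"
```

If the installed agent advertises reload, it is likely affected by this defect, and attribute changes on an active clvm resource should be avoided.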
As a result, there is currently no way to start cmirrord through the clvm agent without creating an outage for that resource and its dependents. Starting cmirrord manually is an alternative, but as noted above, Red Hat cannot state with certainty that such a configuration will be free of issues. It may therefore be best to plan an outage when the clvm resource needs to be updated to activate cmirrord. Once cmirrord is activated, it may be wise to leave it enabled, so that it is already running for any future pvmove. If there are no mirrored clustered logical volumes and no pvmove operations in progress, a running cmirrord is expected to consume negligible resources on the nodes, so there should be little downside to keeping it active even when it is not needed.
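Since a clustered pvmove depends on cmirrord, a simple pre-flight check can catch the case where it is not running before the move starts. The wrapper below is a hypothetical sketch, assuming pgrep is available; `safe_pvmove` is an illustrative name, and it checks only the local node, not every node in the cluster.

```shell
# Hypothetical wrapper: refuse to start a pvmove unless cmirrord is
# already running on this node; otherwise pass all arguments to pvmove.
safe_pvmove() {
    if ! pgrep -x cmirrord >/dev/null 2>&1; then
        echo "cmirrord is not running on $(uname -n); not starting pvmove" >&2
        return 1
    fi
    pvmove "$@"
}

# Usage (device names are placeholders):
#   safe_pvmove /dev/mapper/old_lun /dev/mapper/new_lun
```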
This solution is part of Red Hat’s fast-track publication program, providing a huge library of solutions that Red Hat engineers have created while supporting our customers. To give you the knowledge you need the instant it becomes available, these articles may be presented in a raw and unedited form.