Should I specify an action when creating stonith devices in my High Availability cluster with pacemaker?
Environment
- Red Hat Enterprise Linux (RHEL) 6, 7, 8, 9 or 10 with the High Availability Add On
- pacemaker
- One or more stonith devices configured with an action="reboot" or action="off" attribute
Issue
- When using cman, we used to create a <device/> entry in /etc/cluster/cluster.conf and specify the desired action, like action="off". Should I specify an action when creating a stonith device with pcs?
- Will stonith-ng override my configured action setting on a stonith device with the expected action that needs to be carried out?
- Nodes are being rebooted when joining the cluster and unfencing is carried out
- Nodes reboot when stonith is executing a monitor operation
- How do I change the fence agent so it shuts a node down instead of rebooting the node?
- How do I see what the fence action is set to for a fence agent?
- Nodes get rebooted unexpectedly when I issue a pcs resource cleanup or pcs stonith cleanup, and there don't appear to be any resource failures to cause fencing.
- The below warning is observed at the top of the pcs status output:
WARNINGS:
Following stonith devices have the 'action' option set, it is recommended to set 'pcmk_off_action', 'pcmk_reboot_action' instead: node1-fence
Resolution
- If the default action to take in a given situation needs to be overridden to use a different action, then the desired action should be set via pcmk_reboot_action, pcmk_off_action, pcmk_list_action, pcmk_monitor_action, or pcmk_status_action.
  - For example, if stonith-ng should use the "off" action in situations where it would normally use "reboot", the device can be created with pcmk_reboot_action="off":
    # pcs stonith create scsi fence_scsi pcmk_reboot_action="off"
  - NOTE: The above example is typical for `fence_scsi` deployments, as they require `fence_scsi` to simply remove the keys (turn "off" a node), rather than remove and re-add them (turn "off" and "on").
- For setups with redundant power supplies or other fence device configurations that require multiple actions to be carried out in a specific order, create a separate device for each device/action combination, and specify pcmk_reboot_action="<action>" appropriately for each one.
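The two cases above can be sketched as a small dry-run script. It only prints the pcs commands it would run, so they can be reviewed before anything is executed on a cluster. The device names (node1-fence, apc1, apc2), the node name node1, the choice of fence_apc_snmp as the agent, and the use of a single fencing level are assumptions for illustration; agent-specific options such as addresses and credentials are left as placeholders.

```shell
#!/bin/sh
# Dry-run sketch: prints pcs commands instead of running them.
# All device and node names used below are hypothetical.

# Move a legacy action=<value> setting to pcmk_reboot_action=<value>.
# An empty "action=" unsets the old attribute.
migrate_action() {
    dev=$1
    act=$2
    echo "pcs stonith update $dev action= pcmk_reboot_action=$act"
}

# Redundant power: one stonith device per PDU/action combination,
# then one fencing level that turns every supply off before turning
# any of them back on.
redundant_power_plan() {
    node=$1
    shift
    off_list=
    on_list=
    for pdu in "$@"; do
        echo "pcs stonith create ${pdu}_off fence_apc_snmp pcmk_reboot_action=off <options for $pdu>"
        echo "pcs stonith create ${pdu}_on fence_apc_snmp pcmk_reboot_action=on <options for $pdu>"
        off_list="${off_list:+$off_list,}${pdu}_off"
        on_list="${on_list:+$on_list,}${pdu}_on"
    done
    echo "pcs stonith level add 1 $node $off_list,$on_list"
}

migrate_action node1-fence off
redundant_power_plan node1 apc1 apc2
```

The printed `pcs stonith level add` line lists the "off" devices before the "on" devices, which is what gives the required ordering in this sketch.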
- To see what the current fence action is set to:
  # pcs stonith show <name_of_fence_agent>
  On pcs versions where pcs stonith show is no longer available (RHEL 9 and later), use pcs stonith config <name_of_fence_agent> instead.
- For example:
  # pcs stonith show fence_agent_xvm
   Resource: fence_agent_xvm (class=stonith type=fence_xvm)
    Attributes: key_file=/etc/cluster/fence_xvm.key pcmk_reboot_action=off
    Operations: monitor interval=60s (fence_laptop-monitor-interval-60s)
- To revert to the default action of reboot and unset the action parameter, use the following command (ensure that there is no white space or value after the equals sign):
  # pcs stonith update fence_agent_xvm action=
- Note: If the action has not been changed from the default of reboot, then pcmk_reboot_action= will not appear at the end of the line starting with Attributes:.
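The check described above can be sketched as a small shell helper that reads the device listing and reports the configured reboot action, falling back to reboot when pcmk_reboot_action is absent. The function name configured_reboot_action is hypothetical, and the parser assumes the single-line Attributes: layout shown in the example above; other pcs versions may print attributes differently.

```shell
#!/bin/sh
# Print the reboot action a stonith device is configured to use.
# Reads `pcs stonith show <device>` output on stdin (layout assumed to
# match the example in this article). Prints "reboot" when no
# pcmk_reboot_action attribute is listed.
configured_reboot_action() {
    awk '
        $1 == "Attributes:" {
            for (i = 2; i <= NF; i++) {
                if ($i ~ /^pcmk_reboot_action=/) {
                    val = $i
                    sub(/^pcmk_reboot_action=/, "", val)
                    found = val
                }
            }
        }
        END { print (found != "" ? found : "reboot") }
    '
}

# Demo on the example output from this article.
# On a cluster: pcs stonith show fence_agent_xvm | configured_reboot_action
configured_reboot_action <<'EOF'
 Resource: fence_agent_xvm (class=stonith type=fence_xvm)
  Attributes: key_file=/etc/cluster/fence_xvm.key pcmk_reboot_action=off
  Operations: monitor interval=60s (fence_laptop-monitor-interval-60s)
EOF
```

For the example device this prints off; for a device with no pcmk_reboot_action attribute it prints reboot.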
Root Cause
pacemaker and stonith-ng allow an action value to be specified directly in a stonith device's attributes when creating or editing that device, but in most situations this should not be done. Instead, stonith-ng's pcmk_reboot_action, pcmk_off_action, pcmk_list_action, pcmk_monitor_action, or pcmk_status_action setting should be used to override the action used in certain circumstances.
Red Hat is currently considering changes to pacemaker so that it does not allow a configured action attribute on a stonith device to lead to unexpected node reboots or power-offs, via Bugzilla #1421700.
Red Hat is also considering changes to pcs to prevent it from allowing action attributes on a stonith device, via Bugzilla #1421702.
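As a rough way to find devices affected by the behavior described above, the following sketch scans a device listing for a bare action=<value> attribute while ignoring the pcmk_*_action settings. The function name devices_with_bare_action and the sample device names are hypothetical, and the input is assumed to follow the Resource: / Attributes: layout printed by pcs stonith show --full.

```shell
#!/bin/sh
# Report stonith devices whose attributes include a bare action=<value>.
# Reads a device listing on stdin (layout assumed to match the example
# in this article) and prints "<device> <value>" for each match.
devices_with_bare_action() {
    awk '
        $1 == "Resource:" { dev = $2 }
        $1 == "Attributes:" {
            for (i = 2; i <= NF; i++) {
                # Match action= only at the start of a field, so that
                # pcmk_reboot_action= and similar are not reported.
                if ($i ~ /^action=/) {
                    val = $i
                    sub(/^action=/, "", val)
                    print dev, val
                }
            }
        }
    '
}

# Demo on hypothetical sample data.
# On a cluster: pcs stonith show --full | devices_with_bare_action
devices_with_bare_action <<'EOF'
 Resource: node1-fence (class=stonith type=fence_xvm)
  Attributes: key_file=/etc/cluster/fence_xvm.key action=off
 Resource: node2-fence (class=stonith type=fence_xvm)
  Attributes: key_file=/etc/cluster/fence_xvm.key pcmk_reboot_action=off
EOF
```

In the sample data only node1-fence is reported, since node2-fence uses pcmk_reboot_action rather than a bare action attribute.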