Should I specify an action when creating stonith devices in my High Availability cluster with pacemaker?


Environment

  • Red Hat Enterprise Linux (RHEL) 6, 7, 8, 9 or 10 with the High Availability Add On
  • pacemaker
  • One or more stonith devices configured with an action="reboot" or action="off" attribute

Issue

  • When using cman, we used to create a <device/> entry in /etc/cluster/cluster.conf and specify the desired action, like action="off". Should I specify an action when creating a stonith device with pcs?
  • Will stonith-ng override my configured action setting on a stonith device with the expected action that needs to be carried out?
  • Nodes are rebooted when they join the cluster and unfencing is carried out
  • Nodes reboot when stonith is executing a monitor operation
  • How do I change the fence agent so it shuts a node down instead of rebooting the node?
  • How do I see what the fence action is set to for a fence agent?
  • Nodes get rebooted unexpectedly when I issue a pcs resource cleanup or pcs stonith cleanup, and there don't appear to be any resource failures to cause fencing.
  • The following warning is shown at the top of the pcs status output:

    WARNINGS:
    Following stonith devices have the 'action' option set, it is recommended to set 'pcmk_off_action', 'pcmk_reboot_action' instead: node1-fence

Resolution

  • If the action that stonith-ng takes by default in a given situation needs to be overridden, set the desired action via pcmk_reboot_action, pcmk_off_action, pcmk_list_action, pcmk_monitor_action, or pcmk_status_action rather than via the action attribute.

    • For example, if stonith-ng should use the "off" action in situations where it would normally use "reboot", the device can be created with pcmk_reboot_action="off":
    # pcs stonith create scsi fence_scsi pcmk_reboot_action="off"
    
     - NOTE: The above example is typical for fence_scsi deployments, because fencing with fence_scsi only needs to remove the keys (turn a node "off"), rather than remove and re-add them (turn it "off" and then "on").
    
  • For setups with redundant power supplies or other fence device configurations that require multiple actions be carried out in a specific order, create a separate device for each device/action combination, and specify pcmk_reboot_action="<action>" appropriately for each one.

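  • To see what the current fence action is set to (on RHEL 8 and later, where pcs stonith show is deprecated or removed, use pcs stonith config <name_of_fence_agent> instead):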

    # pcs stonith show <name_of_fence_agent>
    
  • For example:

    # pcs stonith show fence_agent_xvm
     Resource: fence_agent_xvm (class=stonith type=fence_xvm)
       Attributes: key_file=/etc/cluster/fence_xvm.key pcmk_reboot_action=off
       Operations: monitor interval=60s (fence_agent_xvm-monitor-interval-60s)
    
  • To remove a configured action attribute and return to the default action of reboot, use the following command, with no white space or value after the equals sign (=). An override set via pcmk_reboot_action is removed the same way, using pcmk_reboot_action= in place of action=:

    # pcs stonith update fence_agent_xvm action=
    
  • Note: If the action has not been changed from the default of reboot, pcmk_reboot_action= does not appear in the line starting with Attributes:.
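
The check described above can also be scripted. The following is a minimal sketch, not an official tool: report_action_attrs is a hypothetical helper name, and it assumes the device configuration is printed with name=value pairs separated by spaces, as in the example output above. It reads the command output on stdin and prints any action or pcmk_*_action settings it finds, so a plain action= entry stands out as one that should be moved to a pcmk_*_action option.

```shell
#!/bin/sh
# Hypothetical helper: list action-related attributes found in the output of
# "pcs stonith show <device>". Reads that output on stdin.
# Assumption: attributes appear as space-separated name=value tokens.
report_action_attrs() {
    # Put every space-separated token on its own line, then keep only
    # "action=..." and "pcmk_<something>_action=..." settings.
    tr -s ' ' '\n' | grep -E '^(action|pcmk_[a-z]+_action)='
}

# Sample input copied from the example output in this article.
sample='Resource: fence_agent_xvm (class=stonith type=fence_xvm)
  Attributes: key_file=/etc/cluster/fence_xvm.key pcmk_reboot_action=off
  Operations: monitor interval=60s'

printf '%s\n' "$sample" | report_action_attrs
# prints: pcmk_reboot_action=off
```

On a cluster node the same function could be fed live output, for example pcs stonith show fence_agent_xvm | report_action_attrs. No output means neither action nor any pcmk_*_action override is set on that device.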

Root Cause

pacemaker and stonith-ng allow an action value to be specified directly in a stonith device's attributes when creating or editing that device, but in most situations this should not be done. A device-level action replaces whatever action stonith-ng requests, so operations such as monitor or unfencing ("on") can end up rebooting or powering off a node. Instead, stonith-ng's pcmk_reboot_action, pcmk_off_action, pcmk_list_action, pcmk_monitor_action, or pcmk_status_action setting should be used to override the action used in a specific circumstance.
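
As an illustration of moving from a device-level action to a pcmk_*_action override, the sketch below only prints the pcs command it would suggest and does not run pcs itself. migrate_cmd is a hypothetical helper, node1-fence is the device name from the warning shown earlier, and the mapping is an assumption about intent: action="off" is taken to mean "power off instead of rebooting", and action="reboot" is taken to match the default and therefore need no override.

```shell
#!/bin/sh
# Hypothetical helper: print the pcs command that replaces a device-level
# "action" attribute with the matching pcmk_*_action override.
# It prints the command only; nothing is changed on the cluster.
migrate_cmd() {
    device=$1      # stonith device name
    old_action=$2  # value currently set in action=: "off" or "reboot"
    case $old_action in
        off)
            # Assumed intent: use "off" where stonith-ng would use "reboot".
            printf 'pcs stonith update %s action= pcmk_reboot_action=off\n' "$device"
            ;;
        reboot)
            # "reboot" is already the default, so only remove the attribute.
            printf 'pcs stonith update %s action=\n' "$device"
            ;;
        *)
            echo "unsupported action: $old_action" >&2
            return 1
            ;;
    esac
}

migrate_cmd node1-fence off
# prints: pcs stonith update node1-fence action= pcmk_reboot_action=off
```

Review the printed command before running it on a cluster node, since a single update both clears action and sets the override.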

Red Hat is currently considering changes to pacemaker so that it does not allow a configured action attribute on a stonith device to lead to unexpected node reboots or power-offs, via Bugzilla #1421700.

Red Hat is also considering changes to pcs to prevent it from allowing action attributes on a stonith device, via Bugzilla #1421702.


This solution is part of Red Hat’s fast-track publication program, providing a huge library of solutions that Red Hat engineers have created while supporting our customers. To give you the knowledge you need the instant it becomes available, these articles may be presented in a raw and unedited form.