How do I delay fencing to prevent fence races when using a shared stonith device in a two-node cluster?

Environment

  • Red Hat Enterprise Linux Server 6, 7, 8 or 9 (with the High Availability Add-on)
  • pacemaker-1.1.12-22.el7_1.4 or later

Issue

  • How do I delay fencing to prevent fence races when using a shared stonith device in a two-node cluster?
  • Can I set a delay for just one node with fence_vmware_soap?
  • The delay attribute doesn't work as expected when multiple nodes use the same stonith device.

Resolution

Set the pcmk_delay_max attribute on the stonith device to add a random delay to all its off and reboot actions.

# pcs stonith update vmfence pcmk_delay_max=15

If pcs does not recognize pcmk_delay_max as a valid attribute, append the --force option.

# pcs stonith update vmfence pcmk_delay_max=15 --force

NOTE: If your pacemaker version is older than pacemaker-1.1.12-22.el7_1.4 and you cannot upgrade, use one stonith device for each node instead of a shared stonith device, and set the delay attribute on the device for the node that should stay online. Fencing delays are intended for two-node clusters. If the same fence device is defined multiple times, you might run into the issue described in: pcs shows several Failed actions for my fence_vmware_soap devices with 'unknown error' (1) and status=Timed Out in RHEL High Availability or Resilient Storage Cluster with Pacemaker

Be aware that each node calculates its random delay from pcmk_delay_max independently, so there is still a small chance that both nodes pick the same delay. The pcmk_delay_base parameter addresses this caveat by allowing a fixed base delay to be set for a particular node.

NOTE: With pacemaker-2.1.2-4.el8 or later, the pcmk_delay_base attribute on a shared stonith device can set a different delay for each cluster node that the device manages.
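
As an illustration of the per-node form, pcmk_delay_base accepts a node map such as node1:1s;node2:5 in the Pacemaker versions that support it. The sketch below builds such a map for the vmfence device used above and prints the resulting pcs command for review instead of running it. The node names node1 and node2 and the 10s value are assumptions, not values from this article. The delay applies to fencing actions that target the named node, so the node with the longer delay is the one more likely to stay online.

```shell
#!/bin/sh
# Hypothetical node names; use the names reported by "pcs status".
preferred_node=node1   # node that should survive a fence race
other_node=node2

# Fencing that targets the preferred node waits 10s;
# fencing that targets the other node starts immediately.
delay_map="${preferred_node}:10s;${other_node}:0s"

# Print the command for review. The map must be quoted because
# the shell would otherwise treat ";" as a command separator.
echo "pcs stonith update vmfence pcmk_delay_base=\"${delay_map}\""
```

Run the printed command as root on one cluster node once the node names and delay values have been checked.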

Root Cause

In a two-node cluster, both nodes may attempt to fence each other simultaneously, causing both nodes to reboot or power off. This happens most commonly when there is an issue with the heartbeat network, where both nodes are healthy but cannot communicate with each other.

In many cases, the delay attribute can prevent these "fence races." When there is a 1:1 ratio between nodes and stonith devices (i.e., each node has its own stonith device), a delay can be applied to the stonith device for the node that should win the race and stay online. This works well for fence agents like fence_ipmilan that utilize a physical server's management interface.

This approach does not work when one stonith device manages multiple nodes. Common examples of fence agents that can do this are:

  • fence_vmware_soap / fence_vmware_rest
  • fence_cisco_ucs
  • fence_rhevm
  • fence_xvm
  • fence_scsi / fence_mpath
  • fence_sbd

Applying the delay attribute to the shared stonith device would add an identical delay to every off or reboot action. If both nodes of a two-node cluster request fencing simultaneously and each one is delayed by 10 seconds, the end result is the same as not setting a delay at all.

The pcmk_delay_max attribute addresses this problem. From the stonithd(7) man page:

   pcmk_delay_max = time [0s]
       Enable a random delay for stonith actions and specify the maximum of random delay.

       This prevents double fencing when using slow devices such as sbd. Use this to enable a random delay for stonith actions. The overall delay is derived from this random delay value adding a static
       delay so that the sum is kept below the maximum delay.

The pcmk_delay_max attribute, added in pacemaker-1.1.12-22.el7_1.4, adds a random delay between 0 seconds and <pcmk_delay_max> to every off or reboot action. This way, if two nodes request fencing simultaneously, each request will most likely be assigned a different random delay. Under normal circumstances, one node will win the race and remain online.

Optionally, the pcmk_delay_base attribute can also be set. This sets a static delay that is combined with the random delay bounded by pcmk_delay_max. From the stonithd(7) man page:

   pcmk_delay_base = time [0s]
       Enable a base delay for stonith actions and specify base delay value.

       This prevents double fencing when different delays are configured on the nodes. Use this to enable a static delay for stonith actions. The overall delay is derived from a random delay value
       adding this static delay so that the sum is kept below the maximum delay.

If pcmk_delay_base is set to 5s and pcmk_delay_max is set to 15s, the delay for each off or reboot action will be between 5 seconds and 15 seconds, because the sum of the static and random delays is kept below the maximum delay.
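
Reading the man page text literally (the sum is kept below the maximum delay), a base of 5s with a maximum of 15s yields a total delay from 5 seconds up to, but not including, 15 seconds. The sketch below works through that arithmetic. It assumes the total is computed as the base plus a random value smaller than (max - base); fence_delay is a hypothetical helper written for this illustration, not a Pacemaker command, and the random number is passed in as an argument so the result is reproducible.

```shell
#!/bin/sh
# fence_delay BASE MAX RAND
# Hypothetical helper: prints the total delay in seconds, assuming
# total = BASE + (RAND modulo (MAX - BASE)).
fence_delay() {
    base=$1
    max=$2
    rand=$3
    # A maximum that is unset (0) or smaller than the base
    # leaves only the static delay.
    if [ "$max" -lt "$base" ]; then
        max=$base
    fi
    if [ "$max" -eq "$base" ]; then
        echo "$base"
    else
        echo $(( base + rand % (max - base) ))
    fi
}

fence_delay 5 15 0    # smallest possible delay for base=5, max=15
fence_delay 5 15 9    # largest possible delay for base=5, max=15
fence_delay 0 15 20   # no base delay: purely random, below 15
fence_delay 5 0 123   # no maximum: static delay only
```

With only pcmk_delay_max set, the base is 0 and the delay is purely random. With only pcmk_delay_base set, the delay is fixed, which matches how the delay attribute is used on clusters with one stonith device per node.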

Alternatively, on a cluster with one stonith device per node, pcmk_delay_base can be used in the same way as the delay attribute.

A two-node cluster is a special case because a single node is enough to have quorum, and quorum is required to fence a cluster node. In a cluster with three or more nodes, a fencing delay is not needed: a node can only fence another node while it belongs to a partition that has quorum, and only one partition can have quorum at a time, which prevents multiple cluster nodes from being fenced at the same time.
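
The quorum reasoning above can be sketched with simple vote arithmetic. The example assumes one vote per node and the usual majority rule (more than half of the votes), with the two-node special case modeled by a flag. votes_needed and can_fence are hypothetical helpers for this illustration, not cluster commands.

```shell
#!/bin/sh
# votes_needed TOTAL [TWO_NODE]
# Hypothetical helper: prints the votes required for quorum, assuming
# one vote per node. TWO_NODE=1 models the two-node special case,
# where a single node is enough for quorum.
votes_needed() {
    total=$1
    two_node=${2:-0}
    if [ "$total" -eq 2 ] && [ "$two_node" -eq 1 ]; then
        echo 1
    else
        echo $(( total / 2 + 1 ))
    fi
}

# can_fence PARTITION_SIZE TOTAL [TWO_NODE]
# Succeeds when a partition of PARTITION_SIZE nodes has quorum.
can_fence() {
    [ "$1" -ge "$(votes_needed "$2" "${3:-0}")" ]
}

# Two-node cluster split 1/1: both sides keep quorum, so both may fence.
can_fence 1 2 1 && echo "2 nodes, partition of 1: may fence"
# Three-node cluster split 2/1: only the larger side may fence.
can_fence 2 3 && echo "3 nodes, partition of 2: may fence"
can_fence 1 3 || echo "3 nodes, partition of 1: may not fence"
```

In the two-node case both partitions pass the check, which is why a delay is needed to decide the race. In the three-node case at most one partition passes.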

