What are my options for avoiding fence races in High Availability clusters with an even number of nodes?


Environment

  • Red Hat Enterprise Linux (RHEL) 5 and later with the High Availability Add-On
  • A cluster with an even number of nodes
    • Two-node clusters are most commonly affected
    • Larger clusters with an even number of nodes can also be affected if using a quorum device or some other mechanism that would allow two halves of a cluster to stay quorate independently
  • Network-based fence devices that are accessed over a different network interface from that which is used for cluster communication

Issue

  • How can fence races be avoided in 2-node RHEL clusters?
  • When a network split occurs in a 2-node cluster, both nodes race to fence each other and the winner is not deterministic
  • Do I need a quorum device in a 2-node cluster to avoid fence races?
  • When I disconnect the heartbeat interface, the cluster goes completely down: the node with no cluster resources is fenced, and the node with all resources halts itself. Why?
  • With four nodes and a quorum device, I've seen that a network split down the middle that creates two two-node partitions can cause both sides to race to fence each other.

Resolution

There are multiple ways to prevent fence races in affected clusters.

NOTE: Addressing the problem of fence races via the methods below may not address the risk of fence loops.

All cluster types
  • Use a shared fence device that only allows one log in at a time (such as an APC or WTI PDU).
  • Use the fence delay attribute to allow one node to always win races (RHEL 5 Update 6 or later, RHEL 6 Update 1 or later)
  • Move the fence devices to the cluster communication network so everything is accessed via that single interface, while taking note of the potential negative impacts of doing so.
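
As a sketch of the fence delay approach on a cman-based cluster, a hypothetical cluster.conf excerpt (node, device, and port names are examples only; the delay parameter is passed through to the fence agent):

```xml
<!-- Hypothetical excerpt: node2's fence device waits 5 seconds before
     acting, so node1 reaches the fence device first and wins any race. -->
<clusternode name="node2" nodeid="2">
  <fence>
    <method name="1">
      <device name="apc-pdu" port="2" delay="5"/>
    </method>
  </fence>
</clusternode>
```

Only one side of the cluster should carry the delay; if both sides are delayed equally, the race remains nondeterministic.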
RHEL 5 and 6 cman-based clusters

Use a quorum device (qdiskd). The quorum device must be configured with either heuristics or master-wins logic so that only one side of a split survives as a deterministic winner.
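
A hedged cluster.conf sketch of the quorum-device approach (interval, tko, and label values are illustrative only):

```xml
<!-- Hypothetical excerpt: with master_wins, the qdiskd master evicts the
     other partition, so only one side attempts fencing after a split. -->
<quorumd interval="1" tko="10" votes="1" label="qdisk" master_wins="1"/>
```

Alternatively, heuristics (for example, a ping test against a router) can be used so that the side that fails its heuristic removes itself rather than racing.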
RHEL 7 and later corosync+votequorum-based clusters

  • Enable the auto_tie_breaker votequorum option so that only one half of an even split retains quorum.
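
A minimal corosync.conf quorum stanza showing the auto_tie_breaker approach, assuming a four-node (or larger even-sized) cluster:

```
quorum {
    provider: corosync_votequorum
    # On an even split, only the partition containing the tie-breaker node
    # keeps quorum; 'lowest' selects the node with the lowest node ID.
    auto_tie_breaker: 1
    auto_tie_breaker_node: lowest
}
```

The losing partition goes inquorate and therefore does not attempt to fence, eliminating the race.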

Root Cause

2-node clusters are inherently susceptible to an issue known as a "fence race" when connectivity between the nodes is lost. Because each node can form quorum on its own (as opposed to larger clusters, where multiple nodes must be in communication to maintain quorum), when a split occurs each one will attempt to fence the other. Without one of the above-mentioned mechanisms in place, either node may win simply depending on which happens to connect to the fence device first. In some cases both may issue the power-off command simultaneously, resulting in both nodes shutting down with neither able to power the other back on.
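
The race described above can be illustrated with a small simulation (a sketch only, not part of any Red Hat tooling): two nodes race to reach a shared fence device after a split, and a configured delay on one node makes the winner deterministic.

```python
import random
import threading
import time

def simulate_split(node2_delay=0.0):
    """Simulate two nodes racing to fence each other after a network split."""
    winner = []
    lock = threading.Lock()

    def try_fence(name, delay):
        # Each node reaches the fence device after its configured delay
        # plus a small amount of random network jitter.
        time.sleep(delay + random.uniform(0, 0.01))
        with lock:
            if not winner:  # first login wins and blocks the peer
                winner.append(name)

    threads = [
        threading.Thread(target=try_fence, args=("node1", 0.0)),
        threading.Thread(target=try_fence, args=("node2", node2_delay)),
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return winner[0]

# With no delay the winner is a coin flip; with a delay, node1 always wins.
print(simulate_split(node2_delay=0.2))  # node1
```

This mirrors why a fence delay works in practice: the delayed node cannot reach the device before the undelayed node has already fenced it.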

Larger clusters consisting of an even number of nodes (4, 6, 8, etc.) can also be susceptible to fence races if some mechanism (such as a quorum device) allows half of the nodes to maintain quorum, and a communication problem splits the cluster down the middle into two halves that can still communicate among themselves but not with each other. This is usually only the case in cman-based clusters with a quorum device whose configuration does not trigger a reboot from heuristic failures when the interconnect network is having problems.

Some fence devices (notably APC and WTI power switches) only allow one login at a time. If one of these is used as a shared fence device for both nodes (and no secondary fence method is configured), it should be sufficient for avoiding races, because whichever node happens to log in first will win, blocking the other from fencing it.

A fence delay, as described above, will also work, as it ensures one node will be slower to reach the fence device and thus should lose races.
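
On RHEL 7 and later pacemaker clusters, the same effect can be achieved by giving one fence device a delay. A hedged example using fence_ipmilan (device names, addresses, and credentials are hypothetical):

```
# Hypothetical: the device that fences node1 waits 5 seconds, while the
# device that fences node2 acts immediately, so node1 wins any race.
pcs stonith create fence-node1 fence_ipmilan ip=192.0.2.11 \
    username=admin password=secret pcmk_host_list=node1 delay=5
pcs stonith create fence-node2 fence_ipmilan ip=192.0.2.12 \
    username=admin password=secret pcmk_host_list=node2
```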

A quorum disk can be configured in cman-based clusters to use heuristics or master-wins logic to determine which node should be the winner in these situations.

With corosync and pacemaker:

  • auto_tie_breaker solves the problem by only allowing one half of the split to maintain quorum.
  • Booting a node without starting the cluster stack can be ensured by adding nocluster to the 'linux16' or 'linux' line in GRUB. This works because the corosync systemd unit file (/lib/systemd/system/corosync.service) sets the ConditionKernelCommandLine option to !nocluster. This is specific to RHEL 7 and later versions.

This solution is part of Red Hat’s fast-track publication program, providing a huge library of solutions that Red Hat engineers have created while supporting our customers. To give you the knowledge you need the instant it becomes available, these articles may be presented in a raw and unedited form.