How can I configure my High Availability Cluster with pacemaker and corosync to maintain quorum after half of the nodes fail?
Environment
- Red Hat Enterprise Linux (RHEL) 6 with the High Availability Add On
- Red Hat Enterprise Linux (RHEL) 7 with the High Availability Add On
- Red Hat Enterprise Linux (RHEL) 8 with the High Availability Add On
- Red Hat Enterprise Linux (RHEL) 9 with the High Availability Add On
- `pacemaker` and `corosync` configured to use `votequorum` for quorum in `/etc/corosync/corosync.conf`
- Cluster made up of an even number of nodes
Issue
- I have a four node cluster split across two racks, and if one rack completely fails then the other nodes are left without quorum. How can I solve this?
- Is there a way to have half the nodes in a `pacemaker` cluster survive a failure of the other half?
- Is there a way to allow one node in a 3-or-more-node cluster to operate without the other two?
Resolution
What is "auto_tie_breaker"?
The general behavior of votequorum requires at least 50% of votes + 1 in a cluster to maintain quorum, and it tolerates simultaneous node failures of just under 50% of nodes before marking the remaining nodes out of quorum (assuming each node has 1 vote). For clusters containing an even number of nodes (i.e. 2, 4, 6, etc.), it is possible to run into a situation where a network loss evenly splits the cluster into two "partitions" of equal size, leaving neither with the 50% + 1 votes needed to maintain quorum.
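The vote arithmetic above can be sketched in a few lines (an illustrative model only, not corosync code; the function names are hypothetical):

```python
# Minimal sketch of votequorum's majority rule: a partition is quorate
# only if it holds strictly more than half of the expected votes
# (50% + 1 for even totals). Assumes 1 vote per node.

def quorum_threshold(expected_votes: int) -> int:
    """Votes needed for quorum: strictly more than half."""
    return expected_votes // 2 + 1

def is_quorate(partition_votes: int, expected_votes: int) -> bool:
    return partition_votes >= quorum_threshold(expected_votes)

# A 6-node cluster split evenly: neither 3-node partition reaches the
# 4-vote threshold, so both halves lose quorum without a tie-breaker.
print(quorum_threshold(6))   # 4
print(is_quorate(3, 6))      # False -> both halves inquorate
print(is_quorate(4, 6))      # True
```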
For example, with the below switch and node layout:
Attached to Switch A:
- Node 1
- Node 2
- Node 3
Attached to Switch B:
- Node 4
- Node 5
- Node 6
If switch A were to lose connection with switch B, this would create 2 separate partitions of equal size. Because each partition holds exactly 50% of the votes, and 50% + 1 is required to maintain quorum, both partitions would be considered "inquorate" by default, i.e. when auto_tie_breaker (ATB) is disabled (the default setting).
However, when auto_tie_breaker is enabled, the cluster can survive up to 50% of the nodes failing at the same time, resolved in a deterministic fashion. The related attribute auto_tie_breaker_node has 3 possible values.
- `lowest`: The set (or partition) of nodes that is still in contact with the node that has the lowest node ID will remain quorate. This is the default.
- `highest`: The set (or partition) of nodes that is still in contact with the node that has the highest node ID will remain quorate.
- `<list of node IDs>`: A space-separated list of one or more node IDs. The nodes are evaluated in the order of the list: if the first node is present in a partition (and was in the cluster before the split), then that node is used to determine the quorate partition; if that node is not in either half (or was not in the cluster before the split), then the second node ID is checked, and so on.
When ATB is enabled with default settings, the cluster partition (the set of nodes) still in contact with the node that has the lowest node ID will remain quorate; the other nodes will be inquorate. So for the above example, if switch A lost connection to switch B, creating an even 50/50 split, the partition attached to switch A would maintain quorum because it contains the lowest node ID (1). The partition attached to switch B would be considered "inquorate" in this scenario and thus subject to the cluster's configured no-quorum-policy (stop all resources, by default).
This auto_tie_breaker option allows the cluster to maintain quorum in exactly one partition in these even split-brain scenarios.
What if we want to maintain quorum in a partition that does not contain the lowest ID?
By default with ATB enabled, the partition containing the lowest node ID (i.e. the lowest node number in the cluster) maintains quorum in a 50/50 split. This behavior can be changed, however, if for example you prefer the highest node ID to determine the quorate partition, or if a specific node must remain up. The auto_tie_breaker_node option can additionally be specified to adjust which node ID is favored when deciding the quorate partition: set it to highest so that the partition containing the highest node ID remains quorate, or give a specific node ID (or list of node IDs) to favor particular nodes:
# man votequorum
....
auto_tie_breaker_node: lowest|highest|<list of node IDs>
‘lowest’ is the default, ‘highest’ is similar in that if the current set of nodes contains the highest
nodeid then it will remain quorate. Alternatively it is possible to specify a particular node ID or
list of node IDs that will be required to maintain quorum. If a (space-separated) list is given,
the nodes are evaluated in order, so if the first node is present then it will be used to determine
the quorate partition, if that node is not in either half (ie was not in the cluster before the split)
then the second node ID will be checked for and so on.
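The `auto_tie_breaker_node` selection rules quoted above can be modeled as a short sketch (illustration only; `tie_breaker_winner` is a hypothetical helper, not a corosync API):

```python
# Simplified model of auto_tie_breaker_node in an even split: given the
# two partitions and the set of node IDs that were cluster members
# before the split, pick which partition stays quorate.

def tie_breaker_winner(partition_a, partition_b, members_before_split,
                       policy="lowest"):
    """Return the partition that keeps quorum, or None if undecidable.

    policy: "lowest", "highest", or a list of node IDs checked in order.
    """
    if policy == "lowest":
        target = min(members_before_split)
    elif policy == "highest":
        target = max(members_before_split)
    else:  # explicit list of node IDs, evaluated in order
        target = next((n for n in policy
                       if n in members_before_split
                       and (n in partition_a or n in partition_b)), None)
    if target in partition_a:
        return partition_a
    if target in partition_b:
        return partition_b
    return None

members = {1, 2, 3, 4, 5, 6}
switch_a, switch_b = {1, 2, 3}, {4, 5, 6}
print(tie_breaker_winner(switch_a, switch_b, members))             # {1, 2, 3}
print(tie_breaker_winner(switch_a, switch_b, members, "highest"))  # {4, 5, 6}
print(tie_breaker_winner(switch_a, switch_b, members, [5, 2]))     # {4, 5, 6}
```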
Example configuration:
Pacemaker clusters of all sizes can use the auto_tie_breaker option to cause corosync to react to a failure of half the nodes by designating the partition which has contact with the lowest node ID to be quorate, and the other partition to be inquorate, automatically resolving even-split situations. This option can be enabled in /etc/corosync/corosync.conf:
quorum {
    provider: corosync_votequorum
    auto_tie_breaker: 1             # Enable ATB
    auto_tie_breaker_node: highest  # ( optional ) keep quorum in the partition with the highest node ID instead of the lowest
}
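After enabling ATB and restarting the cluster, the active quorum flags can be checked with `corosync-quorumtool`. The output below is an illustrative sketch for a 6-node cluster (exact fields and values vary by corosync version):

```
# corosync-quorumtool -s
...
Votequorum information
----------------------
Expected votes:   6
Highest expected: 6
Total votes:      6
Quorum:           4
Flags:            Quorate AutoTieBreaker
```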
Or it can be set when creating the cluster with pcs cluster setup, for example:
# pcs cluster setup --name rhel7-cluster --auto_tie_breaker=1 node1.example.com node2.example.com node3.example.com node4.example.com
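On RHEL 8 and 9 the `pcs cluster setup` syntax differs (the `--name` option was removed), so on those releases it is generally easier to adjust quorum options on an existing cluster with `pcs quorum update`. A sketch, assuming the cluster has been stopped first (changing `auto_tie_breaker` on a running cluster is not permitted):

```
# pcs quorum config
# pcs quorum update auto_tie_breaker=1
```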
Possible Alternative Solutions (Majority Maker Votes and qdevices):
An alternative to using auto_tie_breaker is to use a quorum device, an additional node acting as a "tie-breaker" vote, or another "majority maker" to establish a cluster with an odd number of votes. The majority maker / tie-breaker vote avoids the possible split-brain / split-partition scenario by introducing an additional vote that makes the total vote count odd. This in turn makes a split-brain scenario harder to reach, as it would require the loss of 1 node followed by an additional even network split.
Options available here for introducing a Majority Maker:
- Add a new node to the cluster as a full corosync member.
- Add a qdevice to the cluster. This is a special server that only provides an additional quorum vote (in tie-breaker situations), but is otherwise not a full cluster member. The documentation below covers the configuration of "qdevice" (majority maker votes), if this is preferred over maintaining a cluster with an even number of quorum votes:
  - Exploring RHEL High Availability's Components - corosync-qdevice and corosync-qnetd
- Please note, `auto_tie_breaker` is incompatible with quorum devices and other majority maker options. If `auto_tie_breaker` is specified in `corosync.conf` then the quorum device will be disabled, so it is required to choose one option or the other.
Additional Considerations:
- In two-node clusters, quorum preservation in a site split can alternatively be handled by `two_node` mode in `/etc/corosync/corosync.conf`:

      quorum {
          provider: corosync_votequorum
          two_node: 1
      }

  This allows both nodes to remain quorate, but bypasses the `auto_tie_breaker` lowest-node-ID logic. If `auto_tie_breaker` is disabled and `two_node` is enabled, this configuration should be combined with another mechanism to avoid fence races, such as a `delay` setting on the `stonith` device.
- More on votequorum.
- If you want one node in a 3-or-more-node cluster to survive the loss of all the other nodes, you can configure `last_man_standing` mode in `/etc/corosync/corosync.conf` as shown below:

      quorum {
          provider: corosync_votequorum
          expected_votes: 3
          last_man_standing: 1
          auto_tie_breaker: 1
          wait_for_all: 1
      }
- The favored node is hardcoded and non-variable, so the cluster will always choose the same partition to maintain quorum based on the `auto_tie_breaker_node` value (`lowest` by default), even if that partition is not necessarily the best one to keep in every scenario. For example, if the first 3 nodes are attached to switch A, but switch A is having flapping issues that cause the split-brain, then quorum will still be maintained by the cluster partition attached to switch A, even though this partition may be the one hitting the issues that caused the split. Additional consideration should be given to ensure that the favored partition can stay healthy in possible split-brain scenarios.
- There are additional considerations when configuring `auto_tie_breaker` in an environment with the `sbd` self-fencing stonith type, and it is not recommended for two-node clusters with `sbd` enabled.
Further reading:
- With votequorum's last_man_standing enabled, the last node still loses quorum when the second-to-last node becomes unresponsive in a RHEL 7 High Availability cluster
- Configuring and managing high availability clusters Red Hat Enterprise Linux 8
- Configuring and managing high availability clusters Red Hat Enterprise Linux 9
This solution is part of Red Hat’s fast-track publication program, providing a huge library of solutions that Red Hat engineers have created while supporting our customers. To give you the knowledge you need the instant it becomes available, these articles may be presented in a raw and unedited form.