With votequorum's last_man_standing enabled, the last node still loses quorum when the second-to-last node becomes unresponsive in a RHEL 7 High Availability cluster
Environment
- Red Hat Enterprise Linux 7 (with the High Availability Add-on)
- Red Hat Enterprise Linux 8 (with the High Availability Add-on)
- corosync with the votequorum service
Issue
- I have last_man_standing configured in this cluster, nodes are removed one by one in a cascading fashion, and the number of votes required for quorum is decreased after each event. However, when cluster membership is down to the last two nodes and one of them leaves, the remaining node loses quorum. last_man_standing doesn't work when the last node leaves.
- Is last_man_standing supposed to work all the way down to a single node?
Resolution
Using auto_tie_breaker
Enable auto_tie_breaker in the votequorum configuration within /etc/corosync/corosync.conf. For example, in an 8-node cluster:
quorum {
    provider: corosync_votequorum
    expected_votes: 8
    last_man_standing: 1
    auto_tie_breaker: 1
}
Using corosync-qdevice
Configure corosync-qdevice using the lms (last-man-standing) algorithm. Note that the last_man_standing option in the quorum section of /etc/corosync/corosync.conf must be removed when configuring corosync-qdevice, as the two are mutually incompatible. For complete details, refer to the following documentation.
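As a minimal sketch of such a configuration (the hostname is a placeholder, and last_man_standing has been removed from the quorum section as required), the device section in /etc/corosync/corosync.conf might look like:

```
quorum {
    provider: corosync_votequorum
    device {
        model: net
        net {
            host: qnetd-host.example.com
            algorithm: lms
        }
    }
}
```

In practice, this configuration is typically created with pcs (for example, pcs quorum device add model net host=qnetd-host.example.com algorithm=lms) rather than by editing corosync.conf by hand; see the documentation below for the full procedure, including setting up the corosync-qnetd server.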
- Configuring and managing high availability clusters | Chapter 26. Configuring quorum devices (RHEL 8)
- High Availability Add-On Reference | 10.5. Quorum Devices (RHEL 7)
- Deployment Examples for RHEL High Availability Clusters - Enabling QDevice Quorum Arbitration in RHEL 7
Root Cause
When the last_man_standing option is enabled, corosync's votequorum reduces the number of votes required for quorum each time a segment of the cluster membership is lost and the last_man_standing_window timer has expired. However, if this happens in a cascading fashion until the cluster is down to only two nodes and then one of those last two remaining nodes leaves, quorum is lost. If you want one node to maintain quorum by itself with last_man_standing enabled, you must also enable auto_tie_breaker. The auto_tie_breaker option configures the cluster so that in the event of an even split, one half of the nodes (in this case, one node) maintains quorum.
From the votequorum(5) man page:
NOTES: In order for the cluster to downgrade automatically from 2 nodes to a 1
node cluster, the auto_tie_breaker feature must also be enabled (see below). If
auto_tie_breaker is not enabled, and one more failure occurs, the remaining node
will not be quorate. LMS does not work with asymmetric voting schemes, each node
must vote 1. LMS is also incompatible with quorum devices, if last_man_standing
is specified in corosync.conf then the quorum device will be disabled.
See the votequorum(5) man page for more details on last_man_standing, auto_tie_breaker, and other votequorum features.
The auto_tie_breaker option takes effect only when there is the same number of nodes in each of the cluster partitions. For example, if you have a 3-node cluster and crash 2 nodes at the same time (before last_man_standing_window has passed and votes are recalculated), then the cluster will lose quorum, as there is an unequal number of nodes in each partition (partition_1: 1 node, partition_2: 2 nodes). In this case, auto_tie_breaker cannot help. The cluster will lose quorum and will not be able to fence the cluster nodes or recalculate votes.
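To illustrate the arithmetic in the scenario above (a simplified model, not corosync's actual implementation):

```python
def votes_for_quorum(expected_votes):
    # Majority: strictly more than half of the expected votes.
    return expected_votes // 2 + 1

# 3-node cluster: two nodes crash at the same time, before
# last_man_standing_window expires, so expected_votes is still 3.
expected_votes = 3
surviving_partition = {1}                     # one node, one vote

needed = votes_for_quorum(expected_votes)     # 2 votes required
has_quorum = len(surviving_partition) >= needed

# auto_tie_breaker only applies to an exact 50/50 split; 1 vote out
# of 3 expected is not a tie, so it cannot rescue this partition.
even_split = 2 * len(surviving_partition) == expected_votes
```

With 1 vote against a quorum requirement of 2, and no 50/50 tie for auto_tie_breaker to break, the surviving node stays inquorate.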
Note that auto_tie_breaker is deterministic. In the event of a split, it automatically selects one cluster partition (by default, the partition containing the node with the lowest nodeid). For example, if auto_tie_breaker is enabled, nodes 1 and 2 are both online members of the cluster, and then there is a network split, node 1 maintains quorum and node 2 loses quorum. This is because corosync on node 2 sees that it can no longer communicate with node 1, but it doesn't know whether node 1 is still online or whether it has crashed.
This can be a problem if node 1 has crashed and node 2 is the only remaining online node. As discussed above, from node 2's perspective, a situation where node 1 has crashed looks identical to one where both nodes remain online but network communication is broken. So to be safe and prevent a split-brain scenario from occurring (which could cause corruption), node 2 must relinquish quorum, in case node 1 is online and claiming quorum.
What this means for last_man_standing is that even with auto_tie_breaker enabled, the last remaining cluster node may not be able to maintain quorum.
For example, consider a 3-node cluster. Node 1 fails. The last_man_standing_window timer expires, and the votes needed for quorum are recalculated. Then node 2 fails. Now node 3 is the last remaining node online. But node 3 cannot have quorum, because from node 3's perspective, node 2 might still be online; node 3 cannot know.
In this example, if node 3 had failed instead and node 2 had remained online, then node 2 would have maintained quorum.
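The cascade in this example can be sketched as follows (again a simplified model of votequorum's decision with auto_tie_breaker, not the real implementation):

```python
def partition_quorate(partition, expected_votes, last_membership):
    """Decide whether a partition keeps quorum, modeling votequorum
    with auto_tie_breaker (default: lowest nodeid wins the tie)."""
    needed = expected_votes // 2 + 1
    if len(partition) >= needed:
        return True
    # Exact 50/50 split: auto_tie_breaker awards quorum to the half
    # containing the lowest nodeid of the last full membership.
    if 2 * len(partition) == expected_votes:
        return min(last_membership) in partition
    return False

# 3-node cascade: node 1 fails, last_man_standing_window expires,
# and votes are recalculated down to the 2 remaining nodes {2, 3}.
last_membership = {2, 3}
expected_votes = 2

# Node 2 then fails: node 3 alone is a 1-of-2 split, and the
# tie-breaker favors node 2 (lowest nodeid), so node 3 loses quorum.
print(partition_quorate({3}, expected_votes, last_membership))  # False

# Had node 3 failed instead, node 2 would have kept quorum.
print(partition_quorate({2}, expected_votes, last_membership))  # True
```

This mirrors the behavior described above: which single node can survive as last man standing is fixed by the nodeid-based tie-break, regardless of which node actually fails last.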
The solution is to configure corosync-qdevice using the lms algorithm, rather than enabling the last_man_standing votequorum option. corosync-qdevice makes use of a third system called a corosync-qnetd server to arbitrate quorum. This enables more intelligent decision-making, so that one node can remain quorate after all the other nodes fail.