Can two-node RHEL 7, RHEL 8 or RHEL 9 High Availability clusters be susceptible to fence loops?
Environment
- Red Hat Enterprise Linux 7 (with the High Availability Add-on)
- Red Hat Enterprise Linux 8 (with the High Availability Add-on)
- Red Hat Enterprise Linux 9 (with the High Availability Add-on)
- corosync with votequorum
- Two-node cluster
Issue
- I have a two-node cluster using pacemaker. Do I need to worry about fence loops like I did with older cman clusters?
- Does corosync with votequorum have a problem with fence loops where the network goes down, a node gets fenced, and then it fences the other node when it boots back up?
- How can I avoid fence loops with pacemaker in RHEL 7, RHEL 8, or RHEL 9?
Resolution
If there are only 2 nodes in the cluster, then ensure that the two_node and wait_for_all options are enabled in the quorum section of /etc/corosync/corosync.conf.
Note: If two_node is enabled and wait_for_all is not specified, then wait_for_all is enabled implicitly. If wait_for_all is specified, then it takes whatever value is set.
# wait_for_all is enabled implicitly because two_node is enabled
quorum {
provider: corosync_votequorum
two_node: 1
}
# wait_for_all is enabled explicitly
quorum {
provider: corosync_votequorum
two_node: 1
wait_for_all: 1
}
# wait_for_all is disabled because two_node is not enabled and
# wait_for_all is not set explicitly, so it defaults to 0 here
quorum {
provider: corosync_votequorum
}
# wait_for_all is disabled explicitly
quorum {
provider: corosync_votequorum
two_node: 1
wait_for_all: 0
}
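The implicit-default rules above can be sketched as a small shell helper (an illustrative check of our own, not a Red Hat tool; it assumes a single quorum section and simple `key: value` lines):

```shell
# Determine the effective wait_for_all value from a corosync.conf-style file,
# following the rules above:
#   - an explicit wait_for_all setting always wins
#   - otherwise wait_for_all inherits the value of two_node (default 0)
effective_wait_for_all() {
    conf="$1"
    wfa=$(awk -F': *' '/^[[:space:]]*wait_for_all:/ {print $2}' "$conf")
    tn=$(awk -F': *' '/^[[:space:]]*two_node:/ {print $2}' "$conf")
    echo "${wfa:-${tn:-0}}"
}

# Sample: two_node enabled, wait_for_all unspecified -> implicitly 1
cat > /tmp/corosync-sample.conf <<'EOF'
quorum {
    provider: corosync_votequorum
    two_node: 1
}
EOF
effective_wait_for_all /tmp/corosync-sample.conf   # prints 1
```

For the sample file, two_node is 1 and wait_for_all is unset, so the helper prints 1, matching the implicit-enable rule in the first example above.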
After changing either of these options, sync the updated configuration to the other node.
# pcs cluster sync
Then restart the cluster on both nodes. These commands only need to be run from one cluster node; they stop the cluster stack on all cluster nodes, then start the cluster stack on all cluster nodes.
# pcs cluster stop --all
# pcs cluster start --all
Further reading
- Chapter 10. Cluster Quorum | RHEL 7
- 10.3. Modifying Quorum Options (Red Hat Enterprise Linux 7.3 and later) | RHEL 7
- Chapter 26. Configuring cluster quorum | RHEL 8
- 26.2. Modifying quorum options | RHEL 8
- Chapter 26. Configuring cluster quorum | RHEL 9
- 26.2. Modifying quorum options | RHEL 9
In some cases, the runtime values in the output of corosync-cmapctl may differ from what is in /etc/corosync/corosync.conf; these differing values should not affect the operation of the cluster. We recommend restarting the cluster stack at the earliest convenience so that the corosync-cmapctl output and /etc/corosync/corosync.conf agree. Changes to quorum.wait_for_all do not take effect at runtime; quorum.wait_for_all is read only when corosync starts, which is why the cluster stack must be stopped and started after making changes.
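A quick way to spot such a mismatch is to compare the two views side by side (an illustrative session; the exact cmap key names can vary between corosync versions, and these commands require a running cluster node):

```shell
# Quorum settings corosync is currently running with
corosync-cmapctl | grep -E 'two_node|wait_for_all'
# Quorum settings that will apply the next time corosync starts
grep -E 'two_node|wait_for_all' /etc/corosync/corosync.conf
```

If the two outputs disagree, restart the cluster stack as shown in the Resolution section so corosync re-reads the configuration file.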
Root Cause
With the following options in /etc/corosync/corosync.conf for a 2-node cluster, a pacemaker cluster using corosync and votequorum is not susceptible to fence loops.
quorum {
provider: corosync_votequorum
two_node: 1
}
When the two_node option is enabled, the option wait_for_all is implicitly enabled as well, which prevents fence loops from occurring.
The wait_for_all option causes any node that is just starting corosync to wait until all the nodes have been seen alive at least once at the same time before gaining quorum and proceeding to participate in the cluster.
Note: A 2-node cluster created by the pcs cluster setup command gets two_node: 1 added to its corosync.conf file automatically. So it's effectively the default setting after creating the cluster in this way. However, if the two_node: 1 line is removed or commented out, then the two_node option's value defaults to 0.
Diagnostic Steps
- If there are only two nodes in the cluster, verify that the two_node option is enabled in the /etc/corosync/corosync.conf file. The option is enabled when the value of two_node is 1.
# grep two_node /etc/corosync/corosync.conf
two_node: 1
- Check whether the wait_for_all option is enabled in /etc/corosync/corosync.conf. This option is disabled if a line reads wait_for_all: 0. Otherwise, wait_for_all is enabled as long as two_node is enabled.
# grep wait_for_all /etc/corosync/corosync.conf
#
This solution is part of Red Hat’s fast-track publication program, providing a huge library of solutions that Red Hat engineers have created while supporting our customers. To give you the knowledge you need the instant it becomes available, these articles may be presented in a raw and unedited form.