Can two-node RHEL 7, RHEL 8 or RHEL 9 High Availability clusters be susceptible to fence loops?
Environment
- Red Hat Enterprise Linux 7 (with the High Availability Add-on)
- Red Hat Enterprise Linux 8 (with the High Availability Add-on)
- Red Hat Enterprise Linux 9 (with the High Availability Add-on)
- corosync with votequorum
- Two-node cluster
Issue
- I have a two-node cluster using pacemaker. Do I need to worry about fence loops like I did with older cman clusters?
- Does corosync with votequorum have a problem with fence loops where the network goes down, a node gets fenced, and then it fences the other node when it boots back up?
- How can I avoid fence loops with pacemaker in RHEL 7, RHEL 8, or RHEL 9?
Resolution
If there are only 2 nodes in the cluster, then ensure that the two_node and wait_for_all options are enabled in the quorum section of /etc/corosync/corosync.conf.
Note: If two_node is enabled and wait_for_all is not specified, then wait_for_all is enabled implicitly. If wait_for_all is specified, then it takes whatever value is set.
# wait_for_all is enabled implicitly because two_node is enabled
quorum {
provider: corosync_votequorum
two_node: 1
}
# wait_for_all is enabled explicitly
quorum {
provider: corosync_votequorum
two_node: 1
wait_for_all: 1
}
# wait_for_all is disabled because two_node is not enabled and
# wait_for_all is not set explicitly, so it defaults to 0 here
quorum {
provider: corosync_votequorum
}
# wait_for_all is disabled explicitly
quorum {
provider: corosync_votequorum
two_node: 1
wait_for_all: 0
}
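The implicit-default rules above can be sketched as a small shell helper (an illustrative check of our own, not a Red Hat tool; it assumes a single quorum section and simple `key: value` lines):

```shell
# Determine the effective wait_for_all value from a corosync.conf-style file,
# following the rules above:
#   - an explicit wait_for_all setting always wins
#   - otherwise wait_for_all inherits the value of two_node (default 0)
effective_wait_for_all() {
    conf="$1"
    wfa=$(awk -F': *' '/^[[:space:]]*wait_for_all:/ {print $2}' "$conf")
    tn=$(awk -F': *' '/^[[:space:]]*two_node:/ {print $2}' "$conf")
    echo "${wfa:-${tn:-0}}"
}

# Sample: two_node enabled, wait_for_all unspecified -> implicitly 1
cat > /tmp/corosync-sample.conf <<'EOF'
quorum {
    provider: corosync_votequorum
    two_node: 1
}
EOF
effective_wait_for_all /tmp/corosync-sample.conf   # prints 1
```

For the sample file, two_node is 1 and wait_for_all is unset, so the helper prints 1, matching the implicit-enable rule in the first example above.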
After changing either of these options, sync the updated configuration to the other node.
# pcs cluster sync
Then restart the cluster on both nodes. These commands only need to be run from one cluster node; they stop the cluster stack on all cluster nodes, then start the cluster stack on all cluster nodes.
# pcs cluster stop --all
# pcs cluster start --all
Further reading
- Chapter 10. Cluster Quorum | RHEL 7
- 10.3. Modifying Quorum Options (Red Hat Enterprise Linux 7.3 and later) | RHEL 7
- Chapter 26. Configuring cluster quorum | RHEL 8
- 26.2. Modifying quorum options | RHEL 8
- Chapter 26. Configuring cluster quorum | RHEL 9
- 26.2. Modifying quorum options | RHEL 9
In some cases, the runtime values in the output of corosync-cmapctl may differ from what is in /etc/corosync/corosync.conf; these differing values should not affect the operation of the cluster. We recommend restarting the cluster stack at the earliest convenience so that the corosync-cmapctl output and /etc/corosync/corosync.conf agree. Changes to quorum.wait_for_all do not take effect at runtime; quorum.wait_for_all is read only when corosync starts, which is why the cluster stack must be stopped and started after making changes.
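A quick way to spot such a mismatch is to compare the two views side by side (an illustrative session; the exact cmap key names can vary between corosync versions, and these commands require a running cluster node):

```shell
# Quorum settings corosync is currently running with
corosync-cmapctl | grep -E 'two_node|wait_for_all'
# Quorum settings that will apply the next time corosync starts
grep -E 'two_node|wait_for_all' /etc/corosync/corosync.conf
```

If the two outputs disagree, restart the cluster stack as shown in the Resolution section so corosync re-reads the configuration file.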
Root Cause
With the following options in /etc/corosync/corosync.conf for a 2-node cluster, a pacemaker cluster using corosync and votequorum is not susceptible to fence loops.
quorum {
provider: corosync_votequorum
two_node: 1
}
When the two_node option is enabled, the option wait_for_all is implicitly enabled as well, which prevents fence loops from occurring.
The wait_for_all option causes any node that is just starting corosync to wait until all the nodes have been seen alive at least once at the same time before gaining quorum and proceeding to participate in the cluster.
Note: A 2-node cluster created by the pcs cluster setup command gets two_node: 1 added to its corosync.conf file automatically. So it's effectively the default setting after creating the cluster in this way. However, if the two_node: 1 line is removed or commented out, then the two_node option's value defaults to 0.
Diagnostic Steps
- If there are only two nodes in the cluster, verify that the two_node option is enabled in the /etc/corosync/corosync.conf file. The option is enabled when the value of two_node is 1.
# grep two_node /etc/corosync/corosync.conf
two_node: 1
- Check whether the wait_for_all option is enabled in /etc/corosync/corosync.conf. This option is disabled if a line reads wait_for_all: 0. Otherwise, wait_for_all is enabled as long as two_node is enabled.
# grep wait_for_all /etc/corosync/corosync.conf
#
This solution is part of Red Hat’s fast-track publication program, providing a huge library of solutions that Red Hat engineers have created while supporting our customers. To give you the knowledge you need the instant it becomes available, these articles may be presented in a raw and unedited form.