How can I avoid fencing loops with 2 node clusters and Red Hat High Availability clusters?

Solution Verified

Environment

  • Red Hat Enterprise Linux (RHEL) 5 and later with the High Availability Add-On
  • Red Hat High Availability Cluster with 2 nodes (this issue does not apply to clusters of 3 or more nodes).
  • cman-based clusters: No quorum disk is configured. Cluster must have only 2 votes with two_node="1" set.
  • corosync+votequorum-based clusters: wait_for_all is not set (it is set by default if two_node is enabled).
  • Cluster fencing devices are IP-based and are accessed over the network.
    • Fencing devices are reached over a different network from the one the cluster communicates over. This effectively means both nodes can still reach the fencing devices when the cluster interconnect is unavailable.

Issue

  • Why should I have my fencing devices on the same network as my cluster communication with Red Hat High Availability clusters?
  • How can I prevent fence loops with my cluster?
  • If one node in a 2-node cluster is up and running and the other node boots while there is a network issue, the booting node fences the active node.
  • In a 2-node cluster configured with fence_scsi, if a node gets fenced due to a network issue and I then reboot it, the "active" node reports SCSI reservation conflicts and path failures as the rebooted node starts its services.
  • Server hangs during boot time with error message "Joining fence domain".

Resolution

There are several possible ways to prevent fencing loops. Any one of the following options by itself is sufficient to address this problem:

1. cman-based clusters only: Incorporate a quorum disk into your cluster to act as a tie-breaker when the cluster interconnect network is unavailable.

Notes:

  • Red Hat recommends using quorum disks only when necessary and avoiding them whenever possible, as they add complexity to the cluster, which in turn introduces more points of failure.
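
If you do choose this option, the setup looks roughly like the following sketch. The shared LUN /dev/sdb1 and the label myqdisk are illustrative assumptions, not values from this article; with a quorum disk, two_node must be "0" and the expected vote count rises to 3:

```
# mkqdisk -c /dev/sdb1 -l myqdisk
```

```
<cman two_node="0" expected_votes="3"/>
<quorumd label="myqdisk" votes="1"/>
```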

2. corosync+votequorum-based clusters only: Ensure that wait_for_all is enabled.

wait_for_all prevents fence loops in two-node clusters, and is enabled by default if two_node mode is enabled.
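
For reference, the relevant settings live in the quorum section of /etc/corosync/corosync.conf. The values below are the defaults for a two-node cluster, shown explicitly as a sketch:

```
quorum {
    provider: corosync_votequorum
    two_node: 1
    # Implied by two_node; shown explicitly. A booting node must see
    # all cluster nodes at least once before it may operate, which
    # breaks the fence loop described above.
    wait_for_all: 1
}
```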


3. cman-based clusters only: Set FENCED_MEMBER_DELAY=-1 (RHEL 6 only) or to a very high value (RHEL 5 or 6) in /etc/sysconfig/cman, on only one node (the node you want to "lose" a fencing race). The file may not exist by default, in which case it should be created. If you have a fencing delay on a fencing device for one node, add FENCED_MEMBER_DELAY=-1 to the OTHER node.

# grep FENCED_MEMBER_DELAY /etc/sysconfig/cman
FENCED_MEMBER_DELAY=-1

Notes:

  • This option has the downside that a node will not be able to form a cluster or start its services if the other node has not already started, or is not in the process of starting, its cluster services.
  • The node with FENCED_MEMBER_DELAY=-1 will wait indefinitely for the other node to show up before starting services and cannot form a new quorate cluster on its own (but it can take over a service on its own if the other node becomes unavailable). In such scenarios the node with FENCED_MEMBER_DELAY=-1 may log the message "Joining fence domain" and wait until the other node comes up.
  • A FENCED_MEMBER_DELAY value of -1 will only have the effect of waiting indefinitely in RHEL 6. RHEL 5 would treat this value as an indication to not wait at all, so instead this should be set to a very high value to cause a node to wait that long for another node to join.

4. cman-based clusters only: Set the cluster daemons to not start at boot time on the "losing" node (the one without the delay on its fence device). This would mean that after "losing" the fence race and being fenced, the rebooted node would come back up but would not form a cluster and would therefore be of no danger to the running node.

# chkconfig cman off
# chkconfig clvmd off
# chkconfig gfs2 off
# chkconfig rgmanager off

Notes:

  • This option has the downside that a node will not automatically rejoin the cluster after fencing or a reboot, even if the issue that caused fencing in the first place has resolved itself by then.
    • An administrator must manually intervene to start the services, causing the node to join the cluster.
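
Once the underlying issue is resolved, the administrator can start the services manually on that node so it rejoins the cluster, for example:

```
# service cman start
# service clvmd start
# service gfs2 start
# service rgmanager start
```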

5. cman-based clusters only: Set the fence action to "off" for a fence agent on only one node. If you have a fencing delay on a fencing device for one node, add action="off" to the OTHER node.

# grep fencedevices /etc/cluster/cluster.conf
        <fencedevices>
                <fencedevice agent="fence_ipmilan" ipaddr="10.1.1.20" lanplus="1" login="root" name="node1ilo" passwd="***" delay="30"/>
                <fencedevice agent="fence_ipmilan" ipaddr="10.1.1.21" lanplus="1" login="root" name="node2ilo" passwd="***" action="off"/>
        </fencedevices>
  • With action="off", the fence operation powers the node off instead of rebooting it.

Notes:

  • This option has the downside that a node will not automatically rejoin the cluster after fencing, even if the issue that caused fencing in the first place has resolved itself by then.
  • An administrator must manually intervene to boot the system, at which point the node will rejoin the cluster if the services are chkconfig'd on.
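
Assuming the example fence device definitions shown above, the powered-off node could be started again through its own fence agent (the address and login are the illustrative values from that snippet):

```
# fence_ipmilan -a 10.1.1.21 -l root -p <password> -o on
```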

NOTE: The following option may reduce redundancy in the cluster, which may make it less ideal. Please review the caveats of this approach before choosing to use it.

6. Reconfigure the network layout so the fence devices are accessed via the cluster interconnect network, or create a static route so that fence device traffic is routed through the cluster interconnect network rather than the default gateway. If the fence devices are accessed via the cluster interconnect network by default, then you should be protected from fence loops (except in exceptional circumstances, such as when multicast fails but TCP/IP unicast is still available).

If the fence devices are accessed via a network other than the cluster interconnect network (e.g. via the default gateway), but they could route via the cluster interconnect network, setting a static route can cause traffic to pass via the cluster interconnect network instead.

- If your cluster interconnect uses bond1 and your two fencing devices are 10.1.2.20 and 10.1.2.21, you could configure two static routes like this:  

```
# cat /etc/sysconfig/network-scripts/route-bond1

GATEWAY0=10.1.2.254
NETMASK0=255.255.255.255
ADDRESS0=10.1.2.20

GATEWAY1=10.1.2.254
NETMASK1=255.255.255.255
ADDRESS1=10.1.2.21
```

- Then restart networking to pick up the changes: 

```
# service network restart
```

[Note] If the fence devices are not physically routable from the private network, you may have to reconfigure the network or change the IP address of the fencing devices to move them to the cluster interconnect network to use this method.
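
After the routes are in place, you can confirm that traffic to each fence device leaves via the interconnect interface with ip route get; with the example routes above, each lookup should report dev bond1:

```
# ip route get 10.1.2.20
# ip route get 10.1.2.21
```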

Root Cause

  • A fencing loop can occur on a 2-node cluster when the cluster interconnect experiences issues that prevent the nodes from communicating, and one of the nodes starts the cman service (RHEL 5 and 6) or the pacemaker.service systemd unit (RHEL 7 and later).
    • When the network is lost, both cluster nodes will notice the other is missing and try to fence it. If both can reach the fencing devices via the public network, one node will win the race and fence the other node off.
    • When the fenced node reboots, it will wait for the existing node to rejoin its cluster. After the fenced node waits a period of time, it will decide that the existing node is in an unknown state (because the network is still down) and try to fence it, which will succeed.
    • The original node then reboots and fences the other node, and this continues until manual intervention occurs.
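
The cycle described above can be sketched as a toy model (purely illustrative; no real cluster API is involved, and the node names are hypothetical):

```python
# Toy model of the fence race described above. While the interconnect
# is down, each node that boots cannot see its peer, assumes it has
# failed, and fences it -- so the two nodes take turns indefinitely.

def fence_loop(interconnect_up, max_rounds=6):
    """Return the sequence of fencing events until the network heals."""
    events = []
    survivor, rebooting = "node1", "node2"
    for _ in range(max_rounds):
        if interconnect_up:
            # Peer is visible again: the booting node simply rejoins.
            events.append(f"{rebooting} rejoins the cluster")
            break
        # Peer unreachable: the booting node fences it, then the roles swap.
        events.append(f"{rebooting} fences {survivor}")
        survivor, rebooting = rebooting, survivor
    return events

print(fence_loop(interconnect_up=False, max_rounds=4))
```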

This solution is part of Red Hat’s fast-track publication program, providing a huge library of solutions that Red Hat engineers have created while supporting our customers. To give you the knowledge you need the instant it becomes available, these articles may be presented in a raw and unedited form.