Cluster heuristic using ping is not preventing fence race in RHEL 5
Environment
- Red Hat Enterprise Linux (RHEL) 5 with the High Availability Add On
- Configuration utilizes a quorum device and ping heuristic
Issue
- We implemented two heuristic checks in our cluster but it seems that there's always a fence race
Resolution
The cluster is configured with these quorum disk settings:
<quorumd interval="3" label="qdisk" min_score="1" tko="10" votes="2">
<heuristic interval="5" program="ip link show dev bond0" score="1"/>
<heuristic interval="5" program="ip link show dev bond2" score="1"/>
</quorumd>
These heuristic checks are intended to monitor the link status of the bond devices.
Unfortunately, these checks are not valid, because ip link show exits with a status of 0 whenever the device exists, regardless of its link state:
# ip link show dev virbr0
6: virbr0: <BROADCAST,MULTICAST> mtu 1500 qdisc noqueue state DOWN
    link/ether 52:54:00:a9:52:42 brd ff:ff:ff:ff:ff:ff
# echo $?
0
This causes the heuristic check to never fail.
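To make the difference between the exit status and the reported link state concrete, here is a minimal sketch. The helper name link_is_up is hypothetical, and the bond0 sample line is an assumed example of what an active interface might print; only the virbr0 line comes from the output above. It is meant as an illustration of the problem, not as a recommended heuristic.

```shell
#!/bin/sh
# Hypothetical helper: reads `ip link show` output on stdin and succeeds
# only if the text reports "state UP". It looks at the output text,
# which is what the exit status of ip link show does not reflect.
link_is_up() {
    grep -q 'state UP'
}

# Output line taken from the example above (interface is DOWN).
sample_down='6: virbr0: <BROADCAST,MULTICAST> mtu 1500 qdisc noqueue state DOWN'
# Assumed output line for an interface that is up (illustrative only).
sample_up='7: bond0: <BROADCAST,MULTICAST,MASTER,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP'

# Record the helper's result for each sample (0 = up, 1 = not up).
if printf '%s\n' "$sample_down" | link_is_up; then down_rc=0; else down_rc=1; fi
if printf '%s\n' "$sample_up" | link_is_up; then up_rc=0; else up_rc=1; fi

echo "virbr0 sample: $down_rc"
echo "bond0 sample: $up_rc"
```

On a live system the same filter could be fed from ip link show dev bond0, assuming the installed iproute version prints the state field as in the output above.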
In fact, a heuristic check can fail in two situations:
- The specified program returns a non-zero value in tko consecutive attempts, or
-
The ip link example above shows this exit status in practice: the command returns 0 even though the interface state is DOWN.
In this case, we recommend replacing the heuristic with something like a ping, for example pinging the switches to which the bonded links are connected.
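A ping-based configuration might look like the following sketch. The addresses 192.0.2.1 and 192.0.2.2 are placeholders for the switch IPs and are not taken from the original cluster; the remaining attributes are carried over unchanged from the configuration above. The ping options used are -c1 (send one packet) and -w1 (give up after one second), so the command returns a non-zero status when the switch does not answer.

```xml
<quorumd interval="3" label="qdisk" min_score="1" tko="10" votes="2">
    <!-- 192.0.2.1 and 192.0.2.2 are placeholder switch addresses (assumption) -->
    <heuristic interval="5" program="ping -c1 -w1 192.0.2.1" score="1"/>
    <heuristic interval="5" program="ping -c1 -w1 192.0.2.2" score="1"/>
</quorumd>
```

With min_score="1" and two heuristics scoring 1 each, a node would still be considered fit as long as one of the two switches responds; whether that is the desired behavior depends on the network layout.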
More information about quorum disks is available in this KB.