Cluster heuristic using ping is not preventing fence race in RHEL 5

Solution Unverified - Updated

Environment

  • Red Hat Enterprise Linux (RHEL) 5 with the High Availability Add On
  • Configuration utilizes a quorum device and ping heuristic

Issue

  • We implemented two heuristic checks in our cluster, but a fence race still seems to occur every time

Resolution

The cluster is configured with these quorum disk settings:

        <quorumd interval="3" label="qdisk" min_score="1" tko="10" votes="2">
                <heuristic interval="5" program="ip link show dev bond0" score="1"/>
                <heuristic interval="5" program="ip link show dev bond2" score="1"/>
        </quorumd>

These heuristic checks are intended to monitor the link status of the bond devices.
Unfortunately these checks are not valid, because ip link show exits with a status of 0 regardless of the link state, as in this example with a device whose link is down:

# ip link show dev virbr0
6: virbr0: <BROADCAST,MULTICAST> mtu 1500 qdisc noqueue state DOWN
    link/ether 52:54:00:a9:52:42 brd ff:ff:ff:ff:ff:ff
# echo $?
0

This causes the heuristic checks to never fail.
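
To make the difference between the exit status and the reported link state concrete, the following shell sketch inspects the output of ip link show instead of its exit status. The function name link_reports_up is hypothetical, and the sketch assumes the output carries a "state UP" or "state DOWN" token as in the example above. It is an illustration only, not a tested heuristic.

```shell
# Hypothetical helper: read `ip link show` output on stdin and succeed
# only if it reports "state UP". Unlike the exit status of ip itself,
# this result follows the reported link state.
link_reports_up() {
    grep -q 'state UP'
}

# Example call (assumes a device named bond0 exists):
#   ip link show dev bond0 | link_reports_up; echo $?
```

With the virbr0 output shown above as input, the function returns a non-zero status, whereas ip link show itself returned 0.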

In fact, one condition under which a heuristic check fails is that the specified program returns a non-zero value in tko consecutive attempts. Because ip link show exits with 0 even when the link is down, that condition is never met here.
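
The "tko consecutive attempts" condition above can be modeled in a few lines of shell. This is an illustrative model only, not the quorum disk daemon's implementation. The function name heuristic_would_fail and its argument layout (a tko value followed by a sequence of exit codes) are assumptions made for this sketch.

```shell
# Illustrative model only: given a tko value and a sequence of exit codes,
# succeed (return 0) if tko consecutive non-zero results occur.
heuristic_would_fail() {
    tko=$1
    shift
    misses=0
    for rc in "$@"; do
        if [ "$rc" -ne 0 ]; then
            misses=$((misses + 1))
            # Enough consecutive failures: the heuristic counts as failed.
            if [ "$misses" -ge "$tko" ]; then
                return 0
            fi
        else
            # Any successful run resets the counter.
            misses=0
        fi
    done
    return 1
}

# Example: a program that always exits 0 never trips the condition.
#   heuristic_would_fail 3 0 0 0 0 0; echo $?
```

A program that always exits 0, as ip link show does here, keeps the counter at zero however long it runs.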

In this case I would recommend replacing the heuristic with something like a ping, for example pinging the switches to which the bonding links are connected.
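
A ping-based check could be sketched as the small shell helper below. Everything named here is an assumption for illustration: the function name check_switches, the PING_CMD variable (which exists only so the ping command can be swapped out for testing), and the 192.0.2.x placeholder addresses, which must be replaced with the real switch addresses. The sketch relies on ping exiting with a non-zero status when no reply arrives before the deadline.

```shell
# Hypothetical heuristic helper: succeed only if every address given as an
# argument answers one ping within a one-second deadline.
PING_CMD=${PING_CMD:-"ping -c1 -w1"}

check_switches() {
    # Refuse to report success when no address was given.
    [ "$#" -gt 0 ] || return 2
    for addr in "$@"; do
        # Any switch that does not answer makes the whole check fail.
        $PING_CMD "$addr" >/dev/null 2>&1 || return 1
    done
    return 0
}

# Example call with placeholder addresses:
#   check_switches 192.0.2.1 192.0.2.2; echo $?
```

Unlike ip link show, this check returns a non-zero status when a switch stops answering, so the heuristic can actually fail. For a single address, the ping command could presumably also be placed directly in the program attribute of the heuristic.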

More info about quorum disk on this KB


This solution is part of Red Hat’s fast-track publication program, providing a huge library of solutions that Red Hat engineers have created while supporting our customers. To give you the knowledge you need the instant it becomes available, these articles may be presented in a raw and unedited form.