What is master_wins mode with a quorum device in RHEL 5 or 6 High Availability clusters?
Environment
- Red Hat Enterprise Linux (RHEL) 5 or 6 with the High Availability Add-On
- Two-node cluster
- Cluster configured with a quorum device (`qdiskd`) to prevent fence races
Issue
- What is `master_wins` mode in a cluster with a quorum device (`qdiskd`)?
- Why do both nodes in a two-node cluster fence each other when the cluster communication network goes down?
Resolution
NOTE: A quorum device can add complexity to a cluster configuration and should only be used if required. More information about quorum devices, including use-cases for employing one, can be found in the following tech-brief: How to Optimally Configure a Quorum Disk in Red Hat Enterprise Linux Clustering and High-Availability Environments
Red Hat Enterprise Linux 6
Master-wins mode is automatically configured when:
- There are only 2 nodes in the cluster
- There is a quorum device configured that has no heuristics.
Master-wins mode cannot be enabled outside of the conditions listed above. No further steps are needed to enable this mode beyond meeting those two conditions. However, `master_wins="1"` can be added to the `<quorumd>` tag in /etc/cluster/cluster.conf to make it clearer to administrators that this is the mode the quorum device will operate in:
<quorumd label="cluster1qdisk" master_wins="1"/>
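For context, the fragment below sketches where that tag sits in a minimal RHEL 6 two-node cluster.conf. The cluster name, node names, and version number are hypothetical; the parts that matter for master-wins are the two `<clusternode>` entries, the absence of `<heuristic>` children under `<quorumd>`, and the `expected_votes` total of 3 (matching the vote math in the Root Cause section below):

```xml
<cluster name="cluster1" config_version="2">
  <!-- 2 node votes + 1 qdisk vote = 3 expected votes -->
  <cman expected_votes="3"/>
  <!-- Exactly two nodes: one condition for automatic master-wins -->
  <clusternodes>
    <clusternode name="node1.example.com" nodeid="1"/>
    <clusternode name="node2.example.com" nodeid="2"/>
  </clusternodes>
  <!-- No <heuristic> children: the other condition for automatic master-wins -->
  <quorumd label="cluster1qdisk" master_wins="1"/>
</cluster>
```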
Red Hat Enterprise Linux 5 Update 5 and later
`master_wins="1"` must be added manually to the `<quorumd>` tag in /etc/cluster/cluster.conf. The following conditions must also be met:
- All nodes use `cman-2.0.115-34.el5` or later
- The cluster has only 2 nodes. If there are 3 or more nodes, `master_wins` should not be used and a heuristic should be used instead.
To enable master-wins mode:
- Find the `<quorumd>` section and ensure there are no heuristics configured as children of `<quorumd>`. If a heuristic is configured, it should be removed.
<quorumd interval="5" label="cluster1qdisk" min_score="1" tko="3" votes="1"/>
- Add `master_wins="1"` to the `<quorumd>` tag (and optionally remove the `min_score` attribute to simplify the configuration):
<quorumd interval="5" label="cluster1qdisk" tko="3" votes="1" master_wins="1"/>
- Increment the `config_version` and propagate the changes to the cluster.
- To pick up the changes, the `qdiskd` daemon or service should be restarted, which may require additional precautionary measures or a complete cluster restart.
- Observe whether one node has the additional vote(s), and is thus the master, to determine whether the configuration was successful.
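Before propagating the edited file, it can help to sanity-check it. The sketch below is hypothetical in that it greps an inline sample rather than a real /etc/cluster/cluster.conf, but it checks the two things the steps above require: `master_wins="1"` is present, and no leftover `<heuristic>` children remain:

```shell
# Sketch: sanity-check a cluster.conf before propagating it.
# Uses an inline sample file; on a real node, point CONF at /etc/cluster/cluster.conf.
CONF=$(mktemp)
cat > "$CONF" <<'EOF'
<cluster name="cluster1" config_version="3">
  <quorumd interval="5" label="cluster1qdisk" tko="3" votes="1" master_wins="1"/>
</cluster>
EOF

# master_wins="1" must be present on the quorumd tag:
has_mw=$(grep -c 'master_wins="1"' "$CONF")
# ...and no <heuristic> children may remain:
has_heur=$(grep -c '<heuristic' "$CONF" || true)

echo "master_wins entries: $has_mw (want 1), heuristics: $has_heur (want 0)"
rm -f "$CONF"
```

On a live cluster, the updated file is then propagated with the usual tools for the release (for example, `ccs_tool update /etc/cluster/cluster.conf` on RHEL 5), after which `qdiskd` is restarted as described above.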
Root Cause
With a two-node cluster, if the message ring between the two nodes is broken, the cluster cannot determine which node has failed and should be fenced, and which node is still operating on its own. This can result in a "fence race", in which each node attempts to fence the other.
master_wins mode is one method for avoiding fence races in two-node clusters, by having only one node in the cluster get the extra votes from the quorum device. The end result is that if communication between nodes is severed causing a membership transition, the master will remain quorate while the other node will not, making that "loser" node unable to fence the other.
Under normal conditions, when both nodes are communicating, master_wins mode with both nodes at one vote and the quorum device at one vote results in the following totals:
- `expected_votes` = 3. This is the total number of votes in the cluster.
- Votes required for quorum = 2. This is a strict majority of `expected_votes` (more than half).
- If node 1 is the `qdiskd` master, it will have 3 votes (its own vote, the vote from node 2, and the quorum disk vote due to being the master).
- If node 1 is the `qdiskd` master, node 2 will have 2 votes (its own vote and the vote from node 1, but no vote from the quorum disk because it is not the master).
If a membership split occurs and node 1 can no longer communicate with node 2:
- If node 1 is the `qdiskd` master, it will have 2 votes (its own vote and the quorum disk vote due to being the master). 2 votes is enough for quorum, so node 1 will be able to fence node 2 and retain or recover services.
- If node 1 is the `qdiskd` master, node 2 will have 1 vote (its own vote, but not the vote from node 1 or the quorum disk). 1 vote does not provide quorum, so node 2 will not be able to fence and thus will lose any fence race.
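The vote arithmetic above can be checked with a few lines of shell (plain arithmetic, no cluster required; the numbers are the ones from this article):

```shell
# Vote math for a 2-node cluster with a 1-vote quorum device in master_wins mode.
expected_votes=3                       # node1 (1) + node2 (1) + qdisk (1)
quorum=$(( expected_votes / 2 + 1 ))   # strict majority of expected_votes
echo "quorum threshold: $quorum"       # 2

# After the cluster interconnect is severed:
master_votes=2       # own vote + qdisk vote (it is the qdiskd master)
non_master_votes=1   # own vote only
[ "$master_votes" -ge "$quorum" ]     && echo "master: quorate, can fence"
[ "$non_master_votes" -ge "$quorum" ] || echo "non-master: inquorate, cannot fence"
```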