Design Guidance for RHEL High Availability Clusters - Considerations with qdevice Quorum Arbitration
Contents
- Overview
- Deciding Whether to use QDevice Quorum Arbitration
- Designing a corosync-qnetd Server and Configuration
- Designing the corosync-qdevice Configuration of a Cluster
- Deployment and Administration Guidance
Overview
Applicable Environments
- Red Hat Enterprise Linux (RHEL) 9 with the High Availability Add-On
- Red Hat Enterprise Linux (RHEL) 8 with the High Availability Add-On
- Red Hat Enterprise Linux (RHEL) 7 with the High Availability Add-On
Recommended Prior Reading
Useful References and Guides
- Explore components: corosync-qdevice and corosync-qnetd
- RHEL 7 High Availability Reference Guide - 10.5 Quorum Devices
Introduction
This guide provides Red Hat's recommendations, considerations, and essential references and knowledge for deploying a qdevice for quorum arbitration in a RHEL High Availability cluster. The covered topics can be useful if you are considering if corosync-qdevice's features are needed for reliability of your cluster, or if you are trying to decide on the configuration that achieves the failure-recovery requirements and goals of your cluster.
Deciding Whether to use QDevice Quorum Arbitration
Is the cluster compatible with a QDevice?
For details on supported conditions, see Support policies - corosync-qdevice and corosync-qnetd.
- Supported releases:
  - RHEL 9 and RHEL 8: corosync-qdevice and corosync-qnetd are supported by Red Hat.
  - RHEL 7: Supported as of RHEL 7 Update 4. corosync-qdevice and corosync-qnetd are not supported in earlier RHEL 7 updates.
  - RHEL 6: Not available or supported in RHEL 6. Consider updating to RHEL 7.
- Cluster memberships/configurations:
  - All membership layouts should be able to accommodate a QDevice.
  - A separate server must be available to host corosync-qnetd. It cannot be a member of any RHEL High Availability cluster. This one host can service multiple clusters.
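As a sketch of where the packages live (assuming default RHEL 8 repositories; on RHEL 7, substitute yum for dnf):

```shell
# On each cluster node: the QDevice client is in the corosync-qdevice package.
dnf install corosync-qdevice

# On the separate arbitration host (not a cluster member):
# corosync-qnetd provides the arbitration service; pcs eases its management.
dnf install pcs corosync-qnetd
```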
Deployments where Red Hat recommends use of a QDevice
NOTE: There may be other important factors to consider in the following scenarios - these are just general recommendations. When weighing QDevice vs no-QDevice for a given configuration, these configurations are good candidates for using a QDevice.
- Any mission-critical cluster that cannot tolerate any loss of service, and/or does not have any disaster-recovery plan in place behind it.
- A cluster of 3 or fewer nodes, or any cluster that may experience a loss of half or more of its nodes.
- A single cluster with members spread across multiple sites, or separate network infrastructures.
Benefits of a QDevice
- Offers increased confidence in a cluster's ability to continue providing service through multiple node failures.
- Provides ability for a cluster to continue serving its functions through wide-scale or complete failure of the cluster's interconnect network.
- Allows for the cluster to intelligently determine which nodes should continue providing service based on their external connectivity - useful for ensuring a cluster is protecting the nodes which can stay in contact with clients.
Potential negative considerations against a QDevice
- Requires an additional server to host the network arbitration service - in addition to the core cluster members.
- Does not offer any ability to make quorum decisions on a more arbitrary basis - such as using connectivity to a random host, or connectivity to a storage device with its own split-arbitration method.
- If using across multiple sites, it makes the most sense to deploy in a third neutral location. If no such third site exists, or if the network layout does not facilitate connectivity between three sites - deploying a QDevice in a useful way can be challenging.
- Only supported in later releases of RHEL 7 High Availability
Designing a corosync-qnetd Server and Configuration
General goals of the corosync-qnetd's server and environment design
- Remain operational through as many failure scenarios as possible - especially those that may affect the cluster nodes.
- Maintain connectivity with cluster nodes being served even when they can't communicate with each other.
- Host its service from a network that represents a meaningful target of the cluster's hosted functions - e.g., be on the same network as clients, or on the same network hosting services that consume the cluster's services.
Redundancy/isolation from cluster nodes that will be served
A QDevice provides most of its benefit in situations where a cluster is experiencing a disconnect between its members, or when the membership is degraded in some way. During these times, the QDevice can only achieve its designed goals if the cluster nodes are able to communicate with the corosync-qnetd server.
It is important that the corosync-qnetd server be able to survive failure scenarios that may typically affect the nodes of a cluster.
- If a power outage or physical event may disrupt some nodes, the corosync-qnetd server should remain online. Therefore the corosync-qnetd server should have a separate power source from all nodes, and ideally be hosted in separate physical infrastructure - a different rack, or another facility entirely.
- If a network problem may disrupt connectivity between all nodes, those nodes need to still be able to contact the corosync-qnetd server. Therefore the corosync-qnetd server's network communications should be redundant against the cluster's interconnect (see below for further guidance).
- If the members of a cluster are spread across multiple sites, locations, or network infrastructures, then nodes need to remain in contact with the corosync-qnetd server if one site is lost, or if the link between sites is severed. Therefore it is important that the corosync-qnetd server be located in a separate, neutral location for full protection in failure scenarios.
Network connectivity of corosync-qnetd server
Nodes need to be able to communicate with corosync-qnetd in order to make decisions about quorum during failure scenarios. This is most important when nodes of the cluster can't communicate with each other. Designing the network with redundancy in the following ways can greatly improve the fault-tolerance of the cluster:
- The corosync-qnetd service is hosted on a network separate from any cluster's interconnect network. If the cluster nodes lose their ability to communicate with each other due to a network outage between them, this arrangement decreases the likelihood of the corosync-qnetd server also being unavailable.
- The corosync-qnetd service is hosted on an interface that is bonded or teamed across links to redundant switches. Maintaining connectivity between a cluster and its corosync-qnetd server helps prevent losses of quorum if any nodes of the cluster are unable to participate in the membership.
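One way to validate that connectivity, sketched with an assumed hostname and the default corosync-qnetd TCP port (5403):

```shell
# On the corosync-qnetd server, allow the qnetd port through the firewall.
# The high-availability firewalld service includes TCP port 5403.
firewall-cmd --permanent --add-service=high-availability
firewall-cmd --reload

# From each cluster node, confirm the qnetd server is reachable over its
# own network path, independent of the cluster interconnect.
nc -z qnetd.example.com 5403 && echo "qnetd reachable"

# Once the QDevice is configured, check the client's view of the
# connection on any cluster node:
corosync-qdevice-tool -s
```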
Operating system deployment of corosync-qnetd server
The corosync-qnetd server only needs to be capable of running this one application, which is available in RHEL 7 Update 4 and later. Its operating system deployment can be of minimal design, needing only to host this one application and anything otherwise required by the organization.
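With pcs installed on the arbitration host, the service can be initialized in one step (a sketch assuming a default RHEL 8 deployment):

```shell
# On the arbitration host: create the qnetd configuration and certificate
# database, then enable and start the corosync-qnetd service.
pcs qdevice setup model net --enable --start

# Verify the service is running and see which clusters it is serving.
pcs qdevice status net --full
```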
Designing the corosync-qdevice Configuration of a Cluster
Which model should I use?
net is the only model currently available in RHEL High Availability clusters, so there is no decision to make here.
Which algorithm should I use?
The choice is between lms and ffsplit. The primary difference between them is that an lms-based QDevice holds great voting power in the cluster (the number of cluster nodes minus one), whereas a ffsplit-based QDevice only contributes a single additional vote to sway quorum decisions one way or the other when there is an even split.
Consider the following when choosing an algorithm:
- Even vs odd-sized membership: ffsplit only makes sense with an even number of nodes that are arranged in a way in which they might split evenly in their membership. If using an odd number of nodes, lms is the better choice. If using an even number of nodes but the nodes are distributed in a way that makes an even split unlikely, then lms is the better fit.
- ffsplit cluster more likely to survive corosync-qnetd server loss: Clusters using ffsplit are less susceptible to losing quorum throughout their membership if the corosync-qnetd server cannot be reached. Such a cluster would typically have to experience multiple node failures and a disconnect with the corosync-qnetd server in order to fully lose quorum everywhere. If the reliability of the corosync-qnetd server or its network connectivity is in doubt, then ffsplit might be the better choice for algorithm.
- lms cluster more likely to remain operational if corosync-qnetd connection can be maintained: Clusters using lms have the ability to continue functioning down to even a single node being alive - as long as one node can maintain a connection to the corosync-qnetd server. If the corosync-qnetd server and its network are reliable, lms can be a better choice if cluster service must be maintained even in widespread failures of nodes or the cluster interconnect network.
- Two-node clusters: the two algorithms are essentially the same in two-node cluster environments. In both cases the QDevice gets 1 vote, so they have the same reliability from losses of the corosync-qnetd server, and they both can allow one node to survive the failure of the other node as long as the QDevice is active and accessible. So, for the sake of recommending one for consistency: choose lms with two nodes.
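For example, attaching a QDevice with the ffsplit algorithm to a running cluster might look like the following (a sketch; the qnetd hostname is an assumption):

```shell
# On one cluster node: register the cluster with the qnetd server
# using the net model and the ffsplit algorithm.
pcs quorum device add model net host=qnetd.example.com algorithm=ffsplit

# Confirm the device configuration and the resulting vote counts.
pcs quorum config
pcs quorum status
```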
Choosing a tie_breaker node
The tie_breaker setting of a cluster's QDevice can be either lowest (the default), highest, or a specific node ID. If the cluster membership splits and the algorithm is not able to identify any partition as a better candidate based on its size or connectivity, then the partition containing a member matching this rule will be chosen to retain the QDevice votes. lowest and highest mean that whichever partition contains the lowest or highest node ID of the nodes connected to the corosync-qnetd server will win the tie.
Using a specific node ID as the tie_breaker is often not an ideal choice in clusters larger than two nodes, because it prevents a tie from being broken if that node is not alive or connected to the corosync-qnetd server. In other words, the loss of that specified node may prevent any partition from getting the QDevice votes in some scenarios.
So in most cases, that leaves lowest and highest as possible choices. In many environments, the choice may not matter one way or the other - where all that matters is that some partition is chosen and can continue to provide service.
In situations where there are (or can be) assigned preferences for important resources to run on a specific node or nodes, then it may be ideal to align the tie_breaker decision with those preferences. That is, if the primary purpose of the cluster will be hosted on a single node - then it may be useful to have ties decided in the direction of that node, to avoid unnecessarily moving that hosted service around the cluster after a membership split. There is no direct way to have the QDevice choose a node based on where a resource is running, but the tie_breaker setting can be chosen with consideration of where resources are preferred to run. For example:
- The most important resources of the cluster can be configured to prefer node ID 1, then 2, then 3, then 4... using pacemaker location constraints with scores assigned to those respective nodes in descending order. E.g. resource on node ID 1 with score 1000, resource on node ID 2 with score 500, resource on node ID 3 with score 400, resource on node ID 4 with score 300.
- The tie_breaker can be set to lowest (or left at the default setting of lowest) so that if there is a tie, it will be broken in the preferred direction of the resource.
- An even split between nodes 1 and 2 vs nodes 3 and 4 that results in an algorithm tie would be decided in favor of nodes 1 and 2 - which are the nodes most likely to be running the resource, based on location constraints.
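The arrangement above might be expressed with pcs as follows (a sketch; the resource name important-rsc and the node names are hypothetical, and the node names are assumed to map to node IDs 1 through 4):

```shell
# Prefer the resource on node1, then node2, node3, node4,
# using descending location-constraint scores.
pcs constraint location important-rsc prefers node1=1000 node2=500 node3=400 node4=300

# Align the QDevice tie-breaker with that preference: lowest (the default)
# favors the partition containing the lowest connected node ID.
pcs quorum device add model net host=qnetd.example.com algorithm=ffsplit tie_breaker=lowest
```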
If the cluster is already designed with such resource preferences, then choosing lowest vs highest on those grounds may make sense. If the cluster is not designed in any such preferred ordering, then it can be useful to consider if it would provide benefit in failure-scenarios to avoid unnecessary resource movement.
If the resources are quick to move or recover, or no single node's resources are more important than the rest, then the tie_breaker setting may not matter much.
Deployment and Administration Guidance
Deployment examples