Design Guidance for RHEL High Availability Clusters - Considerations with qdevice Quorum Arbitration

Contents

Overview

Applicable Environments

  • Red Hat Enterprise Linux (RHEL) 9 with the High Availability Add-On
  • Red Hat Enterprise Linux (RHEL) 8 with the High Availability Add-On
  • Red Hat Enterprise Linux (RHEL) 7 with the High Availability Add-On

Useful References and Guides

Introduction

This guide provides Red Hat's recommendations, considerations, and essential references and knowledge for deploying a qdevice for quorum arbitration in a RHEL High Availability cluster. The covered topics can be useful if you are deciding whether corosync-qdevice's features are needed for the reliability of your cluster, or if you are trying to choose the configuration that achieves the failure-recovery requirements and goals of your cluster.

Deciding Whether to use qdevice Quorum Arbitration

Is the cluster compatible with a QDevice?

For details on supported conditions, see Support policies - corosync-qdevice and corosync-qnetd.

  • Supported releases:

    • RHEL 9: corosync-qdevice and corosync-qnetd are supported by Red Hat
    • RHEL 8: corosync-qdevice and corosync-qnetd are supported by Red Hat
    • RHEL 7: Supported as of RHEL 7 Update 4. corosync-qdevice and corosync-qnetd are not supported in earlier RHEL 7 updates.
    • RHEL 6: Not available or supported in RHEL 6. Consider updating to RHEL 7.
  • Cluster memberships/configurations:

    • All membership layouts should be able to accommodate a QDevice.
    • A separate server must be available to host corosync-qnetd. It cannot be a member of any RHEL High Availability cluster. This one host can service multiple clusters.

Deployments where Red Hat recommends use of a QDevice

NOTE: There may be other important factors to consider in the following scenarios - these are just general recommendations. When evaluating a configuration and trying to weigh QDevice vs. no-QDevice, these configurations are good candidates for using a QDevice.

  • Any mission-critical cluster that cannot tolerate any loss of service, and/or does not have any disaster-recovery plan in place behind it.
  • A cluster of 3 or fewer nodes, or any cluster that may experience a loss of half or more of its nodes.
  • A single cluster with members spread across multiple sites, or separate network infrastructures.

Benefits of a QDevice

  • Offers increased confidence in a cluster's ability to continue providing service through multiple node failures.
  • Provides ability for a cluster to continue serving its functions through wide-scale or complete failure of the cluster's interconnect network.
  • Allows for the cluster to intelligently determine which nodes should continue providing service based on their external connectivity - useful for ensuring a cluster is protecting the nodes which can stay in contact with clients.

Potential negative considerations against a QDevice

  • Requires an additional server to host the network arbitration service - in addition to the core cluster members.
  • Does not offer any ability to make quorum decisions on a more arbitrary basis - such as using connectivity to a random host, or connectivity to a storage device with its own split-arbitration method.
  • If using across multiple sites, it makes the most sense to deploy in a third neutral location. If no such third site exists, or if the network layout does not facilitate connectivity between three sites - deploying a QDevice in a useful way can be challenging.
  • Within RHEL 7, only supported in the High Availability Add-On as of RHEL 7 Update 4 and later.

Designing a corosync-qnetd Server and Configuration

General goals of the corosync-qnetd's server and environment design

  • Remain operational through as many failure scenarios as possible - especially those that may affect the cluster nodes.
  • Maintain connectivity with cluster nodes being served even when they can't communicate with each other.
  • Host its service from a network that represents a meaningful target of the cluster's hosted functions - e.g., be on the same network as clients, or on the same network hosting services that consume the cluster's services.

Redundancy/isolation from cluster nodes that will be served

A QDevice provides most of its benefit in situations where a cluster is experiencing a disconnect between its members, or when the membership is degraded in some way. During these times, the QDevice can only achieve its designed goals if the cluster nodes are able to communicate with the corosync-qnetd server.

It is important that the corosync-qnetd server be able to survive failure scenarios that may typically affect the nodes of a cluster.

  • If a power outage or physical event may disrupt some nodes, the corosync-qnetd server should remain online. Therefore the corosync-qnetd server should have a separate power source from all nodes, and ideally be hosted in separate physical infrastructure - a different rack, or another facility entirely.
  • If a network problem may disrupt connectivity between all nodes, those nodes need to still be able to contact the corosync-qnetd server. Therefore the corosync-qnetd server's network communications should be redundant with respect to the cluster's interconnect (see below for further guidance).
  • If the members of a cluster are spread across multiple sites, locations, or network infrastructures, then nodes need to remain in contact with the corosync-qnetd server if one site is lost, or if the link between sites is severed. Therefore it is important that the corosync-qnetd server be located in a separate, neutral location for full protection in failure scenarios.

Network connectivity of corosync-qnetd server

Nodes need to be able to communicate with corosync-qnetd in order to make decisions about quorum during failure scenarios. This is most important when nodes of the cluster can't communicate with each other. Designing the network with redundancy in the following ways can greatly improve the fault-tolerance of the cluster:

  • The corosync-qnetd service is hosted on a network separate from any cluster's interconnect network. If the cluster nodes lose their ability to communicate with each other due to a network outage between them, this arrangement decreases the likelihood of the corosync-qnetd server also being unavailable.
  • The corosync-qnetd service is hosted on an interface that is bonded or teamed across links to redundant switches. Maintaining connectivity between a cluster and its corosync-qnetd helps prevent losses of quorum if any nodes of the cluster are unable to participate in the membership.

Operating system deployment of corosync-qnetd server

The corosync-qnetd server really only needs to be capable of running this one application that is available in RHEL 7 Update 4 and later. Its operating system deployment can be of minimal design, only needing to host this one application and anything otherwise required by the organization.
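As a sketch of such a minimal deployment, the following commands set up a standalone qnetd host; the firewalld service name and use of pcs to manage the daemon are assumptions based on the standard RHEL High Availability tooling:

```shell
# Install the qnetd daemon and the pcs management tool on the arbitration host.
yum install -y corosync-qnetd pcs

# Allow cluster nodes to reach qnetd; the firewalld "high-availability"
# service covers the qnetd TCP port (5403).
firewall-cmd --permanent --add-service=high-availability
firewall-cmd --reload

# Initialize the net-model qnetd service, enable it at boot, and start it.
pcs qdevice setup model net --enable --start

# Verify the daemon is running and list any connected clusters.
pcs qdevice status net
```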


Designing the corosync-qdevice Configuration of a Cluster

Which model should I use?

net is the only model currently available in RHEL High Availability clusters, so there is no decision to make here.
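For reference, the model appears in the device section of corosync.conf once a QDevice is configured. A representative fragment might look like the following, where the hostname and algorithm are placeholders:

```
quorum {
    provider: corosync_votequorum
    device {
        model: net
        net {
            host: qnetd.example.com
            algorithm: ffsplit
        }
    }
}
```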


Which algorithm should I use?

The choice is between lms and ffsplit. The primary difference between them is that an lms-based QDevice holds substantial voting power in the cluster (total expected votes - 1), whereas an ffsplit-based QDevice only represents a single additional vote to try to sway quorum decisions one way or the other when there is an even split.

Consider the following when choosing an algorithm:

  • Even vs odd-sized membership: ffsplit only makes sense with an even number of nodes that are arranged in a way in which they might split evenly in their membership. If using an odd number of nodes, lms is the better choice. If using an even number of nodes but the nodes are distributed in a way that makes an even split unlikely, then lms is the better fit.
  • ffsplit cluster more likely to survive corosync-qnetd server loss: Clusters using ffsplit are less susceptible to losing quorum throughout their membership if the corosync-qnetd server cannot be reached. Such a cluster would typically have to experience multiple node failures and a disconnect with the corosync-qnetd server in order to fully lose quorum everywhere. If the reliability of the corosync-qnetd server or its network connectivity is in doubt, then ffsplit might be the better choice for algorithm.
  • lms cluster more likely to remain operational if corosync-qnetd connection can be maintained: Clusters using lms have the ability to continue functioning down to even a single node being alive - as long as one node can maintain a connection to the corosync-qnetd server. If the corosync-qnetd server and its network are reliable, lms can be a better choice if cluster service must be maintained even in widespread failures of nodes or the cluster interconnect network.
  • Two-node clusters: the two algorithms are essentially the same in two-node cluster environments. In both cases the QDevice gets 1 vote, so they have the same reliability from losses of the qnetd server, and they both can allow one node to survive the failure of the other node as long as the QDevice is active and accessible. So, for the sake of recommending one for consistency: choose lms with two nodes.
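Once an algorithm is chosen, it is specified when attaching the QDevice to the cluster. A sketch using pcs from any cluster node, where the qnetd hostname is a placeholder:

```shell
# Attach a net-model QDevice to the running cluster, selecting the algorithm.
pcs quorum device add model net host=qnetd.example.com algorithm=lms

# Confirm the device registered and review the resulting vote information.
pcs quorum status
```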

Choosing a tie_breaker node

The tie_breaker setting of a cluster's QDevice can be either lowest (the default), highest, or a specific node ID. If the cluster membership splits and the algorithm is not able to identify any partition as a better candidate based on its size or connectivity, then the partition containing a member that matches this rule will be chosen to retain the QDevice's votes. lowest and highest mean that whichever partition contains the lowest or highest node ID among the nodes connected to the corosync-qnetd server will win the tie.

Using a specific node ID as the tie_breaker is often not an ideal choice in clusters larger than two nodes, because the tie cannot be broken if that node is not alive or not connected to the corosync-qnetd server. In other words, the loss of that specified node may prevent any partition from receiving the QDevice's votes in some scenarios.

So in most cases, that leaves lowest and highest as possible choices. In many environments, the choice may not matter one way or the other - where all that matters is that some partition is chosen and can continue to provide service.

In situations where there are (or can be) assigned preferences for important resources to run on a specific node or nodes, then it may be ideal to align the tie_breaker decision with those preferences. That is, if the primary purpose of the cluster will be hosted on a single node - then it may be useful to have ties decided in the direction of that node, to avoid unnecessarily moving that hosted service around the cluster after a membership split. There is no direct way to have the QDevice choose a node based on where a resource is running, but the tie_breaker setting can be chosen with consideration of where resources are preferred to run. For example:

  • The most important resources of the cluster can be configured to prefer node ID 1, then 2, then 3, then 4... using pacemaker location constraints with scores assigned to those respective nodes in descending order. E.g. resource on node ID 1 with score 1000, resource on node ID 2 with score 500, resource on node ID 3 with score 400, resource on node ID 4 with score 300.
  • The tie_breaker can be set to lowest (or left at the default setting of lowest) so that if there is a tie, it will be chosen in the preferred direction of the resource.
  • An even split between nodes 1 and 2 vs nodes 3 and 4 that results in an algorithm tie would be chosen in favor of nodes 1 and 2 - which are the nodes most likely to be running the resource, based on location constraints.
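The arrangement described above can be sketched with pcs as follows. The resource and node names are hypothetical, and note that pcs location constraints reference node names rather than node IDs - the names here are assumed to map to node IDs 1 through 4:

```shell
# Prefer the most important resource on node1 (ID 1), then node2, node3,
# node4, with descending scores.
pcs constraint location important-rsc prefers node1=1000 node2=500 node3=400 node4=300

# Attach the QDevice with tie_breaker=lowest (also the default) so algorithm
# ties are decided in favor of the partition holding the lowest connected
# node ID - aligning with the resource's preferred placement.
pcs quorum device add model net host=qnetd.example.com algorithm=ffsplit tie_breaker=lowest
```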

If the cluster is already designed with such resource preferences, then choosing lowest vs highest on those grounds may make sense. If the cluster is not designed in any such preferred ordering, then it can be useful to consider if it would provide benefit in failure-scenarios to avoid unnecessary resource movement.

If the resources are quick to move or recover, or no single node's resources are going to be more important than the rest - then the tie_breaker setting may not matter much.


Deployment and Administration Guidance

Deployment examples
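As an end-to-end sketch, a deployment involves two halves: the arbitration host and the cluster itself. The hostnames and the choice of ffsplit below are illustrative assumptions, and the `pcs host auth` syntax shown is the RHEL 8 and later form (RHEL 7 uses `pcs cluster auth`):

```shell
# --- On the arbitration host (qnetd.example.com) ---
yum install -y corosync-qnetd pcs
pcs qdevice setup model net --enable --start

# --- On one node of an existing cluster ---
yum install -y corosync-qdevice
# Authenticate pcs against the qnetd host before adding the device.
pcs host auth qnetd.example.com
# Attach the QDevice to the running cluster.
pcs quorum device add model net host=qnetd.example.com algorithm=ffsplit
# Verify that quorum now includes the QDevice vote.
pcs quorum status
```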
