Exploring RHEL High Availability's Components - corosync-qdevice and corosync-qnetd
Contents
Overview
Applicable Environments
- Red Hat Enterprise Linux (RHEL) 7, 8 and 9 with the High Availability Add-On
Recommended Prior Reading
Useful Guides and References
- Explore components: corosync
- Explore concepts: quorum
- Design guidance - QDevice quorum-arbitration
- RHEL 7 High Availability Reference Guide - 10.5 Quorum Devices
- Deployment Examples - Enabling QDevice quorum-arbitration in RHEL 7
- Chapter 27. Configuring quorum devices - Red Hat Enterprise Linux 8
- Chapter 27. Configuring quorum devices - Red Hat Enterprise Linux 9
Introduction
corosync's qdevice and qnetd components provide a method for influencing quorum decisions in a RHEL High Availability cluster.
This guide aims to explore the implementation, features, behaviors, and technical details of these quorum-arbitration components. The guide is intended for organizations looking for a deeper understanding of these components so they can take full advantage of their capabilities and optimize their deployment in a cluster.
For readers looking for a simpler approach, check out the references above, which dive more directly into examples and practical guidance for setting these components up. The sections below offer deeper insight into the implementation and operation of these components, but they are not meant to be procedural instructions - the references above provide guidance of that kind.
Concepts
What is quorum?
Quorum is the condition of whether a node within a cluster has authority to carry out the functions of the cluster - interacting with shared resources, fencing nodes, etc.
The quorum policies of the cluster determine how each member will respond to membership changes. These quorum policies should always result in only one membership partition maintaining authority if the membership splits in any way.
- See also: Explore concepts: quorum
Is a majority-wins quorum policy good enough?
Any cluster that can potentially become "split" in its membership is at risk of being unable to serve the functions of the cluster. A simple majority-wins policy would not be good enough if a failure could leave all nodes without a majority.
- Example: If a cluster of three or more nodes can have all nodes lose contact with each other - such as in a cluster-wide network outage - then no single member would be left with a majority.
- Example: If a four node cluster is laid out with groupings of two nodes in separate locations, then the interconnection between the locations being severed could leave both sets of two nodes without a majority.
Even in situations where the quorum policy is modified to allow some nodes to continue without a majority, the group of nodes that is chosen to continue providing service may not be the most capable. Another set of nodes may have been healthier and better equipped to continue on, but the static quorum policies aren't able to tell the difference.
- Example: A four node cluster is laid out with groupings of two nodes in separate locations. The interconnection between the locations is severed, leaving two partitions consisting of two nodes each - neither one having a majority. Additionally, the network outage has taken out the client-facing network of one of the locations. If the quorum policy grants authority to that location's nodes, the application hosted by the cluster may still be entirely unreachable to clients. If the quorum policy had chosen the nodes with the functional client-network, the application would have stayed online.
Majority-wins can serve as a useful strategy if the cluster has a simple, single-site membership without much potential for complex failures. Otherwise, a more intelligent quorum-arbitration policy is almost always a useful addition to the cluster.
What does corosync-qnetd do?
corosync-qnetd provides a server application which runs on a host that is not a member of any cluster. This qnetd server provides a quorum arbitration service that cluster nodes can access over a network connection.
The primary function of this server is to offer an external point of connectivity for cluster nodes to coordinate their quorum decisions when unable to communicate amongst each other.
What does corosync-qdevice do?
corosync-qdevice provides a client application that runs on High Availability cluster members and integrates with corosync to help make quorum decisions. This qdevice process communicates with an external qnetd server to coordinate decision-making by the nodes of that cluster when they cannot communicate with each other.
How do corosync-qdevice and corosync-qnetd influence quorum?
Clusters utilizing corosync-qdevice start out the same way that a typical cluster would - the nodes start their corosync service, which communicates with other nodes to establish a membership between them, and that leads to a calculation of quorum based on how many nodes are part of that membership.
- See also: Explore components: corosync
- See also: Explore features: Member communication and heartbeat monitoring
- See also: Explore concepts: quorum
Each node of a cluster can then also run the corosync-qdevice service, which connects with the external corosync-qnetd server and maintains communications with it. This connection is used to relay membership information from each node to the corosync-qnetd server. The server is able to use this information from the nodes of the cluster to determine if there is a discrepancy in how they each view the cluster's membership.
While corosync-qdevice is communicating with corosync-qnetd, corosync-qdevice is also hooked into corosync's votequorum service on each node to participate in the quorum voting system. The corosync-qdevice configuration throughout the cluster specifies the algorithm that will influence the cluster's quorum decisions. According to that algorithm, a certain number of votes may be contributed to each node's local quorum calculations by the corosync-qdevice service.
The result is that quorum is no longer decided on the basis of a majority of nodes being present. Instead, quorum is based on how many nodes are present and whether the corosync-qdevice service is contributing votes to this node. The nodes of the cluster each use their connection to the corosync-qnetd server to decide how to contribute (or not contribute) votes to their quorum calculation.
What algorithms are available for arbitrating membership-splits with corosync-qdevice?
corosync-qdevice can be configured with one of a few algorithms to influence how it contributes votes to individual nodes throughout the cluster. The following algorithms are available for use:
- lms (short for "last man standing") - Ensures that the largest membership partition in the cluster is the one to remain quorate - even if that partition is the last node alive in the cluster. If there are multiple equal-sized partitions, then a partition is chosen as the quorum-winner on the basis of which one contains the tie_breaker node (see next section).
- ffsplit (short for "fifty-fifty split") - Similar to lms, ffsplit will target the largest membership partition in the cluster to remain quorate if there is a split. However, the difference is that ffsplit places a limit on how many additional node failures can be tolerated before all partitions are left without quorum. By default, half of the membership could remain quorate in a single partition, but any fewer nodes than that would leave the partition inquorate.
- NOTE: Technically there are also algorithms 2nodelms and test. These are intended only for developer use and should not be deployed in production environments. As such, they are not covered here.
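On an existing deployment, the configured algorithm can be changed with pcs - a sketch, assuming the pcs quorum device update subcommand is available in the installed pcs version:

```
# Switch the cluster's quorum device to the lms algorithm:
pcs quorum device update model algorithm=lms
```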
How does corosync-qdevice use a "tie-breaker" to decide quorum?
The available algorithms rely on a few primary determining factors when a membership split occurs - how large each partition is, and how many nodes are connected to corosync-qnetd. However, there is a possibility that all partitions could be equal with respect to those factors, leaving a need for an additional deciding factor. It is important that only a single partition in the cluster can ever consider itself quorate - so if the primary factors can't decide that, something else must be able to.
corosync-qdevice offers a tie_breaker setting in its configuration, and this setting reflects which node of the cluster should serve as a tie-breaker. Whichever partition contains the indicated tie_breaker node will be the one to earn the votes to remain quorate.
The tie_breaker setting can either be "lowest", "highest", or it can be an integer. lowest would indicate that whichever partition contains the lowest node ID gets the votes from corosync-qdevice; highest means that the partition with the highest node ID should get the votes; and an integer would mean that a partition containing that exact node ID from the corosync configuration should get the votes.
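For example, the tie_breaker setting lives alongside the algorithm in the net block of /etc/corosync/corosync.conf - a minimal sketch, reusing the qnetd hostname shown elsewhere in this guide:

```
quorum {
    provider: corosync_votequorum
    device {
        model: net
        net {
            algorithm: lms
            host: qnetd.internal.example.com
            tie_breaker: lowest
        }
    }
}
```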
What is a corosync-qdevice model?
The model in use by a node's corosync-qdevice service reflects the means by which it communicates with the qdevice arbitration target. "net" is the only available model at the current time - which, as its name hints, is an implementation of a network-based arbitration method.
corosync-qnetd is the server implementation of corosync-qdevice's "net" model.
Summary of how the pieces fit together
Nodes of a cluster start corosync, read their configuration, communicate with each other over the transport protocol, and form a membership.
Based on the nodes seen in the membership and configuration, each node calculates how many total votes are "expected" in the cluster. The minimum number of votes for quorum is calculated as a strict majority of that expected-votes count: quorum = floor( expected-votes / 2 ) + 1
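The majority calculation above can be sketched with integer arithmetic - a minimal illustration, not corosync's actual code:

```shell
# Strict majority of the expected votes, using integer (floor) division.
quorum_votes() {
    echo $(( $1 / 2 + 1 ))
}

quorum_votes 7   # 4 votes needed when 7 are expected (e.g. 4 nodes + a 3-vote qdevice)
quorum_votes 4   # 3 votes needed when 4 are expected
```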
corosync-qnetd is started or already running on an external host to the cluster.
corosync-qdevice is started on each node of the cluster. It reads its model and configuration from /etc/corosync/corosync.conf, and uses that model to connect with the arbitration target.
Using the "net" model, corosync-qdevice on each node establishes a connection with the arbitration target - optionally using TLS, with optional client-certificate verification by the server.
The corosync-qnetd server receives information passed from each cluster node - the algorithm they're configured to use, what members they see as active, what their vote counts are, etc.
corosync-qnetd applies the algorithm to the information it has received from the cluster nodes, and instructs each node how many votes the "qdevice" is worth, and whether each node should count those "qdevice" votes.
corosync-qdevice on each node hooks into the corosync votequorum service to increase the total expected-votes by how many the "qdevice" is worth. The minimum value for quorum will often now be higher than it was before - meaning nodes will need to stay in good standing with the algorithm in order to maintain quorum.
Nodes maintain connectivity with the corosync-qnetd server, processing an "echo heartbeat" to maintain their alive state. If the heartbeat fails, any "qdevice" votes are subtracted from the vote counting on that node, possibly affecting quorum status.
If there is a membership change in the cluster, the nodes that are still in contact with the corosync-qnetd server relay their new membership lists, and again corosync-qnetd applies the algorithm. Each node may either receive the "qdevice" votes, or not receive them.
Those nodes that do not receive "qdevice" votes - either because they lost connectivity with the corosync-qnetd server, or because the algorithm decided against them - will not maintain quorum. Those nodes that do receive "qdevice" votes may remain quorate - depending on the algorithm, the membership state, and configuration.
The end result is that no more than one partition should remain quorate.
corosync-qnetd Component Explained
Activating a corosync-qnetd instance for clusters to use
A single instance of corosync-qnetd can serve as the quorum arbitrator for many different clusters using the net model.
To activate an instance of corosync-qnetd on a system that is not a cluster member, use pcs qdevice setup model net. This initializes the NSS database and TLS certificates in the typical way and can optionally enable and start the instance.
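A sketch of that activation on the qnetd host - the --enable and --start flags are optional conveniences:

```
# On the qnetd host: initialize certificates, then enable and start the service
pcs qdevice setup model net --enable --start
```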
Managing the corosync-qnetd service
It is recommended for administrators to start and stop corosync-qnetd via pcs qdevice start net and pcs qdevice stop net commands from the qnetd server, and to enable or disable with pcs qdevice enable net and pcs qdevice disable net. These commands interact with systemd to manage the corosync-qnetd.service unit.
This unit can be enabled to start on boot, or started manually with systemctl. It executes corosync-qnetd -f - running the daemon in the foreground so that systemd can continue to monitor it. systemd will restart the process if it exits abnormally.
The running process can be seen as corosync-qnetd:
# ps aux | grep -e COMMAND -e corosync-qnetd | grep -v grep
USER PID %CPU %MEM VSZ RSS TTY STAT START TIME COMMAND
coroqne+ 31384 0.0 0.0 42720 8740 ? Ss Jul31 0:08 /usr/bin/corosync-qnetd -f
corosync-qnetd configuration
pcs qdevice setup model net does not directly accept any configuration options, and will set the qnetd instance up with typical settings - TLS being enabled, and certificates being required.
If any advanced configuration is required, it needs to be implemented in /etc/sysconfig/corosync-qnetd manually. The daemon reads the options it should be started with from environment variable COROSYNC_QNETD_OPTIONS - which can be set in this /etc/sysconfig/corosync-qnetd configuration file. The options that the daemon will accept are detailed in its man page corosync-qnetd(8).
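For example, to require TLS and client certificates, the options could be set like this in /etc/sysconfig/corosync-qnetd - a sketch using the -s and -c options described in the next section:

```
# /etc/sysconfig/corosync-qnetd
COROSYNC_QNETD_OPTIONS="-s required -c on"
```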
The only options that typically need consideration are security-related:
-s Determines if TLS should be used and can be one of on/off/required (the default is on ). on means TLS is enabled but the client is not required to start TLS, off means TLS is completely disabled, and required means TLS is required. on and required require the NSS database to be properly initialized by running the corosync-qnetd-certutil command.
-c can be set to on/off. This option only makes sense if TLS is enabled. When -c is on a client is required to send its client certificate (default).
corosync-qnetd TLS configuration
corosync-qnetd can serve encrypted connections with its clients - which is the recommended mode of operation. See the above section's description of options -s and -c for details.
corosync-qnetd utilizes an NSS database stored in /etc/corosync/qnetd to manage certificates and verification of clients. The corosync-qnetd package comes with a convenient utility, corosync-qnetd-certutil, to assist in managing this database if needed. This utility can initialize a certificate authority and server certificate which can then be used to sign client certificates.
In most environments, manual interaction with corosync-qnetd-certutil is unnecessary. pcs qdevice setup model net from the qnetd server will handle calling this utility to set everything up. Administrators should only need to work with these certificates and requests if they need more control over the issuing and signing of certificates or the CA that should sign those certificates.
corosync-qnetd user
It is recommended to run corosync-qnetd as a non-root user, since it does not require special root privileges. By default this command will execute as the coroqnetd user. The user to run it as is read by the systemd unit via the COROSYNC_QNETD_RUNAS environment variable - which can be configured via /etc/sysconfig/corosync-qnetd.
This user must have permissions to access /etc/corosync/qnetd. Changing that directory's owner and group to this user can address it.
The user must also have permissions to access /var/run/corosync-qnetd. This directory typically resides on a tmpfs file system that is discarded and recreated by systemd with each reboot. If this is the case - as it is on typical RHEL 7 systems - then a configuration file can be created in /etc/tmpfiles.d to set these permissions. The corosync-qnetd package deploys such a config file to /usr/lib/tmpfiles.d/corosync-qnetd.conf, configured to set permissions for the coroqnetd user, so that file can be copied to /etc/tmpfiles.d and adjusted for the appropriate user. The /etc/tmpfiles.d configuration will override the /usr/lib/tmpfiles.d version.
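A sketch of what such a tmpfiles.d entry looks like, assuming a hypothetical alternate user named qnetduser - the mode shown is illustrative, so match it to the shipped /usr/lib/tmpfiles.d/corosync-qnetd.conf:

```
# /etc/tmpfiles.d/corosync-qnetd.conf
# type  path                     mode  user       group      age
d       /var/run/corosync-qnetd  0770  qnetduser  qnetduser  -
```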
Observing corosync-qnetd status
pcs qdevice status net from a corosync-qnetd server can display status information about the running "model" - which is "net" if using corosync-qnetd. Internally this command simply executes corosync-qnetd-tool with options -l and -s and passes along the results.
# pcs qdevice status net
QNetd address: *:5403
TLS: Supported (client certificate required)
Connected clients: 4
Connected clusters: 1
Cluster "rhel7-cluster":
Algorithm: Fifty-Fifty split
Tie-breaker: Node with lowest node ID
Node ID 4:
Client address: ::ffff:10.10.181.121:32772
Configured node list: 1, 2, 3, 4
Membership node list: 1, 3, 4
Vote: No change (ACK)
Node ID 2:
Client address: ::ffff:10.10.182.108:60584
Configured node list: 1, 2, 3, 4
Membership node list: 2
Vote: NACK (NACK)
Node ID 3:
Client address: ::ffff:10.10.182.144:60938
Configured node list: 1, 2, 3, 4
Membership node list: 1, 3, 4
Vote: No change (ACK)
Node ID 1:
Client address: ::ffff:10.10.181.105:57044
Configured node list: 1, 2, 3, 4
Membership node list: 1, 3, 4
Vote: No change (ACK)
Observing corosync-qnetd listening socket
netstat can show the listening socket that the corosync-qnetd server process has open to receive connections from corosync-qdevice clients. This may be useful for troubleshooting problems with nodes not being able to contact their corosync-qnetd host, or other unexpected behavior.
# netstat -tunap | grep corosync-q
tcp6 0 0 :::5403 :::* LISTEN 31384/corosync-qnet
tcp6 0 0 10.10.180.141:5403 10.10.181.105:57740 ESTABLISHED 31384/corosync-qnet
tcp6 0 0 10.10.180.141:5403 10.10.181.121:33384 ESTABLISHED 31384/corosync-qnet
tcp6 0 0 10.10.180.141:5403 10.10.182.144:33328 ESTABLISHED 31384/corosync-qnet
tcp6 0 0 10.10.180.141:5403 10.10.182.108:53076 ESTABLISHED 31384/corosync-qnet
The open LISTEN socket here on ":::5403" represents the state the server should be in to receive connections. The other four lines represent "ESTABLISHED" connections with client systems - which indicates those nodes should be able to send and receive information in connection with this server.
corosync-qdevice Component Explained
Activating the corosync-qdevice service for a cluster
Administrators can configure and activate corosync-qdevice through pcs quorum device add [generic options] model net [net configuration]. This activation step needs to be executed while the cluster is completely stopped - the command will inform the user if it finds the cluster running.
This quorum device add command passes the appropriate quorum configuration and any specified options along to the /etc/corosync/corosync.conf quorum section.
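For example, activating the net model against the qnetd host used elsewhere in this guide - the algorithm choice here is illustrative:

```
# Run once, while the cluster is stopped, from any node:
pcs quorum device add model net host=qnetd.internal.example.com algorithm=ffsplit
```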
For environments that may need to specify advanced options to the corosync-qdevice daemon - those listed in the corosync-qdevice(8) man page - these can be entered in /etc/sysconfig/corosync-qdevice. The corosync-qdevice service will read the COROSYNC_QDEVICE_OPTIONS environment variable when it starts, which can be configured via this file.
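For example, debug output could be enabled this way - a sketch, with -d being the debug option documented in corosync-qdevice(8):

```
# /etc/sysconfig/corosync-qdevice
COROSYNC_QDEVICE_OPTIONS="-d"
```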
Managing the corosync-qdevice service
Administrators can start and stop corosync-qdevice on cluster nodes along with the other important cluster services using pcs cluster start and pcs cluster stop. corosync-qdevice can also be enabled or disabled to start on boot with the rest via pcs cluster enable and pcs cluster disable.
Internally, these commands manage the corosync-qdevice service through its corosync-qdevice.service systemd unit, which can start and stop the service or enable and disable it to start automatically during boot.
corosync-qdevice service configuration
The corosync-qdevice service can be started with various options, which are explained in the corosync-qdevice(8) man page. These options are read from the command line, and the systemd unit can be configured to pass these in with each start of the service using the environment variable COROSYNC_QDEVICE_OPTIONS in /etc/sysconfig/corosync-qdevice. These options can be passed to pcs quorum device add [options] model [configuration] when the service is being activated initially.
corosync-qdevice quorum and daemon configuration
The quorum-influencing behavior enacted by corosync-qdevice is configured through the primary corosync configuration - /etc/corosync/corosync.conf. The settings available for configuration are listed in the CONFIGURATION section of the corosync-qdevice(8) man page.
These settings can be adjusted in the [configuration] portion of the pcs quorum device add [options] model net [configuration] command.
The quorum configuration for corosync-qdevice goes in /etc/corosync/corosync.conf's quorum.device (a top-level quorum block of that file, containing a device block). Within device the model should be specified, and then that model has its own block of settings. For example:
quorum {
provider: corosync_votequorum
device {
model: net
net {
algorithm: lms
host: qnetd.internal.example.com
}
}
}
corosync-qdevice TLS configuration
The corosync-qdevice service can interact with its qnetd server over a TLS connection - which is the recommended mode of operation. corosync-qdevice's net model accepts a tls configuration setting that specifies whether TLS is on, off, or required - with on meaning the client will attempt to start TLS with the server but does not require it. In most cases on is adequate - the server can then be configured to either enforce or not enforce TLS.
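The tls setting lives in the net block of the quorum device configuration in /etc/corosync/corosync.conf - for example:

```
device {
    model: net
    net {
        host: qnetd.internal.example.com
        algorithm: lms
        tls: on
    }
}
```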
As noted in the corosync-qnetd Component Explained section above, the corosync-qnetd server has an NSS database set up for it when it is initially activated. This database is configured to provide a certificate authority that can then sign client certificates with its own key.
If client verification will be enforced - which is a configurable setting for corosync-qnetd - then each client needs to generate a certificate signing request to be signed by the CA. corosync-qdevice loads the resulting signed certificate each time it starts and presents it to the server when establishing the TLS session, allowing the server to verify the client's identity.
The corosync-qdevice package provides a convenient utility called corosync-qdevice-net-certutil which can take care of the corosync-qdevice parts of that process - generating a certificate signing request, sending it to the qnetd server for signing, and putting in place the resulting certificate for use on future startups.
In typical environments, administrators should not need to do anything with certificates on the cluster nodes. corosync-qdevice-net-certutil is called automatically to do this work by pcs quorum device add [options] model net [configuration].
Only environments where the administrator needs greater control over the certificates or the signing of them should need some level of interaction with corosync-qdevice-net-certutil or the certificates. The corosync-qdevice-net-certutil help output is useful for understanding how this tool can be used.
Observing a cluster's quorum status
The quorum status of a cluster can be observed using pcs quorum status, which internally calls corosync-quorumtool and passes along the result.
# pcs quorum status
Quorum information
------------------
Date: Tue Aug 1 19:10:35 2017
Quorum provider: corosync_votequorum
Nodes: 4
Node ID: 1
Ring ID: 1/1507432
Quorate: Yes
Votequorum information
----------------------
Expected votes: 7
Highest expected: 7
Total votes: 7
Quorum: 4
Flags: Quorate Qdevice
Membership information
----------------------
Nodeid Votes Qdevice Name
1 1 A,V,NMW rhel7-node1.example.com (local)
2 1 A,V,NMW rhel7-node2.example.com
3 1 A,V,NMW rhel7-node3.example.com
4 1 A,V,NMW rhel7-node4.example.com
0 3 Qdevice
The "Votequorum information" section reports the total votes currently seen and the minimum number of votes required for quorum. The "Membership information" section can be useful for observing the current membership list as it is seen by corosync.
Observing corosync-qdevice status
pcs quorum device status from a cluster node can display status information about the running corosync-qdevice service on that node - which internally calls corosync-qdevice-tool -s and passes along the result.
# pcs quorum device status
Qdevice information
-------------------
Model: Net
Node ID: 1
Configured node list:
0 Node ID = 1
1 Node ID = 2
2 Node ID = 3
3 Node ID = 4
Membership node list: 1, 2, 3, 4
Qdevice-net information
----------------------
Cluster name: rhel7-cluster
QNetd host: qnetd.internal.example.com:5403
Algorithm: LMS
Tie-breaker: Node with lowest node ID
State: Connected
Observing corosync-qdevice connection with corosync-qnetd server
netstat can show the connection that each node has open with the corosync-qnetd server. This may be useful for troubleshooting problems with a node not receiving votes, or other unexpected behavior.
# netstat -tunap | grep -e Address -e corosync-q
Proto Recv-Q Send-Q Local Address Foreign Address State PID/Program name
tcp 0 0 10.10.181.105:57740 10.10.180.141:5403 ESTABLISHED 2272/corosync-qdevi
The foreign address would be the corosync-qnetd server. This record shows the connection "ESTABLISHED" - any other result may indicate a problem with connectivity.