Deployment Examples for RHEL High Availability Clusters - Enabling QDevice Quorum Arbitration in RHEL 7
Contents
- Overview
- Prepare the corosync-qnetd environment
- Enable corosync-qnetd on quorum arbitration server
- Prepare the cluster environment to use corosync-qdevice
- Enable QDevice quorum-arbitration in cluster
Overview
Applicable Environments
- Red Hat Enterprise Linux (RHEL) 7 with the High Availability Add-On
Recommended Prior Reading
Useful References and Guides
- Explore components: corosync-qdevice and corosync-qnetd
- Design guidance: QDevice Quorum-Arbitration
- RHEL 7 High Availability Reference Guide - 10.5 Quorum Devices
Introduction
This article focuses on deploying a QDevice quorum arbitration method for a RHEL 7 High Availability cluster by way of corosync-qdevice and corosync-qnetd.
Red Hat's series of Deployment Examples for RHEL High Availability Clusters aims to deliver simple and direct instruction for configuring common use cases with typical or recommended settings. There may be additional features, options, and decisions relevant to the targeted components that organizations will want to consider for their deployments; this guide serves as a reference for getting up and running quickly with typical settings. The references and guides above are a useful starting point for understanding these components and the High Availability technologies in more detail.
Prepare the corosync-qnetd environment
Prerequisites for deploying corosync-qnetd
- A server that is not a member of any cluster and on which RHEL 7 Update 4 can be deployed
- See Support policies - corosync-qdevice and corosync-qnetd for details
- There is connectivity between this server and the cluster nodes it will serve, ideally over an alternate network from any cluster's sole interconnect between nodes
- See Design guidance - QDevice Quorum-Arbitration for additional guidance
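TCP reachability from each prospective cluster node to the corosync-qnetd server can be spot-checked before any cluster configuration. The sketch below is a hypothetical helper, not part of any cluster tooling; the host name qnetd.example.com is a placeholder, and corosync-qnetd's default port 5403 is assumed:

```shell
#!/bin/bash
# Quick reachability probe for the corosync-qnetd port (default 5403).
# qnetd.example.com is a placeholder host name for illustration.
check_port() {
    local host=$1 port=$2
    # bash's /dev/tcp pseudo-device attempts a TCP connection;
    # `timeout` bounds the wait for unresponsive hosts
    timeout 3 bash -c "exec 3<>/dev/tcp/${host}/${port}" 2>/dev/null
}

if check_port qnetd.example.com 5403; then
    echo "corosync-qnetd port reachable"
else
    echo "corosync-qnetd port NOT reachable"
fi
```

Run this from each cluster node once corosync-qnetd is up; a failed probe before the service is enabled and started is expected.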
Prepare corosync-qnetd server with typical cluster-preparation steps
Although the corosync-qnetd server won't be a cluster member, it will require following many of the same steps to prepare a server to participate in a cluster. Follow these steps:
- See Preparing Servers for Inclusion in a RHEL 7 Cluster
- NOTE: Only the pcs package needs to be installed during the software installation steps of that procedure. All others listed are unnecessary for this corosync-qnetd server.
- NOTE: It is not necessary to open the firewall ports as that procedure describes - we need fewer ports on this corosync-qnetd server, so we will carry out that step below with modified parameters.
Install corosync-qnetd software on quorum arbitration server
On the system that will be the arbitration server (not a member of any cluster), install corosync-qnetd:
# yum install corosync-qnetd
Allow access to corosync-qnetd and pcsd through firewall
If using a firewall between the cluster nodes and this system, access to two services will need to be enabled. By default, corosync-qnetd listens on port 5403, and pcsd listens on port 2224.
NOTE: This step will vary for all environments, and security practices must be decided individually by every organization according to their own needs. An example for a typical RHEL deployment that would use bond0 for connectivity to corosync-qnetd is:
# firewall-cmd --get-active-zones
public
interfaces: bond0
# firewall-cmd --add-port=5403/tcp --zone=public --permanent
# firewall-cmd --add-port=2224/tcp --zone=public --permanent
# firewall-cmd --reload
Enable corosync-qnetd on quorum arbitration server
Setup, enable, and start corosync-qnetd using standard settings
# pcs qdevice setup model net --enable --start
Note: This will deploy the corosync-qnetd server with a self-signed certificate for TLS connections, and will set up a certificate authority to sign certificate requests from cluster nodes that will use this corosync-qnetd service. If more control over these security and authentication mechanisms is needed, see corosync-qnetd-certutil's manpage and help output, and consider running that command manually or otherwise implementing these components by hand.
Confirm corosync-qnetd service is active
# pcs qdevice status net
QNetd address: *:5403
TLS: Supported (client certificate required)
Any error or other indication that corosync-qnetd is not operational should be cause to stop and investigate.
Prepare the cluster environment to use corosync-qdevice
Prerequisites for deploying corosync-qdevice
- Cluster members must all be able to run RHEL 7 Update 4 in order to utilize corosync-qdevice
- See Support policies - corosync-qdevice and corosync-qnetd for details
(If deploying a new cluster) - Deploy cluster servers, install the operating system, and set up the cluster
- Physically prepare, rack, and connect the servers that will serve as cluster members
- Install RHEL 7 Update 4 or later
- See also RHEL 7 Installation Guide
- If updating an existing cluster, see Recommended Practices for Applying Software Updates to a RHEL High Availability or Resilient Storage Cluster
- Prepare servers to participate in cluster
- Set up RHEL 7 cluster
Install corosync-qdevice software on all nodes
# yum install corosync-qdevice
Authorize cluster node(s) to corosync-qnetd server's pcsd service
From all nodes of the cluster - or alternatively just on the node where pcs quorum commands will be executed in subsequent steps - authorize to the pcsd service on the corosync-qnetd server:
# pcs cluster auth <corosync-qnetd host> -u hacluster
Enable QDevice quorum-arbitration in cluster
Configure and activate corosync-qdevice service in cluster
On one node of the cluster, use pcs to configure and activate corosync-qdevice. Information that will need to be in hand for this task:
- The hostname or IP address of the corosync-qnetd server - this should be the address mapping to the interface that was chosen for communications between corosync-qnetd and cluster members
- The algorithm that this cluster's QDevice should operate under
# Syntax:
# pcs quorum device add model net host=<corosync-qnetd host> algorithm=<algorithm>
# Example:
# pcs quorum device add model net host=qnetd.example.com algorithm=lms
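After the command succeeds, the device definition is written into the quorum section of /etc/corosync/corosync.conf on every node. A rough sketch of what to expect with the example values above - the exact keys pcs writes may vary slightly by version:

```
quorum {
    provider: corosync_votequorum
    device {
        model: net

        net {
            algorithm: lms
            host: qnetd.example.com
        }
    }
}
```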
Check status of QDevice in cluster
Use pcs to review the quorum configuration that was deployed
# pcs quorum config
Options:
Device:
Model: net
algorithm: lms
host: qnetd.example.com
Check the status of quorum with pcs
# pcs quorum status
Quorum information
------------------
Date: Thu Aug 10 10:49:20 2017
Quorum provider: corosync_votequorum
Nodes: 4
Node ID: 1
Ring ID: 1/1507560
Quorate: Yes
Votequorum information
----------------------
Expected votes: 7
Highest expected: 7
Total votes: 7
Quorum: 4
Flags: Quorate Qdevice
Membership information
----------------------
Nodeid Votes Qdevice Name
1 1 A,V,NMW rhel7-node1.example.com (local)
2 1 A,V,NMW rhel7-node2.example.com
3 1 A,V,NMW rhel7-node3.example.com
4 1 A,V,NMW rhel7-node4.example.com
0 3 Qdevice
The important items to look for here are:
- The flags under Votequorum information show "Qdevice"
Flags: Quorate Qdevice
- Expected votes is more than the total number of nodes in the cluster. With the lms algorithm, this should be #nodes + (#nodes - 1). With ffsplit, this should be #nodes + 1. In this example four-node cluster using lms, expected votes is now 7 = 4 nodes + 3 for Qdevice
Expected votes: 7
- The Membership information shows each node of the cluster having a V character in its Qdevice column - indicating it has a vote. If any node is missing a vote, it may not be in contact with the corosync-qnetd server
Nodeid Votes Qdevice Name
1 1 A,V,NMW rhel7-node1.example.com (local)
2 1 A,V,NMW rhel7-node2.example.com
3 1 A,V,NMW rhel7-node3.example.com
4 1 A,V,NMW rhel7-node4.example.com
- The Membership information shows the Qdevice listed, having the appropriate number of votes
Nodeid Votes Qdevice Name
0 3 Qdevice
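The expected-votes arithmetic above can be sketched as a quick check. This is a hypothetical helper for illustration, not part of any cluster tooling:

```shell
#!/bin/bash
# Total expected votes for a cluster using a QDevice, by algorithm:
#   lms:     the qdevice holds (nodes - 1) votes -> nodes + (nodes - 1)
#   ffsplit: the qdevice holds a single vote     -> nodes + 1
expected_votes() {
    local nodes=$1 algorithm=$2
    case "$algorithm" in
        lms)     echo $(( nodes + nodes - 1 )) ;;
        ffsplit) echo $(( nodes + 1 )) ;;
        *)       echo "unknown algorithm: $algorithm" >&2; return 1 ;;
    esac
}

expected_votes 4 lms      # the four-node lms example above prints 7
expected_votes 4 ffsplit  # prints 5
```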
Check status of this cluster's QDevice from corosync-qnetd server
The status of this cluster's quorum as seen by the corosync-qnetd server can further demonstrate whether all is functioning well. On that corosync-qnetd server, use pcs to show the qdevice status.
# pcs qdevice status net
QNetd address: *:5403
TLS: Supported (client certificate required)
Connected clients: 4
Connected clusters: 1
Cluster "rhel7-cluster":
Algorithm: LMS
Tie-breaker: Node with lowest node ID
Node ID 3:
Client address: ::ffff:10.10.182.144:42670
Configured node list: 1, 2, 3, 4
Membership node list: 1, 2, 3, 4
Vote: ACK (ACK)
Node ID 4:
Client address: ::ffff:10.10.181.121:42576
Configured node list: 1, 2, 3, 4
Membership node list: 1, 2, 3, 4
Vote: ACK (ACK)
Node ID 2:
Client address: ::ffff:10.10.182.108:39550
Configured node list: 1, 2, 3, 4
Membership node list: 1, 2, 3, 4
Vote: ACK (ACK)
Node ID 1:
Client address: ::ffff:10.10.181.105:37516
Configured node list: 1, 2, 3, 4
Membership node list: 1, 2, 3, 4
Vote: ACK (ACK)
This example shows all four nodes of our cluster are in contact and are receiving a vote "ack" from the corosync-qnetd server. All is well.
Further recommended tasks
Test functionality of cluster thoroughly with QDevice
To ensure that the cluster functions as expected, run through various failure scenarios that will exercise the QDevice. A few suggestions to start:
- Disrupt the network connectivity between members in various arrangements (e.g. a 3-vs-1 split, a 2-vs-2 split) and monitor whether some portion of the cluster maintains quorum and continues serving the cluster's resources
- Disrupt the network connectivity between members and also disrupt some of those members' connections with corosync-qnetd. Monitor whether some portion maintains quorum and continues serving the cluster's resources.
- Disrupt the availability of the corosync-qnetd server and monitor whether any attached clusters maintain quorum and continue serving the cluster's resources.
- While the corosync-qnetd server is unavailable, also crash or power off nodes one at a time. See how many nodes the cluster can lose in this state before losing quorum. Be aware of this limit for production situations.
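One way to rehearse the network splits above is with temporary iptables DROP rules. The helper below is a hypothetical dry-run generator: it only prints the commands so they can be reviewed before being run as root on the node being isolated. The host names are the example nodes from this article:

```shell
#!/bin/bash
# Dry-run generator: print iptables rules that would cut a node off
# from the listed peers (rehearsing a cluster split). Review, then run
# the printed commands as root on the node to isolate; remove them
# afterward with the equivalent -D rules.
print_split_rules() {
    local peer
    for peer in "$@"; do
        echo "iptables -A INPUT -s ${peer} -j DROP"
        echo "iptables -A OUTPUT -d ${peer} -j DROP"
    done
}

# Example: isolate this node from node3 and node4 (a 2-vs-2 split
# when run on node1 or node2)
print_split_rules rhel7-node3.example.com rhel7-node4.example.com
```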