Deployment Examples for RHEL High Availability Clusters - Enabling QDevice Quorum Arbitration in RHEL 7

Overview

Applicable Environments

  • Red Hat Enterprise Linux (RHEL) 7 with the High Availability Add-On

Useful References and Guides

Introduction

This article focuses on deploying QDevice quorum arbitration for a RHEL 7 High Availability cluster using corosync-qdevice and corosync-qnetd.

Red Hat's series of Deployment Examples for RHEL High Availability Clusters aims to deliver simple, direct instructions for configuring common use cases with typical or recommended settings. There may be additional features, options, and decisions relevant to the targeted components that organizations may wish to consider for their deployments; this guide serves as a helpful reference for getting up and running with typical settings quickly. The references and guides above are a useful starting point for understanding these components and the High Availability technologies in more detail.

Prepare the corosync-qnetd environment

Prerequisites for deploying corosync-qnetd


Prepare corosync-qnetd server with typical cluster-preparation steps

Although the corosync-qnetd server won't be a cluster member, preparing it requires many of the same steps used to prepare a server for cluster membership. Follow these steps:

  • See Preparing Servers for Inclusion in a RHEL 7 Cluster
    • NOTE: Only the pcs package needs to be installed during the software installation steps of that procedure. All others listed are unnecessary for this corosync-qnetd server.
    • NOTE: It is not necessary to open the firewall ports as that procedure describes - we need fewer ports on this corosync-qnetd server, so we will carry out that step below with modified parameters.

Install corosync-qnetd software on quorum arbitration server

On the system that will be the arbitration server (not a member of any cluster), install corosync-qnetd:

# yum install corosync-qnetd

Allow access to corosync-qnetd and pcsd through firewall

If a firewall sits between the cluster nodes and this system, access to two services must be enabled. By default, corosync-qnetd listens on TCP port 5403 and pcsd listens on TCP port 2224.

NOTE: This step will vary for all environments, and security practices must be decided individually by every organization according to their own needs. An example for a typical RHEL deployment that would use bond0 for connectivity to corosync-qnetd is:

# firewall-cmd --get-active-zones 
public
  interfaces: bond0
# firewall-cmd --add-port=5403/tcp --zone=public --permanent
# firewall-cmd --add-port=2224/tcp --zone=public --permanent
# firewall-cmd --reload
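To confirm the rules took effect after the reload, the open ports can be listed (assuming the same public zone as in the example above):

```shell
# List open ports in the zone used above; after the reload the
# output should include 5403/tcp and 2224/tcp.
firewall-cmd --list-ports --zone=public
```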

Enable corosync-qnetd on quorum arbitration server

Set up, enable, and start corosync-qnetd using standard settings:

# pcs qdevice setup model net --enable --start

Note: This will deploy the corosync-qnetd server with a self-signed certificate for TLS connections, and will set up a certificate authority to sign certificate requests from cluster nodes that will use this corosync-qnetd service. If you need more control over these security and authentication mechanisms, see the corosync-qnetd-certutil manpage and help output, and consider running that command manually or otherwise implementing these components yourself.
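As a sketch of the manual alternative, the certificate initialization that `pcs qdevice setup` automates can be performed directly with corosync-qnetd-certutil; verify the exact flags against the manpage for your release:

```shell
# Initialize the QNetd certificate authority and server certificate
# manually (what `pcs qdevice setup model net` does for you); check
# corosync-qnetd-certutil(8) for the flags on your release.
corosync-qnetd-certutil -i

# Then enable and start the service through systemd.
systemctl enable --now corosync-qnetd
```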


Confirm corosync-qnetd service is active

# pcs qdevice status net
QNetd address:                  *:5403
TLS:                            Supported (client certificate required)

Any error or other indication that corosync-qnetd is not operational should be cause to stop and investigate.
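Beyond pcs, the service and its listener can also be checked directly through systemd and the socket table; a quick sketch:

```shell
# Confirm the systemd unit is running and that the default
# corosync-qnetd port 5403 has a listener.
systemctl is-active corosync-qnetd
ss -tlnp | grep 5403
```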


Prepare the cluster environment to use corosync-qdevice

Prerequisites for deploying corosync-qdevice


(If deploying a new cluster) - Deploy cluster servers, install operating system, and setup cluster


Install corosync-qdevice software on all nodes

# yum install corosync-qdevice
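This must be run on every cluster node. If a management host with ssh access is available, one hypothetical way to cover all nodes (node names here are assumed to match the example cluster shown later in this article) is:

```shell
# Install corosync-qdevice on each cluster node over ssh
# (hypothetical node names from this article's example cluster).
for node in rhel7-node1 rhel7-node2 rhel7-node3 rhel7-node4; do
    ssh root@"$node" 'yum install -y corosync-qdevice'
done
```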

Authorize cluster node(s) to corosync-qnetd server's pcsd service

From all nodes of the cluster - or alternatively just on the node where pcs quorum commands will be executed in subsequent steps - authorize to the pcsd service on the corosync-qnetd server:

# pcs cluster auth <corosync-qnetd host> -u hacluster
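For example, using the arbitration server hostname assumed later in this article:

```shell
# Authorize to pcsd on the corosync-qnetd server; pcs prompts for
# the hacluster password and reports the host as authorized on success.
pcs cluster auth qnetd.example.com -u hacluster
```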

Enable QDevice quorum-arbitration in cluster

Configure and activate corosync-qdevice service in cluster

On one node of the cluster, use pcs to configure and activate corosync-qdevice. For this task you will need the hostname of the corosync-qnetd server and the desired quorum algorithm (lms or ffsplit):

# # Syntax:
# # pcs quorum device add model net host=<corosync-qnetd host> algorithm=<algorithm>
# # Example:
# pcs quorum device add model net host=qnetd.example.com algorithm=lms

Check status of QDevice in cluster

Use pcs to review the quorum configuration that was deployed

# pcs quorum config
Options:
Device:
  Model: net
    algorithm: lms
    host: qnetd.example.com

Check the status of quorum with pcs

# pcs quorum status
Quorum information
------------------
Date:             Thu Aug 10 10:49:20 2017
Quorum provider:  corosync_votequorum
Nodes:            4
Node ID:          1
Ring ID:          1/1507560
Quorate:          Yes

Votequorum information
----------------------
Expected votes:   7
Highest expected: 7
Total votes:      7
Quorum:           4  
Flags:            Quorate Qdevice 

Membership information
----------------------
    Nodeid      Votes    Qdevice Name
         1          1    A,V,NMW rhel7-node1.example.com (local)
         2          1    A,V,NMW rhel7-node2.example.com
         3          1    A,V,NMW rhel7-node3.example.com
         4          1    A,V,NMW rhel7-node4.example.com
         0          3            Qdevice

The important items to look for here are:

  • The flags under Votequorum Information show "Qdevice"
Flags:            Quorate Qdevice 
  • Expected Votes is more than the total number of nodes in the cluster. With the lms algorithm, this should be #nodes + (#nodes - 1). With ffsplit, this should be #nodes + 1. In this example, a four-node cluster using lms has expected votes of 7 = 4 nodes + 3 for the QDevice
Expected votes:   7
  • The Membership Information shows each node of the cluster having a V character in its Qdevice column, indicating it has a vote. If any node is missing a vote, it may not be in contact with the corosync-qnetd server.
    Nodeid      Votes    Qdevice Name
         1          1    A,V,NMW rhel7-node1.example.com (local)
         2          1    A,V,NMW rhel7-node2.example.com
         3          1    A,V,NMW rhel7-node3.example.com
         4          1    A,V,NMW rhel7-node4.example.com
  • The Membership Information shows the Qdevice listed, having the appropriate number of votes
    Nodeid      Votes    Qdevice Name
         0          3            Qdevice
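The expected-vote arithmetic above can be sketched for any cluster size (a simple illustration, not a pcs command):

```shell
# Expected votes by algorithm for an N-node cluster:
#   lms:     N + (N - 1)  -> the QDevice holds N-1 votes
#   ffsplit: N + 1        -> the QDevice holds 1 vote
nodes=4
echo "lms:     $(( nodes + nodes - 1 ))"   # 7 for this article's 4-node example
echo "ffsplit: $(( nodes + 1 ))"           # 5
```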

Check status of this cluster's QDevice from corosync-qnetd server

The status of this cluster's quorum as seen by the corosync-qnetd server can further demonstrate whether all is functioning well. On the corosync-qnetd server, use pcs to show qdevice status:

# pcs qdevice status net
QNetd address:                  *:5403
TLS:                            Supported (client certificate required)
Connected clients:              4
Connected clusters:             1
Cluster "rhel7-cluster":
    Algorithm:          LMS
    Tie-breaker:        Node with lowest node ID
    Node ID 3:
        Client address:         ::ffff:10.10.182.144:42670
        Configured node list:   1, 2, 3, 4
        Membership node list:   1, 2, 3, 4
        Vote:                   ACK (ACK)
    Node ID 4:
        Client address:         ::ffff:10.10.181.121:42576
        Configured node list:   1, 2, 3, 4
        Membership node list:   1, 2, 3, 4
        Vote:                   ACK (ACK)
    Node ID 2:
        Client address:         ::ffff:10.10.182.108:39550
        Configured node list:   1, 2, 3, 4
        Membership node list:   1, 2, 3, 4
        Vote:                   ACK (ACK)
    Node ID 1:
        Client address:         ::ffff:10.10.181.105:37516
        Configured node list:   1, 2, 3, 4
        Membership node list:   1, 2, 3, 4
        Vote:                   ACK (ACK)

This example shows all four nodes of our cluster are in contact and are receiving a vote "ack" from the corosync-qnetd server. All is well.

Test functionality of cluster thoroughly with QDevice

To ensure that the cluster functions as expected, run through various failure scenarios that will exercise the QDevice. A few suggestions to start:

  • Disrupt the network connectivity between members in various arrangements (e.g.: 3 vs 1 split, 2 vs 2 split, etc) and monitor to see whether some portion of the cluster maintains quorum and continues serving the cluster's resources
  • Disrupt the network connectivity between members and also disrupt some of those members' connections to corosync-qnetd. Monitor whether some portion maintains quorum and continues serving the cluster's resources.
  • Disrupt the availability of the corosync-qnetd server and monitor whether any attached clusters maintain quorum and continue serving the cluster's resources.
    • While the corosync-qnetd server is unavailable, also crash or power-off nodes one at a time. See how many nodes the cluster can lose in this state before losing quorum. Be aware of this limit for production situations.
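One way to simulate losing contact with the corosync-qnetd server from a single node is to drop its traffic at the firewall; this sketch assumes iptables is available on the node (firewalld-based environments may need an equivalent direct rule instead):

```shell
# On one cluster node, drop outbound traffic to the qnetd port to
# simulate losing the arbitration server, then watch quorum status.
iptables -A OUTPUT -p tcp --dport 5403 -j DROP
pcs quorum status

# Remove the rule to restore connectivity when done.
iptables -D OUTPUT -p tcp --dport 5403 -j DROP
```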