Design Guidance for RHEL High Availability Clusters - Membership Layout and Member System Specifications

Contents

Overview

Applicable Environments

  • Red Hat Enterprise Linux (RHEL) 7 with the High Availability Add-On

Useful References and Guides

Introduction

This article provides guidance on designing a cluster's membership layout and the system details of the individual members that will serve in that cluster. The resulting design should cover:

  • Number of cluster members
  • Arrangement of "sites" and the cluster members hosted in each
  • Whether a quorum-device will be used, and where its server will be located
  • Machine and system details of cluster members
    • Platform - baremetal, virtualization, cloud
    • System specifications
    • RHEL release
    • Types of members - Full quorum member or Pacemaker Remote "worker" node

Designing cluster membership layout

Summary of decisions

  • Decide how much failover-capability the cluster should provide
  • Consider if multiple sites are needed to provide coverage for widespread failures and disasters
  • Potentially include an arbitration server
  • Consider how the failover-paths of multiple applications could intersect
  • Put it all together into a membership layout

Decide failover coverage

Consider how extensive the recovery capabilities of this cluster should be and translate those needs into a general picture of the cluster's membership layout:

  • Comprehensive high availability management with widespread-disaster recovery:

    • Multiple sites or locations with independent infrastructure
    • At least three quorum members per site provide the best coverage for a wide variety of failure scenarios
    • Additional neutral quorum-arbitration server accessible to the entire cluster across all sites
  • Single-site high availability management with local failure-recovery:

    • At least three quorum members
    • Additional quorum-arbitration server accessible to all nodes
  • Simple failover recovery:

    • Two members - with one serving as the active primary member, and the other as idle standby for failover

Quorum arbitration with a qdevice

RHEL High Availability clusters often benefit from use of a 'qdevice' - short for "quorum device". A corosync-qdevice component running on cluster members communicates with a separate networked server running corosync-qnetd in order to make intelligent quorum decisions when members lose contact with each other. Red Hat recommends using this feature whenever possible, as it enhances the reliability of a cluster and simplifies how the membership needs to be laid out. With a qdevice, even-sized or asymmetrical memberships do not come with the typical concerns and caveats that would otherwise apply without a quorum device.

To use a qdevice, plan to have one additional machine to host the corosync-qnetd service. Consider the following for that machine:

  • If using multiple sites, the corosync-qnetd server should be neutrally located so it can serve its purpose even if one of the sites becomes unavailable.
  • The corosync-qnetd server will not handle any of the cluster's managed-applications. Do not include it in failover-capacity planning.
  • This corosync-qnetd server does not necessarily need the same system specs as other members - it only needs to be capable of, and dedicated to, running the corosync-qnetd service.
  • A single corosync-qnetd server can serve multiple separate clusters as long as it is accessible to each. You do not need a separate server for every cluster.
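
The planning points above can be sketched with the pcs commands used to deploy a qdevice. This is a minimal illustration, not a full procedure; the hostname qnetd.example.com is a placeholder.

```shell
# On the dedicated arbitration server: set up and start the qnetd service
# (requires the pcs and corosync-qnetd packages)
pcs qdevice setup model net --enable --start

# On one cluster node: point the cluster at the qnetd server.
# The ffsplit algorithm favors the partition containing the most nodes.
pcs quorum device add model net host=qnetd.example.com algorithm=ffsplit

# Verify that the cluster sees the quorum device
pcs quorum device status
```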

Coverage across multiple sites

Mission-critical applications whose extended outages an organization cannot tolerate should have redundancy across different facilities or locations with independent infrastructure.

The typical varieties of multi-site cluster deployments are:

  • Multi-site distributed membership cluster: Applications that must be actively serving from multiple sites simultaneously require a single cluster membership spanning all sites.

    • For resiliency, designs should include a qdevice with a neutral quorum-arbitration server accessible from all sites
    • If failover across sites is costly, slow, or otherwise undesirable, each site should have two or more nodes (three per site provides better reliability) to allow local failover before triggering failover to another site.
  • Coordinating multi-site failover clusters: Applications that only require active/passive failover-coverage between sites can be handled by separate clusters that coordinate ownership and failover of the application and resources.

    • Separate clusters - one in each site - coordinate via the booth ticket manager.
    • Design should include at least one neutral ticket-arbitration-server accessible over network from the various sites.
    • Membership of "failover sites" can vary:
      • Full-coverage failover: Each failover site should have the same number of members, with equivalent resources, as the primary cluster
      • Essentials-only failover: Failover sites can have reduced capacity if only the most critical resources need to failover to an alternate site.
    • Failover sites do not need to sit idle - different applications can be distributed across different clusters, and they each serve as a failover-site for the other site(s) (when built with appropriate capacity).
    • Coordinating clusters avoid some challenges that distributed-membership clusters face. This design should be preferred if the application is active/passive and shared resources don't need to be used simultaneously across sites.

booth ticket manager for multi-site coordinating failover clusters

If using separate clusters with booth ticket-manager to coordinate failover across sites, those clusters need at least one neutral arbitration server accessible by network from all sites. The booth ticket manager ensures only one cluster is attempting to manage a given resource or resource-group at a time, and the ticket-manager makes this decision intelligently through contact with a neutral server.

Plan for this ticket arbitration server in the cluster design. Consider the following:

  • If using multiple sites, the booth ticket arbitrator should be neutrally located so it can serve its purpose even if one of the sites becomes unavailable.
  • The booth arbitrator will not handle any of the cluster's managed-applications. Do not include it in failover-capacity planning.
  • This booth arbitrator does not necessarily need the same system specs as other members - it only needs to be capable of, and dedicated to, running the booth service.
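
As a rough sketch of how the pieces above fit together, a two-site booth configuration can be created with pcs. The addresses below are placeholders: 192.0.2.1 and 198.51.100.1 stand in for each site's shared booth IP, and 203.0.113.1 for the neutral arbitrator; the ticket name is a hypothetical example.

```shell
# On a node in the first cluster: define the sites and the arbitrator
pcs booth setup sites 192.0.2.1 198.51.100.1 arbitrators 203.0.113.1

# Create a ticket representing ownership of an application or resource group
pcs booth ticket add apacheticket

# Distribute the booth configuration to the local cluster nodes,
# then grant the ticket to the site that should initially own the resource
pcs booth sync
pcs booth ticket grant apacheticket
```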

NOTE: A qdevice can be used in each of the clusters in these multiple-coordinating-cluster designs that use booth. corosync-qdevice and booth each need their respective arbitration server, but both services can run on the same neutral system. So a single arbitration server can satisfy the requirement for both components.


Managing multiple applications per-cluster

If multiple applications are managed in the same cluster, will problems occur if they fail over to the same member? Cascading system failures do happen, so consider scenarios where multiple faults leave the cluster operating with a bare-minimum membership.

  • Would the load of all managed-applications be too much for one cluster member to handle?

    • Consider if some applications are less critical and can be configured to stop rather than double-up with other applications in the same location.
    • If the must-run applications are too much for a single member, then the cluster should have independent failover paths for each of them to avoid them intersecting on one member.
  • Or would all applications be able to run in the same location without exhausting resources?

    • Three nodes may be enough - a single failover can occur without doubling-up on a member, but another failover could still be handled by the single remaining member if absolutely needed.
    • Two nodes would be the bare minimum - but would create a doubled-up situation with any failover that occurs. Having an extra node to avoid that may be better.
  • The section below on designing individual cluster members discusses scoping the resource utilization of each app. If needed, skip ahead and review those guidelines, then come back and consider how well a member can handle all of the applications of the cluster.
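
Pacemaker can enforce the choices discussed above: it can account for member capacity when placing resources, and stop lower-priority applications when capacity runs out. A minimal sketch using pcs follows; the node and resource names (node1.example.com, app-critical, app-reporting) and the capacity figures are hypothetical examples.

```shell
# Declare each node's capacity and each application's footprint
pcs node utilization node1.example.com cpu=8 memory=16384
pcs resource utilization app-critical cpu=4 memory=8192
pcs resource utilization app-reporting cpu=2 memory=4096

# Place resources according to utilization; when a member cannot hold
# everything, lower-priority resources are stopped first
pcs property set placement-strategy=utilization
pcs resource meta app-critical priority=100
pcs resource meta app-reporting priority=10
```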


Decision: Plan the membership layout

  • How many sites/facilities/locations should the cluster span?
    • How many members in each site?
    • Are all systems in the same cluster, or in separate site-local clusters coordinating with each other to provide failover?
    • Is there a ticket arbitration server, and where will it be located?
  • Will a qdevice be used, and where will the quorum arbitration server be located?

Use this information to build a picture of the failure-recovery capabilities provided by this cluster for its applications.


Designing individual cluster members

Summary of decisions

  • What system specs should each member have?
  • What RHEL release should be deployed on the cluster members?
  • What hardware/virtualization/cloud platform should individual members run on?
  • Will the cluster use Pacemaker Remote "worker" nodes?

System specs: Understand the requirements of applications to be managed by the cluster

Consider the applications that are expected to be managed by this cluster. What are the minimum system specs for a single server running the application? What would a high-end estimate be for the resources that could be consumed in handling the expected workload?

Is the workload that these applications will handle well understood?

  • Determine the processor, memory, disk-space, network, and other system requirements of the server(s) that would run the application with that workload
  • Add-in enough extra capacity for spikes or unexpected high volume.

Or does the workload need to be better understood before designing cluster members?

  • Consult the vendor's documentation or advice for the application to establish its requirements.
  • Look for existing instances of these applications somewhere in the organization that can be reviewed.
  • Reach out to teams within the organization that are familiar with this application for advice about system requirements.
  • Consider standing up an instance of the application on a single server and stress test it, and monitor usage.

System specs: Plan enough resources for failover scenarios

It's important to consider the resource utilization that can occur in the context of a high availability cluster.

Handling multiple managed-applications on one cluster-member

  • If multiple applications or resources will be managed by the cluster, one cluster member might need to handle all of those applications at the same time.
  • The cluster can be configured to stop lower-priority applications rather than run them on the same server as higher-priority apps.
  • Add up the capacity needed by all of the "must-run" applications. If a cluster might need to keep operating down to a "last man standing", estimate what it would take to operate all of the cluster's managed-applications together.
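
The "add it up" step can be sketched as simple shell arithmetic. The per-application figures below are hypothetical examples, not recommendations; the ~25% headroom factor is likewise an assumption for illustration.

```shell
# Rough worst-case sizing for a "last man standing" scenario
app_mem_gb=(8 4 2)   # memory (GB) needed by each must-run application
app_cpu=(4 2 1)      # CPU cores needed by each

total_mem=0; total_cpu=0
for m in "${app_mem_gb[@]}"; do total_mem=$((total_mem + m)); done
for c in "${app_cpu[@]}"; do total_cpu=$((total_cpu + c)); done

# Add ~25% headroom for the OS and cluster services
echo "Size each member for at least $((total_mem * 5 / 4)) GB RAM and $((total_cpu * 5 / 4)) CPUs"
# → Size each member for at least 17 GB RAM and 8 CPUs
```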

Failover and recovery actions may cause load spikes

  • Many applications consume heavy resources when they are starting or stopping.
  • Some applications or operations could consume more resources when there is contention for system resources with other users/applications.
  • Failures can cascade throughout an environment, leading to multiple applications needing recovery at the same time.

Recommendation: The minimum system requirements of each node should be designed to handle all essential applications and resources that the cluster manages.


System specs: Consider the additional system resources needed by the RHEL High Availability software

The RHEL operating system and the RHEL High Availability software that run on cluster members need system resources of their own, beyond the resources devoted directly to cluster-managed applications. It is important to make sure these needs are accounted for when designing and choosing the systems that will run in this cluster.

Recommendation: Build in plenty of extra capacity per-member for cluster and operating system processes, scaling further beyond the applications' needs as more managed applications are included in the cluster's design.


Platforms, Member Type: Consider Pacemaker Remote nodes

A RHEL High Availability cluster can include Pacemaker Remote nodes that are not full-members for quorum and decision-making purposes, but can add capacity to the cluster to handle application instances and resources. These Pacemaker Remote nodes cannot serve the functions of the cluster all by themselves if all full-members were to become unavailable, so a cluster's design must start with at least a core set of quorum members that provide the coverage for system-failure scenarios. After an adequate number of full members are included to cover those scenarios, it may be more beneficial to add further capacity through Pacemaker Remote nodes.

These systems running the pacemaker_remote service instead of the full-cluster-stack can host managed-resources of the cluster as a full member would. They can be an appealing and beneficial addition in several ways:

  • Lower resource utilization, as they don't run the typical High Availability components such as pacemaker and corosync
  • Capacity of a cluster can grow beyond the RHEL High Availability maximum-member limit - as Pacemaker Remote nodes don't count towards that limit
  • The size and capacity of a cluster can be scaled by adding or removing Pacemaker Remote nodes, without affecting quorum decisions because the core membership would remain the same.
  • Pacemaker Remote nodes do not need to run from the same platform as full members. Virtual machines serving as Pacemaker Remote nodes is common; these VMs might be hosted by libvirt/KVM running on the full members and managed by the cluster, or they might be separately managed by the virtualization platform.

So, it is worth considering if any of the members that were planned in the design may be candidates to be Pacemaker Remote nodes instead of full members. Keep in mind:

  • The core cluster-membership should be designed to adequately handle failure scenarios with at least one full member operational. A cluster will not serve its functions if only Pacemaker Remote nodes are available.
  • Pacemaker Remote nodes can only provide service if they are reachable by a full-member with quorum via a network connection.
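
A minimal sketch of integrating a Pacemaker Remote node follows; the hostname remote1.example.com and resource names are placeholders. This assumes the remote host runs the pacemaker_remote service and shares the cluster's authentication key (/etc/pacemaker/authkey).

```shell
# On the remote host (outside the core membership):
# yum install pacemaker-remote resource-agents pcs
# systemctl enable --now pacemaker_remote

# On a full cluster member: integrate the remote node as a resource
pcs resource create remote1 ocf:pacemaker:remote server=remote1.example.com

# The remote node can now host managed resources like any member, e.g.:
pcs constraint location app-reporting prefers remote1
```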

Platforms: Planning for changing demands

Workloads change. What system-platform for members can provide the needed capacity for applications, even if demand rises unexpectedly?

  • Can the application handle growth by scaling-out?

    • If more instances of the application can be added to meet growing demand, then the cluster-members' platform should be something that your organization can deploy quickly and easily - whether that be baremetal, virtual machines, or cloud instances.
    • Pacemaker Remote nodes allow for dynamic scaling of cluster capacity to meet demand.
      • Virtualization platforms can be useful for quick-deployment of these remote nodes. If a dedicated virtualization environment is not already available, baremetal cluster-nodes (the full members) can serve as libvirt/KVM hypervisors hosting guest-remote machines.
      • Baremetal servers can serve as remote nodes too, as long as they can be deployed quickly enough to meet the growing demand in time.
  • Or does the application's server need to scale-up to handle growth?

    • If a single machine's resources need to be enough on their own to handle everything a managed-application will do, then the cluster members' system-design must be considered carefully.
    • Virtual machines or cloud instances may be more flexible and allow scaling up resources to handle changes.
    • If targeting a physical / baremetal platform, consider those with easily-adjustable resource allocations
    • If targeting a baremetal platform with system specs that are static, then consider going well beyond expected capacity so it is not maxed-out if usage grows.

Platforms: Consider STONITH needs

STONITH/fencing design-considerations can be a deep topic that is best handled outside this guide. But available STONITH methods must be considered when choosing a platform, so it is important to at least make sure there are options for your platform before making any firm decisions on the membership design.

Many STONITH methods are available. Consult RHEL product documentation or Red Hat support for additional guidance.
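
To check what options exist for a candidate platform, the available fence agents can be surveyed with pcs. The fence device below is only an illustrative sketch; the address, credentials, and node name are placeholders.

```shell
# List fence agents available on this system, and inspect one in detail
pcs stonith list
pcs stonith describe fence_ipmilan

# Example: an IPMI-based fence device for one node
pcs stonith create fence-node1 fence_ipmilan \
    pcmk_host_list=node1.example.com ipaddr=10.0.0.101 \
    login=admin passwd=secret lanplus=1
```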


Platform: Consider system certifications / support requirements


RHEL Release: Which release to deploy on cluster members?

Recommendation: Red Hat recommends utilizing the latest release of RHEL for maximum stability, optimal performance, and availability of features.

The release of RHEL that will be used may impact which features are available in the cluster and to applications that will run there, so release requirements should be understood early.

