Design Guidance for RHEL High Availability Clusters - Microsoft Azure Virtual Machines as Cluster Members



Overview

Applicable Environments

  • Red Hat Enterprise Linux (RHEL) with the High Availability Add-On
  • Using the Microsoft Azure platform for virtual-machine hosting

Useful References and Guides

Introduction

This guide introduces administrators to Red Hat's recommendations, references, and considerations that may be useful in designing a RHEL High Availability cluster running on Microsoft Azure virtual machines.

Virtual machine configuration

VM distribution recommendation: Place VMs in the same Azure Availability Set

Additional details:

Reasons:

  • RHEL High Availability requires enough members to be active and functional for the cluster to maintain availability of services. An Availability Set aims to keep VMs isolated so that a single failure is less likely to disrupt all VMs in the cluster.
  • Quoting the above linked Microsoft tutorial: "An Availability Set is a logical grouping capability that you can use in Azure to ensure that the VM resources you place within it are isolated from each other when they are deployed within an Azure datacenter. Azure ensures that the VMs you place within an Availability Set run across multiple physical servers, compute racks, storage units, and network switches. If a hardware or Azure software failure occurs, only a subset of your VMs are impacted, and your overall application stays up and continues to be available to your customers"

Data storage considerations: Picking a storage method

Considerations:

  • What is special about storage for High Availability use cases?: A cluster-managed application's data will typically need to be stored in a way that can be accessed from any of the systems serving in a RHEL High Availability cluster. The application may have to fail over to other nodes, so each one must have access to the storage target, or be able to activate it, when the application needs to be started there.
  • Azure-specific data storage: See Microsoft's guidance on the different data storage methods offered by Azure: docs.microsoft.com - Deciding when to use Azure Blobs, Azure Files, or Azure Disks
    • Azure Blobs may be an effective storage method for application data if the application supports Blobs and/or can use Microsoft's libraries for them.
    • Azure Files may be a useful method for data storage in a RHEL High Availability cluster, as the data-share mount should be manageable as a highly available filesystem resource by the cluster, and applications may not need special support to access the file system contents (as long as they are compatible with the smb protocol).
      • See the section below in RHEL High Availability cluster configuration for further details.
    • Azure Disks are exclusive to a single VM at a time, and RHEL High Availability does not yet offer any mechanism to transfer an Azure Disk from one VM to another - so this data storage method is typically not viable for RHEL High Availability cluster-managed applications.
  • Cloud-based distributed/replicated storage: Red Hat Gluster Storage, Red Hat Ceph Storage, host-based storage replication like Linbit DRBD, or other storage solutions targeting cloud deployments may be able to serve cluster-managed applications.
  • NFS or CIFS (samba): These can be used in various ways to present data to the cluster-managed applications
    • Accessing your on-premises NFS/CIFS exports may be possible, but may also incur significant performance penalties. If only small amounts of data are needed, and/or latency is not a concern, then using a data share from a filer or cluster in your on-premises datacenter may be a suitable solution.
    • A RHEL High Availability cluster of VMs can be deployed in Azure to export data to your application cluster, making these exports highly-available from that backend cluster using some sort of host-based replication mechanism. Linbit's DRBD, or other replication software may provide a solution for making data highly available to the export-cluster.
    • A VM can be deployed in Azure using an Azure Disk as the backing store for exporting data via NFS or CIFS to your cluster. However, the lack of redundancy of this VM (or the data it contains) could make it a single point of failure for the RHEL HA cluster it serves, so this is often not a compelling solution for critical workloads.

NOTE: Red Hat has not fully scoped or evaluated these solutions in conjunction with RHEL High Availability. Please ensure whichever solution you choose has been thoroughly tested in your environment and determined to be stable and suitable.


RHEL High Availability cluster configuration

Detailed instructions: Deploying a RHEL 7 High Availability cluster on Azure

For a complete procedure to deploy a RHEL 7 cluster on Azure, see: Administrative Procedures - Installing and configuring a RHEL 7.5 High Availability cluster on Microsoft Azure


Cluster configuration requirement: Use udpu transport protocol

Additional details:

Reasons:

  • Azure VNETs do not support multicast traffic - which would be required to use the udp transport.
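A unicast cluster setup on RHEL 7 can be sketched as below. The cluster name and node names are placeholders; the `--transport udpu` flag is what selects unicast UDP instead of the default multicast transport:

```shell
# Authenticate the nodes first (run as root on one node; node names are examples):
pcs cluster auth node1 node2

# RHEL 7 syntax - create the cluster using the unicast (udpu) transport,
# since Azure VNETs do not carry the multicast traffic the default requires:
pcs cluster setup --name azure-cluster node1 node2 --transport udpu
```

After setup, `/etc/corosync/corosync.conf` should contain a `transport: udpu` line in its `totem` section.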

STONITH recommendation: Use fence_azure_arm as primary method

Additional details:

Reasons:

  • fence_azure_arm is the only fence-agent designed to work with Azure capabilities.
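A fence_azure_arm STONITH device might be configured along the following lines. All credential and naming values are placeholders for your own Azure Service Principal and resource group details; verify the agent's parameters with `pcs stonith describe fence_azure_arm` before relying on this sketch:

```shell
# Placeholder Service Principal credentials and Azure resource identifiers:
pcs stonith create azure-fence fence_azure_arm \
    login=<application-id> passwd=<service-principal-password> \
    tenantId=<tenant-id> subscriptionId=<subscription-id> \
    resourceGroup=<resource-group> \
    pcmk_host_map="node1:azure-vm-name-1;node2:azure-vm-name-2"
```

The `pcmk_host_map` value maps each cluster node name to the corresponding Azure VM name so the agent can power-cycle the correct VM.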

STONITH considerations: Other methods

Considerations:

  • No other STONITH agents or methods provided by Red Hat are known to be compatible with Azure, other than fence_azure_arm. To mention a few specific methods that are often asked about:
    • sbd: No suitable watchdog device is available on Azure VMs, so sbd is not compatible with this platform. See Red Hat's support policies for sbd and fence_sbd for more detail.
    • fence_scsi / fence_mpath: These methods are only suitable for clusters that host all cluster-managed application-data on SCSI-3-compatible shared block devices. No such shared devices are available in Azure environments, so these agents are incompatible with this platform. See Red Hat's support policies for fence_scsi and fence_mpath for more detail.

Considerations: Routing traffic to cluster-managed applications with Azure Load Balancer

Considerations:

  • Why use a load balancer?: An Azure Load Balancer will typically need to be configured for most RHEL High Availability deployments, either because:
    • RHEL High Availability can manage virtual floating IPs with its IPaddr2 resource agent, but Azure will only route traffic to such a virtual IP if an Azure Load Balancer is configured with that IP in its backend pool. Or
    • RHEL High Availability is hosting a load-balanced application on multiple cluster nodes that will have work distributed across them by a frontend load balancer.
  • Do I always need a load balancer?: If the cluster's managed applications don't require a floating IP address and don't need traffic distributed across the cluster members in parallel, then a Load Balancer may not be required - such as when the cluster's functions are served entirely over the VMs' dynamic internal IPs (DIPs) or public instance-level IPs (PIPs). This is not a common High Availability configuration.
  • Load balancer health probes: Azure Load Balancer requires VMs in the backend pool to respond to periodic health probes on a configurable port in order for the Load Balancer to route traffic to the target backend address.
    • If the application listens on TCP or services HTTP requests, the Load Balancer can be configured to use a TCP custom probe or HTTP custom probe targeted at that application's network port. As long as the application is listening and responsive, the Load Balancer should forward traffic to that node.
    • If the application does not use TCP or HTTP, or it is preferred that the Load Balancer not probe the application's port directly, RHEL HA provides the azure-lb resource type to listen for TCP connections on a defined port. This listener can serve as the target for health probes on an arbitrary port, ensuring that any node where the resource is started will have traffic forwarded to it.
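The azure-lb probe listener described above can be created as shown below. The resource name and port are placeholders; the port must match whatever port is configured in the Azure Load Balancer health probe:

```shell
# Arbitrary unused port chosen for this example - it must match the
# port configured in the Azure Load Balancer health probe:
pcs resource create myapp-probe azure-lb port=62000
```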

High Availability resource configuration guidance: Azure Load Balancer and a Floating IP

Scenario: The cluster manages a single-instance application that can move throughout the cluster as needed, and the application communicates with clients over a floating IP address that moves around the cluster along with it. Azure Load Balancer receives incoming client traffic for that floating IP and forwards it to whichever cluster member is hosting the application and IP at that time.

Configuration guidance:

  • A typical HA configuration with a floating IP address would include:
    • Azure configuration: An Azure Load Balancer backend pool associated with the cluster's Availability Set
    • Azure configuration: An Azure Load Balancer health probe that points to the application's port (TCP or HTTP protocol), or points to an arbitrary unused port that azure-lb will listen on (azure-lb listens on TCP regardless of the protocol the application itself uses)
    • Azure configuration: An Azure Load Balancer rule capturing: the application's port(s) & protocol(s), the backend pool with the cluster's Availability Set, and the created health probe. This rule should typically have the Floating IP (direct server return) option set to Enabled for floating IP configurations.
    • RHEL HA configuration: An IPaddr2 resource that manages the floating IP address in the cluster. Set ip and cidr_netmask to an appropriate IP matching the network configuration in the Load Balancer.
    • RHEL HA configuration (optional): If the Load Balancer health probe is not pointing at the application's port but rather an arbitrary port, then create an azure-lb resource with port configured to the port that was set in the health probe.
    • RHEL HA configuration (optional): If using an azure-lb resource, then ensure that it runs together with the IPaddr2 resource and the application's resource(s) by adding them to the same resource group or using order and colocation constraints.
  • See Red Hat's Deployment Guide for RHEL HA on Azure for instructions on configuring the Azure Load Balancer and the azure-lb resource with a Floating IP use-case.
    • Alternatively, configure the load balancer to point at the application's port, and skip the azure-lb configuration.
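The RHEL HA side of the floating-IP configuration above could be sketched as follows. The IP address, netmask, probe port, and resource names are placeholders, and the `systemd:myapp` application resource is a stand-in for whatever resource type your application uses:

```shell
# Floating IP managed by the cluster; must match the Load Balancer's frontend IP:
pcs resource create myapp-vip IPaddr2 ip=10.0.0.100 cidr_netmask=24 --group myapp-group

# Optional listener for the Load Balancer health probe on an arbitrary port
# (only needed if the probe does not target the application's own port):
pcs resource create myapp-lb azure-lb port=62000 --group myapp-group

# The application resource joins the same group so all three start, stop,
# and move together:
pcs resource create myapp-svc systemd:myapp --group myapp-group
```

Placing all three resources in one group is the simplest way to satisfy the colocation and ordering requirements; explicit order and colocation constraints achieve the same effect.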

High Availability resource configuration guidance: Azure Load Balancer and a multi-instance load-balanced application

Scenario: The cluster manages an application that has multiple instances running throughout the cluster - each one listening for client requests on that cluster member's dynamic or static IP as opposed to a floating IP. Client traffic comes into the Azure Load Balancer, which forwards it to one of the available cluster members running the application. Client traffic should not be forwarded to cluster members that are down or not serving the application at that time.

Additional detail:

  • A typical HA configuration with a load balanced application would include:
    • Azure configuration: An Azure Load Balancer backend pool associated with the cluster's Availability Set
    • Azure configuration: An Azure Load Balancer health probe that points to the application's port (TCP or HTTP protocol), or points to an arbitrary unused port that azure-lb will listen on (azure-lb listens on TCP regardless of the protocol the application itself uses)
    • Azure configuration: An Azure Load Balancer rule capturing: the application's port(s) & protocol(s), the backend pool with the cluster's Availability Set, and the created health probe. This rule should typically have the Floating IP (direct server return) option set to Disabled for load balanced applications.
    • RHEL HA configuration (optional): If the Load Balancer health probe is not pointing at the application's port but rather an arbitrary port, then create an azure-lb resource with port configured to the port that was set in the health probe, and make this resource a clone (pcs resource create [...] --clone).
    • RHEL HA configuration (optional): If using an azure-lb resource, then ensure that it runs together with the application's resource(s) by using colocation and order constraints. NOTE: clones can't be added to a group, but groups can be cloned, so if you prefer groups over constraints, then add the application resources and the non-cloned azure-lb resource to a group, then clone the group.
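The group-then-clone approach described in the note above could look like this. Resource names, the probe port, and the `systemd:myapp` application resource type are all placeholders:

```shell
# Create the group first, containing the application resource and the
# (non-cloned) azure-lb probe listener:
pcs resource create myapp-svc systemd:myapp --group myapp-lb-group
pcs resource create myapp-lb azure-lb port=62000 --group myapp-lb-group

# Then clone the whole group so an instance runs on every cluster node:
pcs resource clone myapp-lb-group
```

Cloning the group keeps the azure-lb listener and the application instance together on each node, so the Load Balancer only forwards traffic to nodes where the application is actually running.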

High Availability resource configuration: Managing Azure Files with a cluster resource

Considerations:

  • Red Hat has not yet fully validated that its ocf:heartbeat:Filesystem agent can properly manage a mount of an Azure Files share and maintain resilience of it across nodes in conjunction with applications that use that share for data storage.
  • If an Azure Files share is available for your HA-managed application and you wish to manage the share mount in the cluster alongside your application, it is recommended you test thoroughly through a variety of possible failure scenarios and production conditions.
  • Contact Red Hat Support for assistance and guidance if you wish to manage an Azure Files share in a cluster.

Possible resource configuration:

If you wish to have the cluster resource manager mount an Azure Files share, a possible (but unverified) configuration to make sure one node always has that share mounted could be:

# pcs resource create myapp-files Filesystem fstype=cifs device=//myaccount.file.core.windows.net/myshare directory=/myapp/myshare options='username=myuser,password=password,domain=example.com'

Or to make it active on multiple nodes using a clone resource:

# pcs resource create myapp-files Filesystem fstype=cifs device=//myaccount.file.core.windows.net/myshare directory=/myapp/myshare options='username=myuser,password=password,domain=example.com' --clone 

If a cluster-managed application relies on this share's data to function, then a dependency should be created - either by placing this Filesystem resource in the app's resource-group, or by creating ordering and colocation constraints. See the RHEL High Availability Add-On Reference Guide for instructions in those areas. For example:

# # Syntax: pcs resource group add <group ID> <resource ID> [--before <resource ID>]
# # Example:
# pcs resource group add myapp-group myapp-files --before myApp-script
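As an alternative to the group shown above, the same dependency can be expressed with explicit constraints. The resource IDs below reuse the example names from this section and remain placeholders:

```shell
# Start the Filesystem resource before the application resource...
pcs constraint order start myapp-files then myapp-script

# ...and keep the application on whichever node has the share mounted:
pcs constraint colocation add myapp-script with myapp-files
```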
