Support for Live Migration of RHEL High Availability Cluster Nodes on Public Cloud Providers

Environment

  • Red Hat Enterprise Linux Server 7, 8, 9, 10 (with the High Availability Add-On)
  • Public Cloud Providers (AWS, Azure, Google Cloud, Alibaba Cloud, IBM Cloud)

Issues

  • Is the live migration of active cluster nodes supported when running RHEL High Availability on public cloud infrastructure?
  • What are the configuration requirements to ensure cluster stability during cloud provider maintenance events or live migration?

Resolution

Red Hat supports the live migration of RHEL High Availability cluster nodes running on public cloud providers, provided that the environment is configured in accordance with both Red Hat’s cluster configuration guidelines and the specific requirements set forth by the cloud provider.

Historically, live migration was often discouraged for High Availability clusters due to the risk of inducing fencing loops or split-brain scenarios caused by temporary pauses in node responsiveness (freezes). However, modern cloud providers have developed specific mechanisms, agents, and configuration standards to handle these events gracefully.

Configuration Requirements


To maintain supportability and ensure cluster integrity during live migration events, the following two conditions must be met:
  • Adherence to Red Hat Support Policies: The cluster must be configured according to the general Support Policies for RHEL High Availability Clusters. This includes valid subscription status, supported package versions, and proper fencing configurations.
  • Adherence to Cloud Provider Configuration Rules: You must follow the specific architectural and configuration guides provided by your cloud vendor. These guides often mandate specific timeout values, fencing (STONITH) agents such as sbd and fence_sbd, or helper resources such as the azure-events agent.
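
As a rough illustration of the fencing part of these conditions, the following Python sketch inspects a CIB dump (for example, the XML printed by `pcs cluster cib`) and reports whether fencing is enabled and which STONITH agents are defined. The function name `fencing_summary` and the sample CIB are hypothetical and exist only for this example; this is not an official Red Hat or vendor validation tool.

```python
import xml.etree.ElementTree as ET

# Values Pacemaker accepts as boolean "true".
TRUE_VALUES = {"true", "on", "yes", "y", "1"}

# Hypothetical, heavily trimmed CIB used only for illustration.
SAMPLE_CIB = """\
<cib>
  <configuration>
    <crm_config>
      <cluster_property_set id="cib-bootstrap-options">
        <nvpair id="opt-stonith" name="stonith-enabled" value="true"/>
      </cluster_property_set>
    </crm_config>
    <resources>
      <primitive id="fence-node1" class="stonith" type="fence_sbd"/>
      <primitive id="vip" class="ocf" provider="heartbeat" type="IPaddr2"/>
    </resources>
  </configuration>
</cib>
"""


def fencing_summary(cib_xml):
    """Return (stonith_enabled, [stonith agent types]) for a CIB XML string."""
    root = ET.fromstring(cib_xml)

    # Pacemaker treats stonith-enabled as true when the property is unset.
    enabled = True
    for nvpair in root.findall(".//crm_config//nvpair"):
        if nvpair.get("name") == "stonith-enabled":
            enabled = nvpair.get("value", "true").strip().lower() in TRUE_VALUES

    # Fence devices are primitives whose resource class is "stonith".
    agents = [
        prim.get("type")
        for prim in root.iter("primitive")
        if prim.get("class") == "stonith"
    ]
    return enabled, agents


if __name__ == "__main__":
    enabled, agents = fencing_summary(SAMPLE_CIB)
    print("stonith-enabled:", enabled)
    print("fence agents:", agents)
```

An empty agent list is not conclusive on its own: watchdog-only sbd setups, for instance, may not define a STONITH resource at all. Treat the result as a prompt to review the configuration against the vendor guide, not as a pass/fail verdict.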

Cloud Provider Documentation & Specifics


Refer to your cloud provider's vendor-specific documentation for the required configurations. Failure to implement these provider-specific settings may result in unexpected node fencing during migration events.

Important Considerations

  • Fencing is Mandatory: Regardless of the cloud provider, a functioning STONITH (fencing) device is required for a supported RHEL High Availability cluster.
  • Timeouts: Cloud environments often require higher corosync token timeout values than bare-metal, on-premises deployments in order to tolerate the brief pauses associated with migration. Do not lower these values below the vendor-recommended minimums or, where the vendor specifies no minimum, below the cluster defaults.
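
The timeout guidance can be checked mechanically. The sketch below, written in Python, reads the `token` value from the `totem` section of a corosync.conf file and compares it with a floor that you supply. The helper names (`totem_token_ms`, `check_token`), the sample configuration, and the 30000 ms floor are all assumptions made for illustration; the floor in particular is a placeholder, not a vendor recommendation, and should be replaced with the value from your provider's guide.

```python
# Hypothetical corosync.conf contents used only for illustration.
SAMPLE_CONF = """\
totem {
    version: 2
    cluster_name: demo
    token: 30000
    interface {
        linknumber: 0
    }
}

quorum {
    provider: corosync_votequorum
}
"""


def totem_token_ms(conf_text):
    """Return totem.token in milliseconds, or None if it is not set."""
    stack = []  # names of the sections currently open
    for raw in conf_text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comment lines
        if line.endswith("{"):
            stack.append(line[:-1].strip())
        elif line == "}":
            if stack:
                stack.pop()
        elif ":" in line and stack == ["totem"]:
            # Only keys directly inside the top-level totem section count.
            key, value = line.split(":", 1)
            if key.strip() == "token":
                return int(value.strip())
    return None


def check_token(conf_text, floor_ms):
    """Classify the configured token timeout against a caller-supplied floor."""
    token = totem_token_ms(conf_text)
    if token is None:
        # corosync's built-in default applies; compare that value by hand.
        return "unset"
    return "ok" if token >= floor_ms else "too low"


if __name__ == "__main__":
    FLOOR_MS = 30000  # placeholder floor, NOT a vendor recommendation
    print(check_token(SAMPLE_CONF, FLOOR_MS))
```

The check reads only the configuration file. The value corosync actually uses at runtime can differ from it, so confirm the effective setting on a running node before relying on the result.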
