Support for Live Migration of RHEL High Availability Cluster Nodes on Public Cloud Providers
Environment
- Red Hat Enterprise Linux Server 7, 8, 9, 10 (with the High Availability Add On)
- Public Cloud Providers (AWS, Azure, Google Cloud, Alibaba Cloud, IBM Cloud)
Issues
- Is the live migration of active cluster nodes supported when running RHEL High Availability on public cloud infrastructure?
- What are the configuration requirements to ensure cluster stability during cloud provider maintenance events or live migration?
Resolution
Red Hat supports the live migration of RHEL High Availability cluster nodes running on public cloud providers, provided that the environment is configured in accordance with both Red Hat’s cluster configuration guidelines and the specific requirements set forth by the cloud provider.
Historically, live migration was often discouraged for High Availability clusters due to the risk of inducing fencing loops or split-brain scenarios caused by temporary pauses in node responsiveness (freezes). However, modern cloud providers have developed specific mechanisms, agents, and configuration standards to handle these events gracefully.
Configuration Requirements
To maintain supportability and ensure cluster integrity during live migration events, the following two conditions must be met:
- Adherence to Red Hat Support Policies: The cluster must be configured according to the general Support Policies for RHEL High Availability Clusters. This includes valid subscription status, supported package versions, and proper fencing configurations.
- Adherence to Cloud Provider Configuration Rules: You must follow the specific architectural and configuration guides provided by your cloud vendor. These guides often mandate specific timeout values, fencing agents (STONITH, including sbd and fence_sbd), or helper resources (such as the azure-events agent).
Cloud Provider Documentation & Specifics
Please refer to the following vendor-specific documentation for the required configurations. Failure to implement these provider-specific settings may result in unexpected node fencing during migration events.
- Microsoft Azure
- Azure utilizes specific agents to handle scheduled events and maintenance. Ensure your cluster is configured to monitor and react to these events.
- Reference: Set up Pacemaker on RHEL in Azure (learn.microsoft.com)
- Amazon Web Services (AWS)
- AWS requires specific corosync token timeouts and fencing agents to handle the network and instance behavior unique to EC2.
- Reference: Cluster Node Setup - SAP HANA on AWS (docs.aws.amazon.com)
- Google Cloud Platform (GCP)
- GCP supports live migration for many instance types. Ensure your storage and network heartbeat configurations align with GCP deployment guides.
- Reference: Deployment Manager: SAP HANA scale-up high-availability cluster configuration guide (docs.cloud.google.com)
- Alibaba Cloud
- Alibaba Cloud requires specific cross-zone high availability configurations to maintain quorum and connectivity during infrastructure events.
- Reference: SAP HANA High Availability Cross-Zone Solution on Alibaba Cloud (www.alibabacloud.com)
- IBM Cloud
- IBM Cloud Power Virtual Server (PowerVS) environments require specific fencing agents (fence_ibm_powervs) and resource agents (powervs-move-ip) to support HA operations across zones.
- Reference: Implementing a basic cluster in a multizone region (cloud.ibm.com)
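As an illustration of what "monitoring scheduled events" means in the Azure case above, the following minimal Python sketch filters a scheduled-events document for events that affect one cluster node. It is not the azure-events agent and does not describe how that agent is implemented. The payload is an illustrative sample, the node names are hypothetical, and the field names (Events, EventType, Resources, EventStatus, NotBefore) follow the Azure Scheduled Events document as commonly described; verify them against the Microsoft documentation before relying on them.

```python
import json

# Illustrative sample only; node1/node2 are hypothetical VM names.
SAMPLE_DOC = json.loads("""
{
  "DocumentIncarnation": 1,
  "Events": [
    {
      "EventId": "example-event-1",
      "EventType": "Freeze",
      "ResourceType": "VirtualMachine",
      "Resources": ["node1"],
      "EventStatus": "Scheduled",
      "NotBefore": "Mon, 19 Sep 2016 18:29:47 GMT"
    },
    {
      "EventId": "example-event-2",
      "EventType": "Terminate",
      "ResourceType": "VirtualMachine",
      "Resources": ["node2"],
      "EventStatus": "Scheduled",
      "NotBefore": "Mon, 19 Sep 2016 19:00:00 GMT"
    }
  ]
}
""")


def pending_events_for(doc, node_name, event_types=("Freeze", "Reboot", "Redeploy")):
    """Return (type, status, not_before) tuples for events that name node_name.

    Only event types listed in event_types are reported; the default set is an
    assumption chosen for this sketch, not a vendor recommendation.
    """
    hits = []
    for event in doc.get("Events", []):
        if node_name not in event.get("Resources", []):
            continue
        if event.get("EventType") not in event_types:
            continue
        hits.append(
            (event.get("EventType"), event.get("EventStatus"), event.get("NotBefore"))
        )
    return hits


if __name__ == "__main__":
    for event_type, status, not_before in pending_events_for(SAMPLE_DOC, "node1"):
        print(f"node1: {event_type} is {status}, not before {not_before}")
```

In a real deployment, the vendor-documented agent and cluster configuration handle this; the sketch only shows the kind of information a maintenance notification carries.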
Important Considerations
- Fencing is Mandatory: Regardless of the cloud provider, a functioning STONITH (fencing) device is required for a supported RHEL High Availability cluster.
- Timeouts: Cloud environments often require higher token timeout values for corosync than bare-metal on-premises deployments to tolerate the brief pauses associated with migration. Do not lower these values below the vendor-recommended minimums or, if the vendor specifies none, below the cluster defaults.
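The timeout guidance above can be checked mechanically. The following is a minimal Python sketch, assuming the usual corosync.conf layout in which token is set inside the totem section, in milliseconds. The sample configuration, the cluster name, and the 30000 ms threshold are placeholders for illustration, not a vendor recommendation; the helper names totem_token_ms and check_token are hypothetical.

```python
# Placeholder configuration text; on a cluster node this would typically be
# read from /etc/corosync/corosync.conf.
SAMPLE = """
totem {
    version: 2
    cluster_name: democluster
    token: 30000
    interface {
        ringnumber: 0
    }
}
quorum {
    provider: corosync_votequorum
}
"""


def totem_token_ms(conf_text):
    """Return the totem token timeout in ms, or None if it is not set."""
    depth = 0
    in_totem = False
    for raw in conf_text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments and whitespace
        if not line:
            continue
        if line.endswith("{"):
            depth += 1
            if depth == 1:
                in_totem = line[:-1].strip() == "totem"
            continue
        if line == "}":
            depth -= 1
            if depth == 0:
                in_totem = False
            continue
        # Only keys directly inside totem { } count, not nested subsections.
        if in_totem and depth == 1:
            key, _, value = line.partition(":")
            if key.strip() == "token":
                return int(value.strip())
    return None


def check_token(conf_text, vendor_minimum_ms):
    """Compare the configured token timeout with a vendor-recommended minimum."""
    token = totem_token_ms(conf_text)
    if token is None:
        return "token not set explicitly; the corosync default applies"
    if token < vendor_minimum_ms:
        return f"token {token} ms is below the required {vendor_minimum_ms} ms"
    return f"token {token} ms meets the required {vendor_minimum_ms} ms"


if __name__ == "__main__":
    print(check_token(SAMPLE, 30000))
```

The sketch only reads the configuration file text. The value the running cluster actually uses may differ if the file was changed without reloading corosync, so confirm the effective value on the node as well.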
Related Articles
- Support Policies for RHEL High Availability Clusters
- How to change totem token timeout value in a RHEL 5, 6, 7, 8 or 9 High Availability cluster?
- Support Policies for RHEL High Availability Clusters - General Requirements for Fencing/STONITH