Is live migration supported for VMs that are members of a RHEL cluster?

Solution Verified - Updated

Environment

  • Red Hat Enterprise Linux 5 and newer (with the High Availability Add-on)
  • Virtual machines as cluster members
  • Any virtualization technology, such as VMware, RHV, or KVM, or any supported cloud platform

Issue

  • I have a cluster of VMs. Can I live migrate them between hosts while they are members of the cluster?
  • Is VMware vMotion or DRS supported for VMs running in a RHEL cluster?
  • My cluster nodes are getting fenced during VMware DRS (Distributed Resource Scheduler) migration.
  • We have set up a two-node cluster on virtual guests in a VMware environment. When one of the guests is migrated to another VMware physical host, which may take up to 30 seconds, it is fenced by the other node after the migration completes.

Resolution

Red Hat does not test live migration of cluster nodes running on any virtualization technology, and thus live migration is not officially supported. This does not mean that vMotion or other similar technologies will not work, but rather that Red Hat cannot guarantee issues will not arise.

Red Hat recommends that you disable any automated relocation features that are used by the VMs (for example, VMware DRS for the cluster nodes). The cluster nodes can be migrated after the cluster is gracefully stopped on them or the guest OS is shut down.


While Red Hat does not support live migrations of VMs that are cluster nodes, we recommend the following if you must perform live migrations:

  • Configure a token timeout of at least 15000 ms.
  • Use 10G network connections on the cluster nodes.
  • If the cluster nodes are VMware VMs: Configure VMware DRS anti-affinity rules to prevent more than one cluster node VM from running on the same physical server at any given time.
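
As a rough illustration of the token timeout recommendation, the following Python sketch checks whether a configuration text sets a token value of at least 15000. It assumes a corosync-style configuration in which the timeout appears as a "token:" line in the totem section (as in corosync.conf on corosync-based clusters). Older cman-based clusters keep this setting in cluster.conf and are not covered here. The sample text, the function names, and the constant are hypothetical and exist only for this example.

```python
import re
from typing import Optional

# Minimum token timeout (milliseconds) recommended in this article.
RECOMMENDED_MIN_TOKEN_MS = 15000

# Hypothetical sample of a corosync-style totem section (illustration only).
SAMPLE_CONF = """\
totem {
    version: 2
    cluster_name: example
    token: 15000
}
"""


def configured_token_ms(conf_text: str) -> Optional[int]:
    """Return the first 'token:' value in the text, or None if it is not set.

    Simplification: the search is not restricted to the totem section.
    Keys such as 'token_coefficient:' are not matched.
    """
    match = re.search(r"^[ \t]*token:[ \t]*(\d+)[ \t]*$", conf_text, flags=re.MULTILINE)
    return int(match.group(1)) if match else None


def meets_recommendation(conf_text: str) -> bool:
    """True if an explicit token value of at least 15000 ms is configured."""
    token = configured_token_ms(conf_text)
    return token is not None and token >= RECOMMENDED_MIN_TOKEN_MS


if __name__ == "__main__":
    print(configured_token_ms(SAMPLE_CONF), meets_recommendation(SAMPLE_CONF))
```

The check only reads text. Changing the value on a running cluster should follow the documented procedure for your RHEL release.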

These recommendations are not a workaround. They are offered in case you decide to proceed with live migration anyway, which you do at your own risk.

Root Cause

Temporary loss of responsiveness can occur when virtual guests are migrated. Depending on the environment, the period of unresponsiveness may exceed the cluster heartbeat timeout, triggering cluster membership transition and fencing.
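
The relationship described above can be sketched as a simple comparison. This is a deliberately simplified model, not how the cluster stack is implemented: it assumes a node is declared lost as soon as its unresponsive period exceeds the token timeout, and it ignores other timers (such as the consensus timeout) and any retransmission behavior. The function name and the example durations are hypothetical.

```python
def pause_exceeds_token(pause_ms: int, token_ms: int) -> bool:
    """Simplified model: True if a guest pause outlasts the token timeout.

    In this model, a True result means the other nodes would stop seeing
    the paused node in time, leading to a membership change and fencing.
    """
    return pause_ms > token_ms


if __name__ == "__main__":
    # Hypothetical pause lengths (ms) checked against a 15000 ms token timeout.
    for pause in (2000, 14000, 30000):
        print(pause, pause_exceeds_token(pause, token_ms=15000))
```

Under this model, a longer token timeout tolerates longer pauses, but a guest that stays unresponsive for the full 30 seconds mentioned in the Issue section would still be fenced with a 15000 ms timeout.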

This solution is part of Red Hat’s fast-track publication program, providing a huge library of solutions that Red Hat engineers have created while supporting our customers. To give you the knowledge you need the instant it becomes available, these articles may be presented in a raw and unedited form.