Understanding live migration "migration_max_bandwidth" and "max_outgoing_migrations" parameters in vdsm.conf


Environment

  • Red Hat Enterprise Virtualization 3.0 --> 3.5

Issue

  • Is there any theory on how to configure the migration_max_bandwidth and max_outgoing_migrations parameters in /etc/vdsm/vdsm.conf?

  • We are using a RHEV environment and are now looking to optimize the performance of live migrations. We found the following parameters in /usr/share/doc/vdsm*/vdsm.conf.sample. How do these parameters work, and what should we be careful about when tuning them?

# migration_max_bandwidth = <X>

<..snip..>

# max_outgoing_migrations = <Z>

Note: the vdsm.conf.sample file is available only on RHEL-based hypervisors, not in a RHEV-H environment. The values and comments also depend on the version of the vdsm package.

Resolution

There are some rules surrounding the configuration of migration_max_bandwidth and max_outgoing_migrations in /etc/vdsm/vdsm.conf which should be taken into account before changing the values to anything other than the defaults. Without knowledge of these rules, your network topology, and the available bandwidth, any change to these values can have negative consequences for your live migrations.

If you are using Gigabit LAN interfaces for the RHEV management network, under most circumstances the default values for migration_max_bandwidth and max_outgoing_migrations in /etc/vdsm/vdsm.conf should not be changed without good reason. (It is possible they may need to be lowered, as explained below, but with Gigabit ethernet it is unlikely you would ever need to increase them.)

Before deciding how to configure these values for your environment, you need to be aware of several things:

  1. This article refers to the Management Logical Network. However, starting with RHEV 3.3 a separate Migration Logical Network can be configured, and in most cases it is prudent to allocate a distinct migration network.

  2. You need to be aware of the bandwidth available via the management network interface through which live VMs will be migrated. In general the available bandwidth may vary depending on the network topology and what is connected to the network; that is, full wire speed may not be available to the host the VM is being transferred to. With bonded interfaces you may still only have the bandwidth of a single interface, depending on the bonding mode used.

  3. You should be aware of the units for migration_max_bandwidth. The unit in vdsm.conf is MiB/s (mebibytes per second) per VM, but network bandwidth is always given in Mb/s (megabits per second), so we need to convert between the two before deciding how to configure migration_max_bandwidth.

  4. You should be aware of how live migration is impacted by setting max_outgoing_migrations in vdsm.conf.

The general rule of thumb for configuring migration_max_bandwidth is as follows :

(Available Network speed in mbps) > "migration_max_bandwidth" * 8 * "max_outgoing_migrations"

For example, on a Gigabit ethernet network with migration_max_bandwidth at its default of 32 and max_outgoing_migrations at its default of 3, we have:

1000 > 768 (768 is 32 * 8 * 3)

In this case as long as the management interface can achieve 768 Mb/s the outgoing live VMs should be transferred successfully.

Looking at another example, if we change migration_max_bandwidth to 100 we have:

1000 > 2400 (2400 is 100 * 8 * 3)

That configuration will not succeed and is likely to leave the host unable to communicate with the Manager, making it non-responsive (which might result in the hypervisor being fenced): the management network will be saturated, and management communication between the hypervisor and the rest of RHEV will be almost impossible to maintain. To keep that setting you would need a faster network, for example a 10G ethernet LAN interface.
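The rule of thumb and the two worked examples above can be checked with a short shell calculation. This is only illustrative arithmetic; the variable names are not actual vdsm settings:

```shell
# Rule of thumb: available Mb/s must exceed
# migration_max_bandwidth (MiB/s per VM) * 8 * max_outgoing_migrations.
available_mbps=1000              # Gigabit management network
migration_max_bandwidth=32       # vdsm default, MiB/s per VM
max_outgoing_migrations=3        # vdsm default

required_mbps=$((migration_max_bandwidth * 8 * max_outgoing_migrations))
echo "required: ${required_mbps} Mb/s"   # 768 with the defaults

if [ "$available_mbps" -gt "$required_mbps" ]; then
    echo "OK: migration traffic fits within the management network"
else
    echo "WARNING: migration traffic can saturate the management network"
fi
```

Re-running the same arithmetic with migration_max_bandwidth=100 gives 2400 Mb/s required, which fails the check on a Gigabit link, matching the second example.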

5. There is a delicate balance between the two parameters:

  • A higher max_outgoing_migrations benefits a system with many small VMs, or relatively dormant VMs where not much memory is expected to be transferred.
  • A higher migration_max_bandwidth is more appropriate when you have busy or large VMs, where you would prefer to serialize their migration.

6. It is also important to understand the throughput actually achievable via a fast network interface. Depending on the network topology you may not be able to reach the required throughput, and the network interface and networking stack may require tuning to achieve the desired result.

Some examples of issues to consider are:

  • A hypervisor with a 10G ethernet interface migrating live VMs to a system with a Gigabit ethernet interface - your configuration should look at the lowest common denominator, not the highest. Alternatively all management interfaces in this situation should be either Gigabit or 10G ethernet interfaces. If all 10G ethernet interfaces are used you might consider increasing migration_max_bandwidth or max_outgoing_migrations to use the higher bandwidth available.

  • If other hypervisors are also live migrating VMs to the target system, you may exceed the available bandwidth at the target machine, leading to issues on that hypervisor. For example, if two hypervisors are each transferring 3 VMs to a third hypervisor, you may see issues on the third system caused by excessive network traffic on its management interface. In such circumstances you could need to lower either migration_max_bandwidth or max_outgoing_migrations in vdsm.conf, increase the available bandwidth with 10G LAN interfaces and network switches, or use Gigabit ethernet interfaces bonded in a mode that allows a higher aggregate interface speed (e.g. mode 4).
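The inbound load on the target hypervisor in the example above can be estimated the same way. Again this is illustrative shell arithmetic, not a vdsm tool:

```shell
# Two source hypervisors, each sending max_outgoing_migrations VMs
# at migration_max_bandwidth MiB/s, converge on one target host.
sources=2
migration_max_bandwidth=32   # vdsm default, MiB/s per VM
max_outgoing_migrations=3    # vdsm default
target_link_mbps=1000        # Gigabit interface on the target

inbound_mbps=$((sources * max_outgoing_migrations * migration_max_bandwidth * 8))
echo "aggregate inbound: ${inbound_mbps} Mb/s"   # 1536 with the defaults

if [ "$inbound_mbps" -ge "$target_link_mbps" ]; then
    echo "WARNING: the target management interface will be saturated"
fi
```

Even with every host at the defaults, two concurrent sources already exceed a Gigabit link at the target, which is why the per-host settings alone are not the whole picture.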

7. Before implementing changes to migration_max_bandwidth (remember to convert from Mb/s back to MiB/s if you change it) or max_outgoing_migrations in vdsm.conf, you should test multiple live migrations to ensure that they succeed with the bandwidth available to you. You should also retest after making changes to either value to ensure that migrations will still succeed after the changes.

In general there is a correlation between the network speed of both the source and destination systems and the values specified for migration_max_bandwidth and max_outgoing_migrations. Increasing these values without understanding the implications can lead to live VM migration failures and hypervisors being marked as failed (Non-Responsive in RHEV-M).

If you need more information or need assistance from Red Hat, please feel free to contact Red Hat consulting services.

You may also be interested in this article for exact instructions on editing these parameters:
What is the unit of "migration_max_bandwidth" in vdsm.conf?

Please note: The migration parameters mentioned above must be added to the [vars] section of the /etc/vdsm/vdsm.conf file. If, for example, they are appended to the end of the file, they will most likely fall under a different section and as a result will have no effect.
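As a sketch, a correctly placed override would look like the following. The values shown are the defaults discussed earlier; adjust them to your own network before use:

```ini
# /etc/vdsm/vdsm.conf
[vars]
# MiB/s per VM; 32 * 8 * 3 = 768 Mb/s on the wire with the defaults
migration_max_bandwidth = 32
max_outgoing_migrations = 3
```

Restart the vdsmd service after editing for the change to take effect.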


This solution is part of Red Hat’s fast-track publication program, providing a huge library of solutions that Red Hat engineers have created while supporting our customers. To give you the knowledge you need the instant it becomes available, these articles may be presented in a raw and unedited form.