Instance migration fails when using cpu-pinning from a numa-cell and flavor-property "hw:cpu_policy=dedicated"

Solution Verified - Updated

Environment

  • Red Hat OpenStack 7.0 or later

Issue

  • We are seeing migration failures. The procedure followed was section 3, "Migrate between hypervisors" (not a live migration).
  • The VM was shut down before the migrate procedure from the article https://access.redhat.com/articles/1265613
  • The error message is as follows:
#  nova migrate --poll $u
ERROR (BadRequest): No valid host was found. No valid host found for cold
migrate (HTTP 400) (Request-ID: req-592d59db-9185-4775-b5e2-940aa657a62c)
  • If, however, one VM on the destination host is shut off, the other VM's migration succeeds and that VM comes into service; for example, VM2 migrated from Host10 to Host03 while VM1 was shut off. VM1 can then no longer come into service, because VM2 is using some of the same dedicated vCPUs.
  • Here is the error when trying to power on the second VM whose vCPUs collide with the first one's:
2016-02-16 17:30:49.860 58352 INFO nova.compute.resource_tracker
[req-9922b49b-c3e7-491f-b400-fa711b99eee1 - - - - -] Auditing locally available
compute resources for node mme06-host10
2016-02-16 17:30:50.806 58352 ERROR nova.openstack.common.periodic_task
[req-9922b49b-c3e7-491f-b400-fa711b99eee1 - - - - -] Error during
ComputeManager.update_available_resource: Cannot pin/unpin cpus [0, 1, 2, 3, 4,
5, 7, 8, 9, 20, 21, 22, 23, 24, 25, 27, 28, 29] from the following pinned set
[0, 1, 5, 8, 9, 20, 21, 25, 28, 29]
  • The failure occurs whenever the instance's vCPU pinning overlaps the pinnings already present on every possible destination host.

Resolution

Root Cause

  1. hw_cpu_policy=shared: The historical default behaviour for Nova is that VM vCPUs float freely across all pCPUs and VMs contend with each other for execution time. Nova has a tunable overcommit ratio that controls how far pCPUs may be oversubscribed; e.g. an overcommit ratio of 1.5 means Nova will schedule VMs with a total of 6 vCPUs on a 4-pCPU host. This is the behaviour you get with hw_cpu_policy=shared.

  2. hw_cpu_policy=dedicated: Setting hw_cpu_policy=dedicated enables a strictly pinned, non-overcommit policy. The overcommit ratio is ignored entirely: Nova will only ever schedule 4 vCPUs on a 4-pCPU host, and each vCPU is explicitly pinned to one pCPU.

If NUMA is added to the mix, the only change is that instead of fitting the instance against the host as a whole, Nova fits it against the individual NUMA nodes.
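The difference between the two policies can be sketched as follows (illustrative Python only, not Nova's actual scheduler code; the function name and numbers are hypothetical):

```python
# Illustrative sketch of how many vCPUs fit on a host under each policy.
# This is NOT Nova source code; names here are hypothetical.

def schedulable_vcpus(pcpus: int, overcommit_ratio: float, policy: str) -> int:
    """Return the total vCPUs that would be placed on `pcpus` physical CPUs."""
    if policy == "dedicated":
        # Strict pinning: the overcommit ratio is ignored entirely.
        return pcpus
    # "shared" (the historical default): vCPUs float and may oversubscribe.
    return int(pcpus * overcommit_ratio)

# The example from the text: overcommit of 1.5 on a 4-pCPU host.
print(schedulable_vcpus(4, 1.5, "shared"))     # 6
print(schedulable_vcpus(4, 1.5, "dedicated"))  # 4
```

With NUMA in play, the same arithmetic is applied per NUMA node rather than per host.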

What happens behind the scenes?

When a VM first starts, Nova runs its fitting logic (via the scheduler) and decides which pCPUs the VM will run on. When the VM is migrated to a new host,
Nova should run that fitting logic again, because the pCPUs that are free on the target host may not be the same ones the VM was originally pinned to. Nova is currently broken, however, and never re-runs the fitting logic on migration. There are open bugs to fix this, but for now the recommendation is to avoid migrating instances that use the dedicated pCPU policy / NUMA placement.
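Because the source host's pinning is carried over verbatim, the collision on the target host can be sketched like this (hypothetical Python mirroring the "Cannot pin/unpin" condition from the log above; all names are illustrative):

```python
# Illustrative sketch: Nova carries the source host's pinning to the target
# instead of re-fitting, so the migrated instance may claim pCPUs that are
# already pinned on the destination. All names here are hypothetical.

def pinning_conflict(already_pinned: set, incoming_pins: set) -> set:
    """Return the pCPUs the incoming instance would double-book."""
    return already_pinned & incoming_pins

# Target host already runs a guest pinned to pCPUs 2 and 3; the migrated
# instance arrives with its source pinning (also 2 and 3) unchanged.
conflict = pinning_conflict({2, 3}, {2, 3})
print(sorted(conflict))  # [2, 3] -> both instances cannot run at once
```

Re-running the fitting logic on migration would instead pick pCPUs from the target host's free set, making the intersection empty.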

Diagnostic Steps

  • Created a flavor like this:
# nova flavor-show m1.small.performance
+----------------------------+---------------------------------------------------------------------------------+
| Property                   | Value                                                                           |
+----------------------------+---------------------------------------------------------------------------------+
| OS-FLV-DISABLED:disabled   | False                                                                           |
| OS-FLV-EXT-DATA:ephemeral  | 0                                                                               |
| disk                       | 20                                                                              |
| extra_specs                | {"aggregate_instance_extra_specs:pinned": "true", "hw:cpu_policy": "dedicated"} |
| id                         | 6                                                                               |
| name                       | m1.small.performance                                                            |
| os-flavor-access:is_public | True                                                                            |
| ram                        | 2048                                                                            |
| rxtx_factor                | 1.0                                                                             |
| swap                       |                                                                                 |
| vcpus                      | 2                                                                               |
+----------------------------+---------------------------------------------------------------------------------+
  • Created an instance like this:
# nova boot --image cirros1 --flavor m1.small.performance --nic net-id=ce4b1d70-5ffe-40d2-a962-3d0a0e99a883 cpu_pin1
  • On the source host, its CPU association could be seen (before migration) using:
# virsh list --all
 Id    Name                           State
----------------------------------------------------
 7     instance-00000012              running

# virsh vcpupin instance-00000012
VCPU: CPU Affinity
----------------------------------
   0: 2
   1: 3
  • The NUMA topology on the source host, before migration, looks like this:
# numactl --hardware
available: 2 nodes (0-1)
node 0 cpus: 0 1 2 3
node 0 size: 4095 MB
node 0 free: 2367 MB
node 1 cpus: 4 5 6 7
node 1 size: 4096 MB
node 1 free: 709 MB
node distances:
node   0   1 
  0:  10  20 
  1:  20  10 
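To compare pinning against the host topology, the node-to-pCPU mapping can be pulled out of the `numactl --hardware` output with a small helper (a hypothetical sketch, not part of any Red Hat tooling):

```python
# Illustrative helper (hypothetical): map each NUMA node to its pCPU list
# by parsing the text output of `numactl --hardware`.
import re

def parse_numa_cpus(numactl_output: str) -> dict:
    nodes = {}
    for line in numactl_output.splitlines():
        m = re.match(r"node (\d+) cpus: (.+)", line.strip())
        if m:
            nodes[int(m.group(1))] = [int(c) for c in m.group(2).split()]
    return nodes

sample = """\
available: 2 nodes (0-1)
node 0 cpus: 0 1 2 3
node 1 cpus: 4 5 6 7
"""
print(parse_numa_cpus(sample))  # {0: [0, 1, 2, 3], 1: [4, 5, 6, 7]}
```

In the topology above, the instance's pins (pCPUs 2 and 3) both sit inside node 0, which is what dedicated-policy fitting against a single NUMA node produces.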

[After LIVE migration]

# ssh <destination-compute-ip> virsh vcpupin instance-00000012
VCPU: CPU Affinity
----------------------------------
   0: 2
   1: 3
  • To reproduce the issue, create a second instance with the same command as above and check whether it was placed on different pCPUs; here it was not:
# ssh <destination-compute-ip> virsh vcpupin instance-00000013
VCPU: CPU Affinity
----------------------------------
   0: 2
   1: 3
  • This is where I hit the same issue as reported.
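The double-booking above can be confirmed mechanically by comparing the `virsh vcpupin` listings of all guests on the destination host (a hypothetical sketch; helper names are illustrative, and it assumes the listings show discrete pCPUs rather than ranges):

```python
# Illustrative check (hypothetical helpers): collect each guest's pCPU
# affinity from `virsh vcpupin <domain>` output and flag any pCPU that
# two dedicated-policy guests claim at the same time.
import re
from collections import defaultdict

def parse_vcpupin(output: str) -> dict:
    """Map vCPU -> list of pCPUs from one `virsh vcpupin` listing."""
    pins = {}
    for line in output.splitlines():
        m = re.match(r"\s*(\d+):\s*([\d,]+)", line)
        if m:
            pins[int(m.group(1))] = [int(p) for p in m.group(2).split(",")]
    return pins

def find_collisions(guests: dict) -> dict:
    """pCPUs claimed by more than one guest (domain name -> vcpupin output)."""
    users = defaultdict(set)
    for name, output in guests.items():
        for pcpus in parse_vcpupin(output).values():
            for p in pcpus:
                users[p].add(name)
    return {p: sorted(n) for p, n in users.items() if len(n) > 1}

listing = "VCPU: CPU Affinity\n----------------\n   0: 2\n   1: 3\n"
print(find_collisions({"instance-00000012": listing,
                       "instance-00000013": listing}))
# pCPUs 2 and 3 are each claimed by both guests
```

A non-empty result is exactly the situation that produces the "Cannot pin/unpin cpus" error in the resource tracker.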

This solution is part of Red Hat’s fast-track publication program, providing a huge library of solutions that Red Hat engineers have created while supporting our customers. To give you the knowledge you need the instant it becomes available, these articles may be presented in a raw and unedited form.