LEAPP upgrade from 16.2 to 17.1 fails during overcloud node provisioning step when adding pre-provisioned nodes

Solution Verified - Updated

Environment

  • Red Hat OpenStack Platform (RHOSP) 16.2
  • Red Hat OpenStack Platform (RHOSP) 17.1

Issue

  • During the overcloud adoption and preparation phase of the 16.2 to 17.1 upgrade, the step that provisions the overcloud fails when the nodes are pre-provisioned:
    $ openstack overcloud node provision --debug --stack <stack> \
    --output /home/stack/overcloud_adopt/baremetal-deployment.yaml \
    tripleo-<stack>-baremetal-deployment.yaml
  • The following error is displayed:
      File "/usr/lib/python3.9/site-packages/metalsmith/_provisioner.py", line 172, in _reserve_node
        raise exceptions.ReservationFailed(

Resolution

  • If the environment includes pre-provisioned nodes, customize the baremetal-deployment.yaml file. Doing this by hand is error-prone in environments with a large number of existing nodes.

  • Use the attached tripleo-baremetal-deployment-to-pre-provisioned.py script to help automate the conversion.

  • The script will convert the node definitions within baremetal-deployment.yaml from baremetal managed nodes to pre-provisioned unmanaged nodes.

  • The script does the following:

    1. Sets the managed flag to false for all instances in all roles.
    2. Removes the vif flag from all network entries of all instances in all roles.
    3. Removes the vif flag from the networks in each role's defaults section.
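
The three steps above can be sketched in Python as follows. This is a minimal illustration and not the attached script: the function name to_pre_provisioned is hypothetical, and the sketch assumes the contents of baremetal-deployment.yaml have already been parsed into a list of role dictionaries, each with optional defaults and instances keys.

```python
import copy


def to_pre_provisioned(roles):
    """Return a converted copy of the role list; the input is left untouched.

    Hypothetical helper for illustration only, not the attached script.
    """
    converted = copy.deepcopy(roles)
    for role in converted:
        # Step 3: remove 'vif' from the networks in the role's defaults section.
        defaults = role.get("defaults") or {}
        for network in defaults.get("networks") or []:
            network.pop("vif", None)

        for instance in role.get("instances") or []:
            # Step 1: mark the instance as unmanaged (pre-provisioned).
            instance["managed"] = False
            # Step 2: remove 'vif' from each network entry of the instance.
            for network in instance.get("networks") or []:
                network.pop("vif", None)
    return converted


if __name__ == "__main__":
    # Made-up sample data that mimics the assumed file layout.
    sample = [
        {
            "name": "Controller",
            "defaults": {"networks": [{"network": "ctlplane", "vif": True}]},
            "instances": [
                {
                    "hostname": "overcloud-controller-0",
                    "networks": [{"network": "ctlplane", "vif": True}],
                }
            ],
        }
    ]
    for role in to_pre_provisioned(sample):
        print(role)
```

To apply this to a real file, the list would typically be loaded and written back with a YAML library; that part is left out here so the sketch has no external dependencies.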

Root Cause

  • During 16.2 to 17.1 overcloud adoption/preparation, metalsmith searches Ironic for the nodes to provision. However, pre-provisioned nodes are not registered in Ironic, so the reservation fails for any such node that baremetal-deployment.yaml still defines as a managed node.
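
To see which node definitions would still be handed to metalsmith, the parsed file can be scanned for instances that are not explicitly unmanaged. The sketch below is illustrative only: the function name managed_instances is hypothetical, the input is assumed to be the list of role dictionaries parsed from baremetal-deployment.yaml, and an instance without a managed key is assumed to count as managed.

```python
def managed_instances(roles):
    """Return (role name, hostname) pairs for instances not marked managed: false.

    Hypothetical helper for illustration only.
    """
    found = []
    for role in roles:
        for instance in role.get("instances") or []:
            # Assumption: a missing 'managed' key means the node is managed.
            if instance.get("managed", True):
                found.append((role.get("name"), instance.get("hostname")))
    return found


if __name__ == "__main__":
    # Made-up sample data that mimics the assumed file layout.
    sample = [
        {
            "name": "Compute",
            "instances": [
                {"hostname": "overcloud-compute-0"},
                {"hostname": "overcloud-compute-1", "managed": False},
            ],
        }
    ]
    for role_name, hostname in managed_instances(sample):
        print(role_name, hostname)
```

An empty result suggests that every listed instance is already defined as pre-provisioned.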

Diagnostic Steps

  • On the main stack, copy the baremetal-deployment.yaml file to the stack user’s home directory and provision the overcloud nodes. Repeat this step on each stack in your environment:
$ cp ~/overcloud-deploy/<stack>/tripleo-<stack>-baremetal-deployment.yaml ~/
$ openstack overcloud node provision --debug --stack <stack> \
--output /home/stack/overcloud_adopt/baremetal-deployment.yaml \
tripleo-<stack>-baremetal-deployment.yaml

The provisioning step fails with the following traceback:

The full traceback is:
  File "/tmp/ansible_metalsmith_instances_payload_drsc2_nl/ansible_metalsmith_instances_payload.zip/ansible/modules/metalsmith_instances.py", line 465, in main
  File "/tmp/ansible_metalsmith_instances_payload_drsc2_nl/ansible_metalsmith_instances_payload.zip/ansible/modules/metalsmith_instances.py", line 288, in reserve
  File "/tmp/ansible_metalsmith_instances_payload_drsc2_nl/ansible_metalsmith_instances_payload.zip/ansible/modules/metalsmith_instances.py", line 271, in reserve
  File "/usr/lib/python3.9/site-packages/metalsmith/_provisioner.py", line 108, in reserve_node
    node = self._reserve_node(resource_class, hostname=hostname,
  File "/usr/lib/python3.9/site-packages/metalsmith/_provisioner.py", line 191, in _reserve_node
    self.connection.baremetal.delete_allocation(allocation)
  File "/usr/lib64/python3.9/contextlib.py", line 126, in __exit__
    next(self.gen)
  File "/usr/lib/python3.9/site-packages/metalsmith/_utils.py", line 146, in reraise_os_exc
    raise exc_info[1]
  File "/usr/lib/python3.9/site-packages/metalsmith/_provisioner.py", line 172, in _reserve_node
    raise exceptions.ReservationFailed(

This solution is part of Red Hat’s fast-track publication program, providing a huge library of solutions that Red Hat engineers have created while supporting our customers. To give you the knowledge you need the instant it becomes available, these articles may be presented in a raw and unedited form.