LEAPP upgrade from 16.2 to 17.1 fails during overcloud node provisioning step when adding pre-provisioned nodes
Environment
- Red Hat OpenStack Platform (RHOSP) 16.2
- Red Hat OpenStack Platform (RHOSP) 17.1
Issue
- During overcloud adoption and preparation in the 16.2 to 17.1 upgrade, the overcloud node provisioning step fails when the nodes are pre-provisioned:
$ openstack overcloud node provision --debug --stack <stack> \
--output /home/stack/overcloud_adopt/baremetal-deployment.yaml \
tripleo-<stack>-baremetal-deployment.yaml
- You will see the following error:
File "/usr/lib/python3.9/site-packages/metalsmith/_provisioner.py", line 172, in _reserve_node
raise exceptions.ReservationFailed(
Resolution
- If the environment includes pre-provisioned nodes, customize the baremetal-deployment.yaml file. This can be error-prone in environments with large numbers of existing nodes.
- Use the attached tripleo-baremetal-deployment-to-pre-provisioned.py script to help automate the conversion.
- The script converts the node definitions within baremetal-deployment.yaml from baremetal managed nodes to pre-provisioned unmanaged nodes.
- The script does the following:
  - Sets the managed flag to false for all instances in all roles.
  - Removes the vif flag from all network instances in all roles.
  - Removes the vif flag for networks in the role's default section.
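The conversion described above can be sketched in Python. This is not the attached script, only a minimal illustration of the three operations. It assumes the file has already been parsed into a list of role dictionaries with `defaults`, `instances`, and `networks` keys; the function name `to_pre_provisioned` and the sample host names are hypothetical.

```python
import copy


def to_pre_provisioned(roles):
    """Return a copy of the role list with every instance marked unmanaged.

    Assumes `roles` is the parsed content of baremetal-deployment.yaml:
    a list of dicts, each optionally holding `defaults` and `instances`.
    """
    converted = copy.deepcopy(roles)  # leave the caller's data untouched
    for role in converted:
        # Remove the vif flag from networks in the role's defaults section.
        defaults = role.get("defaults") or {}
        for network in defaults.get("networks") or []:
            network.pop("vif", None)

        for instance in role.get("instances") or []:
            # Mark the instance as pre-provisioned (unmanaged).
            instance["managed"] = False
            # Remove the vif flag from every network of the instance.
            for network in instance.get("networks") or []:
                network.pop("vif", None)
    return converted


if __name__ == "__main__":
    # Hypothetical sample data, shaped like one role definition.
    sample = [
        {
            "name": "Controller",
            "count": 1,
            "defaults": {"networks": [{"network": "ctlplane", "vif": True}]},
            "instances": [
                {
                    "hostname": "overcloud-controller-0",
                    "networks": [{"network": "ctlplane", "vif": True}],
                }
            ],
        }
    ]
    print(to_pre_provisioned(sample))
```

To apply this to a real file, the role list could be loaded with PyYAML's `yaml.safe_load` and written back with `yaml.safe_dump`. Keep a backup of the original file and review the result before provisioning.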
Root Cause
- During 16.2 to 17.1 overcloud adoption and preparation, metalsmith searches Ironic for the nodes listed in baremetal-deployment.yaml in order to provision them. However, pre-provisioned nodes are not defined in Ironic, so the node reservation fails.
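To see which hosts the provisioning step would try to reserve through Ironic, it can help to list the instances that are still marked as managed. The following is a minimal sketch, assuming the file has been parsed into a list of role dictionaries. It also assumes that an instance inherits `managed` from the role's `defaults` and counts as managed when the key is absent. The function name `managed_hostnames` and the sample host names are hypothetical.

```python
def managed_hostnames(roles):
    """Return hostnames of instances that are not marked `managed: false`."""
    hostnames = []
    for role in roles:
        defaults = role.get("defaults") or {}
        for instance in role.get("instances") or []:
            # Assumption: the instance value wins, then the role default,
            # and a missing key everywhere means the node is managed.
            managed = instance.get("managed", defaults.get("managed", True))
            if managed:
                hostnames.append(instance.get("hostname", "<unnamed>"))
    return hostnames


if __name__ == "__main__":
    # Hypothetical sample: one managed and one pre-provisioned instance.
    sample = [
        {
            "name": "Controller",
            "instances": [
                {"hostname": "overcloud-controller-0"},
                {"hostname": "overcloud-controller-1", "managed": False},
            ],
        }
    ]
    print(managed_hostnames(sample))
```

An empty result suggests that every listed instance is already treated as pre-provisioned. Any hostname returned is a candidate for the reservation failure described in this article.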
Diagnostic Steps
- On the main stack, copy the baremetal-deployment.yaml file to the stack user's home directory and provision the overcloud nodes. Repeat this step on each stack in your environment:
$ cp ~/overcloud-deploy/<stack>/tripleo-<stack>-baremetal-deployment.yaml ~/
$ openstack overcloud node provision --debug --stack <stack> \
--output /home/stack/overcloud_adopt/baremetal-deployment.yaml \
tripleo-<stack>-baremetal-deployment.yaml
The provisioning step fails with the following traceback:
The full traceback is:
File "/tmp/ansible_metalsmith_instances_payload_drsc2_nl/ansible_metalsmith_instances_payload.zip/ansible/modules/metalsmith_instances.py", line 465, in main
File "/tmp/ansible_metalsmith_instances_payload_drsc2_nl/ansible_metalsmith_instances_payload.zip/ansible/modules/metalsmith_instances.py", line 288, in reserve
File "/tmp/ansible_metalsmith_instances_payload_drsc2_nl/ansible_metalsmith_instances_payload.zip/ansible/modules/metalsmith_instances.py", line 271, in reserve
File "/usr/lib/python3.9/site-packages/metalsmith/_provisioner.py", line 108, in reserve_node
node = self._reserve_node(resource_class, hostname=hostname,
File "/usr/lib/python3.9/site-packages/metalsmith/_provisioner.py", line 191, in _reserve_node
self.connection.baremetal.delete_allocation(allocation)
File "/usr/lib64/python3.9/contextlib.py", line 126, in __exit__
next(self.gen)
File "/usr/lib/python3.9/site-packages/metalsmith/_utils.py", line 146, in reraise_os_exc
raise exc_info[1]
File "/usr/lib/python3.9/site-packages/metalsmith/_provisioner.py", line 172, in _reserve_node
raise exceptions.ReservationFailed(