How to split roles during upgrade from RHOSP 16.2 to RHOSP 17.1
Environment
- Red Hat OpenStack Platform 16.2 being upgraded to Red Hat OpenStack Platform 17.1
Issue
- Upgrade operations over roles with a large number of nodes are slow. How can we speed them up?
- FFU performance improvement for RHOSP deployment with many nodes assigned to each role.
- How to split existing roles for overcloud nodes during FFU?
Resolution
The following step-by-step guideline covers a scenario where the `SpecialCompute` role originally has 50 nodes. Two new roles (`SpecialComputeA` and `SpecialComputeB`) are introduced, and 25 of the existing nodes are moved to each new role.
- Follow the Framework for upgrades (16.2 to 17.1) procedure until step 5.1.12 (it starts with "Run the upgrade preparation script for each stack in your environment"). Interrupt the normal steps just before you run the `overcloud_upgrade_prepare.sh` script.
- Create the relevant roles for your environment by tuning the `roles_data.yaml` file: copy the source `SpecialCompute` role definition to the new role. Repeat this step for each additional role required. New roles can have 0 nodes until you are ready to move your `SpecialCompute` nodes to them. Note that at this step the new roles are created as RHEL 8 roles:

  ```yaml
  - name: SpecialComputeA
    description: |
      Basic Compute Node role
    CountDefault: 1
    rhsm_enforce_multios: 8.4
    ...
    ServicesDefault:
      ...
      - OS::TripleO::Services::NovaLibvirtLegacy
  ```
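Before continuing, it can help to confirm that every new role definition actually landed in the roles file. The following is a minimal sketch; the file name `roles_data.yaml` and the role names are the ones from this scenario, so adjust them for your environment:

```shell
# Sketch: confirm each new role was added to roles_data.yaml.
# File and role names match this scenario; adjust for your environment.
ROLES_FILE=roles_data.yaml
for role in SpecialComputeA SpecialComputeB; do
  if grep -q "name: ${role}$" "$ROLES_FILE" 2>/dev/null; then
    echo "${role}: defined"
  else
    echo "${role}: MISSING"
  fi
done
```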
- Copy the `overcloud_upgrade_prepare.sh` file to `copy_role_SpecialComputeA_param.sh` and `copy_role_SpecialComputeB_param.sh`:

  ```
  $ cp overcloud_upgrade_prepare.sh copy_role_SpecialComputeA_param.sh
  $ cp overcloud_upgrade_prepare.sh copy_role_SpecialComputeB_param.sh
  ```
- Edit both `copy_role_<Compute>_param.sh` files to invoke the `copy_role_params.py` script. This script generates the environment file that contains the additional parameters and resources for the new role. Use the `-o` option to define the name of the output file that includes all the non-default values of the source Compute role for the new role. For example, for `SpecialComputeA`:

  ```
  /usr/share/openstack-tripleo-heat-templates/tools/copy_role_params.py \
    --rolename-src SpecialCompute --rolename-dst SpecialComputeA \
    -o SpecialComputeA_params.yaml \
    -e /home/stack/templates/internal.yaml \
    -e /usr/share/openstack-tripleo-heat-templates/environments/services/neutron-ovs.yaml \
    -e /home/stack/templates/network/network-environment.yaml \
    -e /home/stack/templates/inject-trust-anchor.yaml \
    -e /home/stack/templates/hostnames.yml \
    -e /usr/share/openstack-tripleo-heat-templates/environments/ceph-ansible/ceph-ansible.yaml \
    -e /home/stack/templates/nodes_data.yaml \
    -e /home/stack/templates/debug.yaml \
    -e /home/stack/templates/firstboot.yaml \
    -e /home/stack/overcloud-params.yaml \
    -e /home/stack/overcloud-deploy/overcloud/overcloud-network-environment.yaml \
    -e /home/stack/overcloud_adopt/baremetal-deployment.yaml \
    -e /home/stack/overcloud_adopt/generated-networks-deployed.yaml \
    -e /home/stack/overcloud_adopt/generated-vip-deployed.yaml \
    -e /usr/share/openstack-tripleo-heat-templates/environments/nova-hw-machine-type-upgrade.yaml \
    -e ~/containers-prepare-parameter.yaml
  ```
- Run both the `copy_role_SpecialComputeA_param.sh` and `copy_role_SpecialComputeB_param.sh` scripts:

  ```
  $ sh /home/stack/copy_role_SpecialComputeA_param.sh
  $ sh /home/stack/copy_role_SpecialComputeB_param.sh
  ```
- Move the Compute nodes from the source role to the new roles. Run the following script with source role `SpecialCompute` and destination role `SpecialComputeA`, listing the 25 Compute nodes you want to move from source to destination. Then run the same script with source `SpecialCompute` and destination `SpecialComputeB`, listing the remaining Compute nodes you want to move:

  ```
  python3 /usr/share/openstack-tripleo-heat-templates/tools/baremetal_transition.py \
    --baremetal-deployment /home/stack/tripleo-<stack>-baremetal-deployment.yaml \
    --src-role SpecialCompute --dst-role SpecialComputeA \
    compute-0 compute-1 compute-2 compute-3
  ```

  Note: This tool uses the original /home/stack/tripleo-<stack>-baremetal-deployment.yaml file that you exported during the undercloud upgrade. The tool copies and renames the source role definition in the /home/stack/tripleo-<stack>-baremetal-deployment.yaml file. Then, it changes the hostname_format to prevent a conflict with the newly created destination role. The tool then moves the nodes from the source role to the destination role and changes the count values.

  - Replace <stack> with the name of your stack.
  - Replace <Compute_source_role> with the name of the source Compute role that contains the nodes that you are moving to your new role.
  - Replace <Compute_destination_role> with the name of your new role.
  - Replace <Compute-0> <Compute-1> <Compute-2> with the names of the nodes that you are moving to your new role.
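With 50 nodes to split, typing the two node lists by hand is error-prone. The following sketch builds both 25-node argument lists, assuming the source nodes are named `compute-0` through `compute-49` (adjust to your actual hostnames):

```shell
# Sketch: generate the two 25-node argument lists for baremetal_transition.py.
# Assumes the source nodes are named compute-0 .. compute-49.
NODES_A=$(seq -f "compute-%g" 0 24 | tr '\n' ' ')
NODES_B=$(seq -f "compute-%g" 25 49 | tr '\n' ' ')
echo "SpecialComputeA: $NODES_A"
echo "SpecialComputeB: $NODES_B"
```

Append `$NODES_A` as the trailing arguments of the first `baremetal_transition.py` run (destination `SpecialComputeA`) and `$NODES_B` to the second run (destination `SpecialComputeB`).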
- Reprovision the nodes to update the environment files in the stack with the new role location:

  ```
  $ openstack overcloud node provision --stack <stack> \
    --output /home/stack/overcloud_adopt/baremetal-deployment.yaml \
    /home/stack/tripleo-<stack>-baremetal-deployment.yaml
  ```

  Note: The output baremetal-deployment.yaml file is the same file that is used in the overcloud_upgrade_prepare.sh file during overcloud adoption.
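As a quick sanity check after the move, the per-role `count` values in the transitioned deployment file should still sum to the original node total (50 in this scenario). A minimal sketch, assuming a stack named `overcloud`:

```shell
# Sketch: sum the per-role "count:" values in the baremetal deployment file.
# The path assumes a stack named "overcloud"; substitute your stack name.
DEPLOY_FILE=/home/stack/tripleo-overcloud-baremetal-deployment.yaml
if [ -f "$DEPLOY_FILE" ]; then
  awk '/^[[:space:]-]*count:/ {sum += $NF} END {print "total nodes:", sum+0}' "$DEPLOY_FILE"
fi
```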
- Ensure that the new Compute roles get EL8 containers, because at this point only OpenStack is being upgraded, not the operating system. Rerun step 5.1.8 (it starts with "Complete the following steps to prepare the containers") with an updated `COMPUTE_ROLES` variable:

  ```
  COMPUTE_ROLES="--role SpecialCompute --role SpecialComputeA --role SpecialComputeB"
  python3 /usr/share/openstack-tripleo-heat-templates/tools/multi-rhel-container-image-prepare.py \
    ${COMPUTE_ROLES} \
    ${CONTROL_PLANE_ROLES} \
    --enable-multi-rhel \
    --excludes collectd \
    --excludes nova-libvirt \
    --minor-override "{${EL8_TAGS}${EL8_NAMESPACE}${CEPH_OVERRIDE}${NEUTRON_DRIVER}\"no_tag\":\"not_used\"}" \
    --major-override "{${EL9_TAGS}${NAMESPACE}${CEPH_OVERRIDE}${NEUTRON_DRIVER}\"no_tag\":\"not_used\"}" \
    --output-env-file \
    /home/stack/containers-prepare-parameter.yaml
  ```
- Include the environment files that contain the parameters needed for the new roles (`-e`) in your `overcloud_upgrade_prepare.sh`. For example:

  ```
  ...
  -e /home/stack/SpecialComputeA_params.yaml \
  -e /home/stack/SpecialComputeB_params.yaml \
  ...
  ```
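It is easy to miss one of the two entries when editing the script by hand; the following minimal sketch (path and file names from this scenario) checks that both parameter files are referenced before the script is run:

```shell
# Sketch: verify both new-role parameter files are referenced in the
# upgrade preparation script before running it.
SCRIPT=/home/stack/overcloud_upgrade_prepare.sh
for f in SpecialComputeA_params.yaml SpecialComputeB_params.yaml; do
  if grep -q -- "$f" "$SCRIPT" 2>/dev/null; then
    echo "$f: included"
  else
    echo "$f: NOT included"
  fi
done
```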
Now you can continue with step 5.1.12 by running `overcloud_upgrade_prepare.sh`. The procedure is essentially the same as 11.3. Upgrading Compute nodes to a Multi-RHEL environment, except that here RHEL 8 remains on both the Controller and the new Compute roles.
Root Cause
Operations over roles with a large number of nodes are unnecessarily slow. To speed up Heat and Ansible processing, split roles with many nodes into multiple roles of at most approximately 25 nodes each.
This solution is part of Red Hat’s fast-track publication program, providing a huge library of solutions that Red Hat engineers have created while supporting our customers. To give you the knowledge you need the instant it becomes available, these articles may be presented in a raw and unedited form.