How to split roles during upgrade from RHOSP 16.2 to RHOSP 17.1

Solution Verified - Updated

Environment

  • Red Hat OpenStack Platform 16.2 being upgraded to Red Hat OpenStack Platform 17.1

Issue

  • Upgrade operations on roles with a large number of nodes are slow. How can they be sped up?
  • How to improve fast forward upgrade (FFU) performance for an RHOSP deployment with many nodes assigned to each role.
  • How to split existing roles for overcloud nodes during FFU?

Resolution

The following step-by-step procedure covers a scenario where the SpecialCompute role originally has 50 nodes. Two new roles (SpecialComputeA and SpecialComputeB) are introduced, and 25 of the existing nodes are moved to each new role.

  1. Follow the Framework for upgrades (16.2 to 17.1) procedure up to step 5.1.12 (it starts with Run the upgrade preparation script for each stack in your environment). Stop just before you run the overcloud_upgrade_prepare.sh script.

  2. Create the relevant roles for your environment by editing the role_data.yaml file: copy the source SpecialCompute role definition to each new role (here SpecialComputeA and SpecialComputeB). New roles can have 0 nodes until you are ready to move your SpecialCompute nodes to them. Note that the roles created at this step are RHEL 8 roles:

     - name: SpecialComputeA
       description: |
         Basic Compute Node role
       CountDefault: 1
       rhsm_enforce_multios: 8.4
       ...
       ServicesDefault:
       ...
       - OS::TripleO::Services::NovaLibvirtLegacy
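
Copying a role definition by hand becomes error-prone when several new roles are needed. The following is a minimal sketch of the copy, assuming the roles file has already been loaded into a list of dictionaries (for example with a YAML parser). The helper name clone_role and the shortened sample role are hypothetical and only illustrate the idea.

```python
import copy


def clone_role(roles, src_name, dst_name):
    """Return a deep copy of the role named src_name, renamed to dst_name."""
    for role in roles:
        if role.get("name") == src_name:
            new_role = copy.deepcopy(role)  # nested lists must not be shared
            new_role["name"] = dst_name
            return new_role
    raise KeyError(f"role {src_name!r} not found")


# Shortened, illustrative role definition (a real role has many more keys).
roles = [{
    "name": "SpecialCompute",
    "CountDefault": 1,
    "rhsm_enforce_multios": 8.4,
    "ServicesDefault": ["OS::TripleO::Services::NovaLibvirtLegacy"],
}]

for suffix in ("A", "B"):
    roles.append(clone_role(roles, "SpecialCompute", "SpecialCompute" + suffix))

print([role["name"] for role in roles])
```

A deep copy is used so that later edits to the service list of one role do not leak into the others.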
    
  3. Copy the overcloud_upgrade_prepare.sh file to the copy_role_SpecialComputeA_param.sh and copy_role_SpecialComputeB_param.sh files:

     $ cp overcloud_upgrade_prepare.sh copy_role_SpecialComputeA_param.sh
     $ cp overcloud_upgrade_prepare.sh copy_role_SpecialComputeB_param.sh
    
  4. Edit both copy_role_<Compute>_param.sh files so that they call the copy_role_params.py script with the same environment files (-e) as the original script. This script generates the environment file that contains the additional parameters and resources for the new role. Use the -o option to define the name of the output file, which includes all the non-default values of the source Compute role for the new role. For example, for SpecialComputeA:

     /usr/share/openstack-tripleo-heat-templates/tools/copy_role_params.py --rolename-src SpecialCompute --rolename-dst SpecialComputeA \
     -o SpecialComputeA_params.yaml \
     -e /home/stack/templates/internal.yaml \
     -e /usr/share/openstack-tripleo-heat-templates/environments/services/neutron-ovs.yaml \
     -e /home/stack/templates/network/network-environment.yaml \
     -e /home/stack/templates/inject-trust-anchor.yaml \
     -e /home/stack/templates/hostnames.yml \
     -e /usr/share/openstack-tripleo-heat-templates/environments/ceph-ansible/ceph-ansible.yaml \
     -e /home/stack/templates/nodes_data.yaml \
     -e /home/stack/templates/debug.yaml \
     -e /home/stack/templates/firstboot.yaml \
     -e /home/stack/overcloud-params.yaml \
     -e /home/stack/overcloud-deploy/overcloud/overcloud-network-environment.yaml \
     -e /home/stack/overcloud_adopt/baremetal-deployment.yaml \
     -e /home/stack/overcloud_adopt/generated-networks-deployed.yaml \
     -e /home/stack/overcloud_adopt/generated-vip-deployed.yaml \
     -e /usr/share/openstack-tripleo-heat-templates/environments/nova-hw-machine-type-upgrade.yaml \
     -e ~/containers-prepare-parameter.yaml
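
The list of -e files must match the one in overcloud_upgrade_prepare.sh, so it can help to derive the command from that script instead of retyping it. The sketch below only builds the argument list and does not run anything. The helper names env_files and copy_role_params_cmd are hypothetical, and the sample script text is a shortened illustration, not a real prepare script.

```python
import shlex

TOOL = "/usr/share/openstack-tripleo-heat-templates/tools/copy_role_params.py"


def env_files(script_text):
    """Collect the argument of every -e option found in a shell script."""
    # Join backslash-continued lines so shlex sees one logical command line.
    joined = script_text.replace("\\\n", " ")
    tokens = shlex.split(joined, comments=True)
    return [tokens[i + 1] for i, tok in enumerate(tokens[:-1]) if tok == "-e"]


def copy_role_params_cmd(src_role, dst_role, files):
    """Build the copy_role_params.py argument list for one destination role."""
    cmd = [TOOL, "--rolename-src", src_role, "--rolename-dst", dst_role,
           "-o", f"{dst_role}_params.yaml"]
    for path in files:
        cmd += ["-e", path]
    return cmd


# Shortened, illustrative script content.
sample = """#!/bin/bash
openstack overcloud upgrade prepare --yes \\
  -e /home/stack/templates/internal.yaml \\
  -e ~/containers-prepare-parameter.yaml
"""

files = env_files(sample)
for dst in ("SpecialComputeA", "SpecialComputeB"):
    print(shlex.join(copy_role_params_cmd("SpecialCompute", dst, files)))
```

This simple parser assumes the script contains plain quoted or unquoted arguments; scripts with here-documents or complex shell constructs would need manual review.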
    
  5. Run both the copy_role_SpecialComputeA_param.sh and copy_role_SpecialComputeB_param.sh scripts:

     $ sh /home/stack/copy_role_SpecialComputeA_param.sh
     $ sh /home/stack/copy_role_SpecialComputeB_param.sh
    
  6. Move the Compute nodes from the source role to the new roles. Run the following script with SpecialCompute as the source role and SpecialComputeA as the destination role, listing the 25 Compute nodes that you want to move. Then run the same script with SpecialCompute as the source role and SpecialComputeB as the destination role, listing the remaining Compute nodes that you want to move.

     $ python3 /usr/share/openstack-tripleo-heat-templates/tools/baremetal_transition.py \
     --baremetal-deployment /home/stack/tripleo-<stack>-baremetal-deployment.yaml \
     --src-role SpecialCompute \
     --dst-role SpecialComputeA \
     compute-0 compute-1 compute-2 compute-3
    

Note:

    This tool uses the original /home/stack/tripleo-<stack>-baremetal-deployment.yaml file that you exported during the undercloud upgrade. The tool copies and renames the source role definition in the /home/stack/tripleo-<stack>-baremetal-deployment.yaml file. Then, it changes the hostname_format to prevent a conflict with the newly created destination role. The tool then moves the nodes from the source role to the destination role and changes the count values.
    Replace <stack> with the name of your stack.
    In the example, SpecialCompute is the source Compute role that contains the nodes that you are moving to your new role.
    SpecialComputeA is the name of the new (destination) role.
    compute-0 compute-1 compute-2 compute-3 are the names of the nodes that you are moving to the new role; list all of the nodes that you want to move.
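
With 25 node names per invocation, it can be easier to generate the two baremetal_transition.py command lines than to type them. The sketch below splits a node list evenly across the destination roles and prints one command per role without executing it. The helper name transition_commands, the node names compute-0 to compute-49, and the stack name overcloud in the file path are assumptions for illustration.

```python
import shlex

TOOL = "/usr/share/openstack-tripleo-heat-templates/tools/baremetal_transition.py"


def transition_commands(nodes, src_role, dst_roles, deployment_file):
    """Split nodes evenly across dst_roles; return one argument list per role."""
    size = -(-len(nodes) // len(dst_roles))  # ceiling division
    commands = []
    for i, dst_role in enumerate(dst_roles):
        chunk = nodes[i * size:(i + 1) * size]
        if not chunk:
            continue  # fewer nodes than roles: nothing to move for this role
        commands.append([
            "python3", TOOL,
            "--baremetal-deployment", deployment_file,
            "--src-role", src_role,
            "--dst-role", dst_role,
            *chunk,
        ])
    return commands


# Assumed node names and stack name ("overcloud"); adjust to your environment.
nodes = [f"compute-{n}" for n in range(50)]
cmds = transition_commands(
    nodes,
    "SpecialCompute",
    ["SpecialComputeA", "SpecialComputeB"],
    "/home/stack/tripleo-overcloud-baremetal-deployment.yaml",
)
for cmd in cmds:
    print(shlex.join(cmd))
```

Review the printed commands before running them, because the tool modifies the baremetal deployment file.
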
  7. Reprovision the nodes to update the environment files in the stack with the new role location:

     $ openstack overcloud node provision --stack <stack> --output /home/stack/overcloud_adopt/baremetal-deployment.yaml /home/stack/tripleo-<stack>-baremetal-deployment.yaml
    

    Note:

     The output baremetal-deployment.yaml file is the same file that is used in the overcloud_upgrade_prepare.sh file during overcloud adoption.
    
  8. Ensure that the new Compute roles get EL8 containers, because at this point only OpenStack is being upgraded, not the operating system. Rerun step 5.1.8 (it starts with Complete the following steps to prepare the containers) with an updated COMPUTE_ROLES variable:

     COMPUTE_ROLES="--role SpecialCompute --role SpecialComputeA --role SpecialComputeB"
     
     python3 /usr/share/openstack-tripleo-heat-templates/tools/multi-rhel-container-image-prepare.py \
     ${COMPUTE_ROLES} \
     ${CONTROL_PLANE_ROLES} \
     --enable-multi-rhel \
     --excludes collectd \
     --excludes nova-libvirt \
     --minor-override "{${EL8_TAGS}${EL8_NAMESPACE}${CEPH_OVERRIDE}${NEUTRON_DRIVER}\"no_tag\":\"not_used\"}" \
     --major-override "{${EL9_TAGS}${NAMESPACE}${CEPH_OVERRIDE}${NEUTRON_DRIVER}\"no_tag\":\"not_used\"}" \
     --output-env-file \
     /home/stack/containers-prepare-parameter.yaml
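
When more than a couple of roles are involved, the COMPUTE_ROLES value can be generated from the list of role names so that no role is forgotten. This is a small sketch; the helper name role_flags is hypothetical.

```python
def role_flags(role_names):
    """Return a '--role <name>' flag for every role, joined by spaces."""
    return " ".join(f"--role {name}" for name in role_names)


compute_roles = role_flags(["SpecialCompute", "SpecialComputeA", "SpecialComputeB"])
print(f'COMPUTE_ROLES="{compute_roles}"')
```

The printed line can be pasted into the shell session before rerunning the container preparation step.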
    
  9. Include the environment files that contain the parameters needed for the new roles (-e) in your overcloud_upgrade_prepare.sh. For example:

     ...
     -e /home/stack/SpecialComputeA_params.yaml \
     -e /home/stack/SpecialComputeB_params.yaml \
     ...
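
A copy-paste slip, such as listing the same parameters file twice, is easy to make at this step. The following sketch checks that a prepare script references one parameters file per new role. The helper name missing_param_files is hypothetical, and it assumes the files are named <role>_params.yaml and stored in /home/stack, as in the example above.

```python
def missing_param_files(script_text, new_roles, directory="/home/stack"):
    """Return the expected <role>_params.yaml paths that the script lacks."""
    expected = [f"{directory}/{role}_params.yaml" for role in new_roles]
    return [path for path in expected if f"-e {path}" not in script_text]


# Illustrative fragment in which the SpecialComputeB file was forgotten.
fragment = (
    "-e /home/stack/SpecialComputeA_params.yaml \\\n"
    "-e /home/stack/SpecialComputeA_params.yaml \\\n"
)

missing = missing_param_files(fragment, ["SpecialComputeA", "SpecialComputeB"])
print(missing)
```

An empty list means that every new role has its parameters file included.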
    

Now you can continue with step 5.1.12 by running overcloud_upgrade_prepare.sh. The procedure is essentially the same as in 11.3. Upgrading Compute nodes to a Multi-RHEL environment, except that here both the Controller roles and the new Compute roles run RHEL 8.

Root Cause

Operations on roles with a large number of nodes are unnecessarily slow. To speed up Heat and Ansible runs, split roles with many nodes into multiple roles of at most about 25 nodes each.
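
The number of new roles follows directly from the node count and the limit of about 25 nodes per role. As a rough sketch, the calculation could look like this; the helper name plan_roles and the letter-suffix naming scheme are assumptions modelled on the SpecialComputeA and SpecialComputeB example.

```python
import math
import string


def plan_roles(base_name, node_count, max_per_role=25):
    """Return the names of the roles needed to hold node_count nodes."""
    needed = math.ceil(node_count / max_per_role)
    if needed > len(string.ascii_uppercase):
        raise ValueError("letter suffixes only cover 26 roles")
    return [base_name + string.ascii_uppercase[i] for i in range(needed)]


print(plan_roles("SpecialCompute", 50))
print(plan_roles("SpecialCompute", 60))
```

For 50 nodes this yields the two roles used in this article; 60 nodes would need a third role.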


This solution is part of Red Hat’s fast-track publication program, providing a huge library of solutions that Red Hat engineers have created while supporting our customers. To give you the knowledge you need the instant it becomes available, these articles may be presented in a raw and unedited form.