Resizing disks or changing the instance type on Azure IPI control plane nodes in RHOCP 4

Solution Verified - Updated

Environment

  • Red Hat OpenShift Container Platform (RHOCP)
    • 4
  • Control Plane
  • Azure IPI

Issue

  • How to increase or reduce the disk size of control plane nodes on Azure IPI.
  • Can the instance type of the control plane nodes on Azure IPI be changed?

Resolution

Disclaimer: Links contained herein to external website(s) are provided for convenience only. Red Hat has not reviewed the links and is not responsible for the content or its availability. The inclusion of any link to an external website does not imply endorsement by Red Hat of the website or their entities, products or services. You agree that Red Hat is not responsible or liable for any loss or expenses that may result due to your use of (or reliance on) the external site or content.

The following example illustrates the complete process. In this scenario, the cluster has three control plane nodes and some worker nodes, and only the control plane nodes are updated.

Note: The recommended disk size for OpenShift control plane nodes in Azure is 1024 GB, as explained in Why is the minimum recommended size of disk for control plane nodes 1024 GB when installing OpenShift 4 on Azure? Smaller disks do not provide enough performance for etcd, especially in production environments.

IMPORTANT NOTE: This procedure relies on the controlplanemachineset custom resource, which was introduced in OpenShift 4.12. As explained in Missing controlplanemachineset resource in IPI RHOCP 4 cluster, the controlplanemachineset resource is not created by default in some combinations of OpenShift version and cloud provider (for example, OpenShift 4.12 on Azure). If an OpenShift 4.12 cluster installed with IPI in Azure has no controlplanemachineset, refer to Creating controlplanemachineset in OpenShift 4.12 clusters in Azure for information about creating it.
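The prerequisite above can be checked before patching anything. The following is a minimal sketch, not an official procedure: the helper name cpms_state_message is hypothetical, and it assumes the state is read from the .spec.state field of the controlplanemachineset named cluster (the command for reading it is shown only as a comment).

```shell
#!/bin/sh
# Hypothetical helper: turns the controlplanemachineset state into a hint.
# An empty argument is assumed to mean that no resource was found.
cpms_state_message() {
    case "$1" in
        Active)   echo "controlplanemachineset is Active: changes will be rolled out" ;;
        Inactive) echo "controlplanemachineset is Inactive: activate it before patching" ;;
        "")       echo "no controlplanemachineset found: create it first" ;;
        *)        echo "unexpected state: $1" ;;
    esac
}

# Example with a literal value.
cpms_state_message Active

# On a live cluster the state could be read like this (not executed here):
#   state=$(oc get controlplanemachineset cluster -n openshift-machine-api \
#       -o jsonpath='{.spec.state}' 2>/dev/null)
#   cpms_state_message "$state"
```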

Prerequisites

Modify the control plane machines

In clusters installed with the IPI method, changes to the control plane machines are applied by the controlplanemachineset resource.

  • List the control plane nodes and machines:

    $ oc get nodes -l node-role.kubernetes.io/master=
    NAME                   STATUS   ROLES                  AGE   VERSION
    master-0               Ready    control-plane,master   14d   v1.27.6+f67aeb3
    master-1               Ready    control-plane,master   14d   v1.27.6+f67aeb3
    master-2               Ready    control-plane,master   14d   v1.27.6+f67aeb3
    [...]
    
    $ oc get machines -n openshift-machine-api -l machine.openshift.io/cluster-api-machine-role=master 
    NAME             PHASE         TYPE              REGION   ZONE   AGE
    master-0         Running       Standard_D8s_v3   eastus   1      2d
    master-1         Running       Standard_D8s_v3   eastus   2      2d
    master-2         Running       Standard_D8s_v3   eastus   3      2d
    
  • In this example, the control plane machines initially have a disk size of 256 GB:

    $ oc get machines -l "machine.openshift.io/cluster-api-machine-role=master" -n openshift-machine-api -o yaml | grep "diskSize"
          diskSizeGB: 256
          diskSizeGB: 256
          diskSizeGB: 256
    
  • Check the disk size currently set in the controlplanemachineset before patching it with the intended size:

    $ oc get controlplanemachineset cluster -n openshift-machine-api -o yaml | grep "diskSizeGB:"
                  diskSizeGB: 1024
    
  • Patch the controlplanemachineset with the new disk size (1024 GB in this example):

    $ oc patch controlplanemachineset cluster -n openshift-machine-api --type merge \
        -p '{"spec":{"template":{"machines_v1beta1_machine_openshift_io":{"spec":{"providerSpec":{"value":{"osDisk":{"diskSizeGB":1024}}}}}}}}'
    controlplanemachineset.machine.openshift.io/cluster patched
    

    It is also possible to change the vmSize field to a different instance type (usually a more performant one for the control plane).
    Note: Some instance types require additional features. For example, the Dsv5 family requires "Accelerated Networking". Before using a new instance type, check its requirements and whether anything else needs to be configured in the controlplanemachineset. For the specific case of "Accelerated Networking", refer to Accelerated Networking for Microsoft Azure VMs.
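Changing the disk size and the instance type in a single patch could look like the sketch below. The helper name build_cpms_patch is hypothetical, and the sketch assumes that vmSize sits directly under providerSpec.value, next to osDisk. The block only prints the patch so it can be reviewed; the oc patch command is shown as a comment.

```shell
#!/bin/sh
# Hypothetical helper: prints a JSON merge patch for the controlplanemachineset
# that sets both the OS disk size and the Azure instance type.
build_cpms_patch() {
    disk_gb="$1"   # new OS disk size in GB, for example 1024
    vm_size="$2"   # new instance type, for example Standard_D16s_v3
    printf '{"spec":{"template":{"machines_v1beta1_machine_openshift_io":{"spec":{"providerSpec":{"value":{"osDisk":{"diskSizeGB":%d},"vmSize":"%s"}}}}}}}\n' \
        "$disk_gb" "$vm_size"
}

# Print the patch for review before applying it.
build_cpms_patch 1024 Standard_D16s_v3

# To apply it on a live cluster (not executed here):
#   oc patch controlplanemachineset cluster -n openshift-machine-api --type merge \
#       -p "$(build_cpms_patch 1024 Standard_D16s_v3)"
```

Building the patch in one place keeps the deeply nested JSON out of the command line and makes it harder to misplace a closing brace.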

  • After patching, the control plane machines are replaced through a rolling update, which takes some time to complete:

    $ watch -n 10 "oc get nodes -l node-role.kubernetes.io/master= && oc get co control-plane-machine-set && oc get controlplanemachineset -n openshift-machine-api"
    Every 10s: oc get nodes -l node-role.kubernetes.io/master= && oc get co control-plane-machine-set && oc get controlplanemachineset -n openshift-machine-api
    
    NAME             STATUS                     ROLES                  AGE    VERSION
    master-2hpwl-2   Ready                      control-plane,master   11m    v1.27.6+f67aeb3
    master-fxmsn-2   Ready,SchedulingDisabled   control-plane,master   103m   v1.27.6+f67aeb3
    master-ph4s7-0   Ready                      control-plane,master   58m    v1.27.6+f67aeb3
    master-rlch2-1   Ready                      control-plane,master   34m    v1.27.6+f67aeb3
    
    NAME                        VERSION   AVAILABLE   PROGRESSING   DEGRADED   SINCE   MESSAGE
    control-plane-machine-set   4.12.31   True        True          False      14d     Waiting for 1 old replica(s) to be removed
    
    NAME      DESIRED   CURRENT   READY   UPDATED   UNAVAILABLE   STATE      AGE
    cluster   3         3         3       2                       Active     17d
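The DESIRED, UPDATED and READY columns of the controlplanemachineset indicate whether the rollout has finished. The following is a minimal sketch of that comparison. The helper name cpms_rollout_done is hypothetical, and the jsonpath fields in the comment (.spec.replicas, .status.updatedReplicas, .status.readyReplicas) are assumed to correspond to those columns.

```shell
#!/bin/sh
# Hypothetical helper: succeeds only when every desired control plane replica
# is both updated and ready.
cpms_rollout_done() {
    desired="$1"
    updated="$2"
    ready="$3"
    [ "$updated" -eq "$desired" ] && [ "$ready" -eq "$desired" ]
}

# Values taken from the sample output: DESIRED=3, UPDATED=2, READY=3.
if cpms_rollout_done 3 2 3; then
    echo "rollout complete"
else
    echo "rollout still in progress"
fi

# On a live cluster the three values could be read like this (not executed here):
#   oc get controlplanemachineset cluster -n openshift-machine-api \
#       -o jsonpath='{.spec.replicas} {.status.updatedReplicas} {.status.readyReplicas}'
```

If a status field is missing from the resource, the numeric comparison fails and the helper reports the rollout as not finished, which is the safer outcome.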
    
  • If the vmSize field was changed to a different instance type, the new type is shown in the TYPE column of the following command:

    $ oc get machines -n openshift-machine-api -l machine.openshift.io/cluster-api-machine-role=master 
    NAME             PHASE      TYPE               REGION   ZONE   AGE
    master-2hpwl-2   Running    Standard_D16s_v3   eastus   3      19m
    master-fxmsn-2   Deleting   Standard_D8s_v3    eastus   3      111m
    master-ph4s7-0   Running    Standard_D16s_v3   eastus   1      66m
    master-rlch2-1   Running    Standard_D16s_v3   eastus   2      43m
    
  • The following command can be used to verify the changes:

    $ oc get machines -l "machine.openshift.io/cluster-api-machine-role=master" -n openshift-machine-api -o yaml | grep "diskSize\|vmSize"
              diskSizeGB: 1024
            vmSize: Standard_D16s_v3
              diskSizeGB: 1024
            vmSize: Standard_D16s_v3
              diskSizeGB: 1024
            vmSize: Standard_D16s_v3
    

Important: This procedure describes how to change the disk size or instance type to increase or decrease capacity. Proceed with caution and refer to Managing control plane machines with control plane machine sets for further information.

Root Cause

The disk size of the control plane nodes in Azure can be changed without manually replacing each node by using the controlplanemachineset resource, which is available starting with OpenShift 4.12.

Note: The recommended disk size for OpenShift control plane nodes in Azure is 1024 GB, as explained in Why is the minimum recommended size of disk for control plane nodes 1024 GB when installing OpenShift 4 on Azure? Smaller disks do not provide enough performance for etcd, especially in production environments.

Diagnostic Steps

The following command shows the current disk size and instance type of the control plane machines:

$ oc get machines -l "machine.openshift.io/cluster-api-machine-role=master" -n openshift-machine-api -o yaml | grep "diskSize\|vmSize"
          diskSizeGB: 1024
        vmSize: Standard_D16s_v3
          diskSizeGB: 1024
        vmSize: Standard_D16s_v3
          diskSizeGB: 1024
        vmSize: Standard_D16s_v3
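When there are several machines, the grep output can be checked automatically instead of by eye. The following is a minimal sketch with a hypothetical helper, all_values_match, which reads "key: value" lines such as the ones above on standard input. The sample text in the block is copied from the output above and stands in for the live command.

```shell
#!/bin/sh
# Hypothetical helper: reads "key: value" lines on stdin and succeeds only if
# at least one line has the key $1 and every such line has the value $2.
all_values_match() {
    awk -v key="$1:" -v want="$2" '
        $1 == key { seen++; if ($2 != want) bad++ }
        END { exit !(seen > 0 && bad == 0) }
    '
}

# Sample input standing in for the output of the oc command.
sample='          diskSizeGB: 1024
        vmSize: Standard_D16s_v3
          diskSizeGB: 1024
        vmSize: Standard_D16s_v3
          diskSizeGB: 1024
        vmSize: Standard_D16s_v3'

if printf '%s\n' "$sample" | all_values_match diskSizeGB 1024; then
    echo "all machines report diskSizeGB 1024"
else
    echo "at least one machine reports a different diskSizeGB"
fi
```

On a live cluster, the output of the oc get machines command piped through grep can be fed to the helper in place of the sample text.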

To check the disk size directly on the nodes, the following loop can be used (note that the disk device name can differ between nodes):

$ for i in $(oc get nodes -l node-role.kubernetes.io/master= --no-headers | awk '{ print $1 }'); do
    echo -e "\n\n\t\tNode: $i\t\t\n"
    oc debug node/$i -- chroot /host lsblk -b
    echo "---------------------------------------------"
done
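Because lsblk -b reports sizes in bytes, the values are easier to compare with diskSizeGB after a conversion. The following is a minimal sketch with a hypothetical helper, bytes_to_gib, assuming that the disk size configured in Azure is expressed in GiB (1 GiB = 1024 * 1024 * 1024 bytes).

```shell
#!/bin/sh
# Hypothetical helper: converts a size in bytes, as printed by "lsblk -b",
# to whole GiB using integer division (any remainder is discarded).
bytes_to_gib() {
    echo $(( $1 / 1024 / 1024 / 1024 ))
}

# 1099511627776 bytes is 1024 GiB.
bytes_to_gib 1099511627776
```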

This solution is part of Red Hat’s fast-track publication program, providing a huge library of solutions that Red Hat engineers have created while supporting our customers. To give you the knowledge you need the instant it becomes available, these articles may be presented in a raw and unedited form.