Resizing Master Nodes in IPI-Installed OCP 4.6-4.11 on AWS or Azure.


This article demonstrates how to resize control plane nodes in Red Hat OpenShift Container Platform versions 4.6 to 4.11 hosted on the AWS and Azure cloud providers. The steps may also work in later versions up to 4.18; however, starting in version 4.12, Control Plane Machine Sets (recommended) automate all the steps required to safely resize the control plane nodes.

NOTES:

  • The steps provided were tested on master nodes not managed by Control Plane Machine Sets
  • This tutorial is split into 4 parts (from a to d)
  • This tutorial is valid for IPI deployments only
  • This tutorial describes how to change the size of a node; it does not cover cluster scaling

Prerequisites

Step-by-step

ATTENTION: Follow these steps for one machine at a time. Do not proceed to the next node until the cluster operators (CO) have stabilized and both the machine and the node are healthy again (machine Running, node Ready).

a. Verify the health of the machines

  1. Check the provider

    $ oc get infrastructures -o jsonpath='{.items[*].status.platformStatus.type}'
    
  2. Check that all the nodes are Ready and all the machines are Running
    Replace master with the intended role if needed.

    $ oc get nodes -l kubernetes.io/os=linux,node-role.kubernetes.io/master=
    $ oc get machines  -n openshift-machine-api -l machine.openshift.io/cluster-api-machine-role=master
    
  3. AWS | Azure - Gather Cloud provider information from Machine objects.
    For AWS

    $ oc get machines \
        -n openshift-machine-api \
        -l machine.openshift.io/cluster-api-machine-role=master \
        -o json \
        | jq -r '.items[] | (
            "node_name: " + .status.nodeRef.name,
            "machine_name: " + .metadata.name,
            "instanceId: " + .status.providerStatus.instanceId,
            "instanceTypeSpec: " + .spec.providerSpec.value.instanceType,
            "instanceTypeMeta: " + .metadata.labels."machine.openshift.io/instance-type",
            "")'
    

    For AZURE

    $ oc get machines \
        -n openshift-machine-api \
        -l machine.openshift.io/cluster-api-machine-role=master \
        -o json \
        | jq -r '.items[] | (
            "node_name: " + .status.nodeRef.name,
            "machine_name: " + .metadata.name,
            "instanceId: " + .status.providerStatus.vmId,
            "instanceTypeSpec: " + .spec.providerSpec.value.vmSize,
            "instanceTypeMeta: " + .metadata.labels."machine.openshift.io/instance-type",
            "")'
    
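The Ready and Running checks in step 2 can also be scripted as a gate. The sketch below is one possible approach and not part of the original procedure: `all_in_state` is a hypothetical helper that reads `oc get ... --no-headers` output on stdin and assumes the state (node STATUS or machine PHASE) is in the second column, as in the default table output.

```shell
# Hypothetical helper: succeed only if every input row has the expected
# value in its second column (STATUS for nodes, PHASE for machines).
# A cordoned node reports "Ready,SchedulingDisabled" and is rejected.
all_in_state() {
    awk -v want="$1" '
        { seen = 1 }
        $2 != want {
            bad = 1
            print "unexpected state: " $1 " (" $2 ")" > "/dev/stderr"
        }
        END { if (bad || !seen) exit 1 }
    '
}

# Intended use against a live cluster (not executed here):
#   oc get nodes -l node-role.kubernetes.io/master= --no-headers | all_in_state Ready
#   oc get machines -n openshift-machine-api \
#       -l machine.openshift.io/cluster-api-machine-role=master --no-headers \
#       | all_in_state Running
```

The helper also fails on empty input, so a label selector that matches nothing is not mistaken for a healthy cluster.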

b. Prepare the machines and variables to be used

  • Repeat the steps below for each machine that needs to be resized

  • The variable machine_name should be updated for each machine to resize.

    1. Choose the best machine to be resized
      To avoid unnecessary etcd leader elections, resize the master nodes that are not running the etcd leader pod first and leave the leader for last.
      To find the etcd leader, run etcdctl endpoint status and check the IS LEADER column:
    ```
    $ oc -n openshift-etcd exec \
            $(oc get pods \
                -n openshift-etcd \
                -l app=etcd \
                -o jsonpath='{.items[0].metadata.name}') \
            -- etcdctl endpoint status -w table
    ```
    
    1. Set the machine_name variable value.
    ```
    $ machine_name=mrbaz01-2754r-master-0
    ```
    
    1. Set the new Machine size
    ```
    $ new_machine_type="<cloud_provider_size>"
    ```
    
    1. AWS | Azure - Check that the new size is supported by OCP and available to the machine:
    **For AWS** - Check the [supported AWS machine types](https://docs.openshift.com/container-platform/4.8/installing/installing_aws/installing-aws-vpc.html#installation-supported-aws-machine-types_installing-aws-vpc) reference.
    
    
    ```
    $ new_machine_type="m5.xlarge"
    ```
    
    
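    **For Azure** - To list the VM sizes available for a specific VM, run the following (the resource_group variable is set later in this part, in the Azure variable discovery step):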
    
    
    ```
    $ az vm list-vm-resize-options --resource-group ${resource_group} --name ${machine_name} --output table
    ```
    
    1. Then set the desired value:
    ```
    $ new_machine_type="Standard_D8s_v3"
    ```
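For step 1 of this list, the leader can be picked out of the etcdctl table without reading it by eye. The following is a minimal sketch under stated assumptions: `etcd_leader_endpoint` is a hypothetical helper, and it assumes the table layout printed by `etcdctl endpoint status -w table`, with header cells named ENDPOINT and IS LEADER. It locates the columns by header name rather than by position.

```shell
# Hypothetical helper: read an `etcdctl endpoint status -w table` listing on
# stdin and print the endpoint whose IS LEADER cell is "true".
etcd_leader_endpoint() {
    awk -F'|' '
        function trim(s) { gsub(/^[ \t]+|[ \t]+$/, "", s); return s }
        # Until the header row is found, look for the column positions.
        !leader_col {
            for (i = 1; i <= NF; i++) {
                name = trim($i)
                if (name == "ENDPOINT")  endpoint_col = i
                if (name == "IS LEADER") leader_col = i
            }
            next
        }
        # Data rows: print the endpoint of the member that is the leader.
        trim($leader_col) == "true" { print trim($endpoint_col) }
    '
}

# Intended use (not executed here): pipe the etcdctl command shown in step 1
# into etcd_leader_endpoint.
```

The printed endpoint contains the IP address of the leader's node, which can be matched against the INTERNAL-IP column of `oc get nodes -o wide` to decide which machine to resize last.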
    
  • Collect machine info
    Run the commands below unchanged, choosing the ones that match your environment.

  • AWS | Azure - Discover variable values based on ${machine_name}

    For AWS

    $ instanceId=$(oc get machine ${machine_name} -n openshift-machine-api -o jsonpath='{.status.providerStatus.instanceId}')
    $ node_name=$(oc get machine ${machine_name} -n openshift-machine-api -o jsonpath='{.status.nodeRef.name}')
    

    For Azure

    $ resource_group=$(oc get machine ${machine_name} -n openshift-machine-api -o jsonpath='{.spec.providerSpec.value.resourceGroup}')
    $ instanceId=${machine_name}
    $ node_name=$(oc get machine ${machine_name} -n openshift-machine-api -o jsonpath='{.status.nodeRef.name}')
    
  • Make sure all variables are set:

    $ echo "[${instanceId}] [${node_name}] ${resource_group:-}"
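The echo above relies on spotting an empty pair of brackets by eye. As an optional safeguard, a small guard can stop the procedure when a variable is missing. This is a sketch only: `require_vars` is a hypothetical helper, not an OpenShift command, and the variable names are the ones set earlier in this part.

```shell
# Hypothetical guard: return non-zero if any named variable is unset or empty.
require_vars() {
    missing=0
    for name in "$@"; do
        # Indirect lookup of the variable called $name (POSIX-compatible).
        eval "value=\${$name:-}"
        if [ -z "$value" ]; then
            echo "variable '$name' is not set" >&2
            missing=1
        fi
    done
    return "$missing"
}

# Intended use before cordoning the node (not executed here):
#   require_vars machine_name new_machine_type instanceId node_name || exit 1
#   On Azure, also pass resource_group.
```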
    
  • Graceful Power off

  • Cordon the node

    $ oc adm cordon ${node_name}
    
  • Drain the node (on newer oc clients, --delete-local-data is deprecated in favor of --delete-emptydir-data)

    $ oc adm drain ${node_name} --ignore-daemonsets --grace-period=60 --delete-local-data
    
  • Shut down - Wait for the node to shut down

    $ oc debug node/${node_name} -- chroot /host shutdown -h 1
    

Attention:

  • Wait until the node status is NotReady
  • Verify that the node is powered off at the cloud provider level
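Both checks can be scripted. The sketch below is an assumption-laden illustration rather than part of the original procedure: `wait_node_not_ready` and `cloud_power_state` are hypothetical helpers, the provider argument is the platform type printed in part a ("AWS" or "Azure"), and the variables are the ones set in part b. The cloud queries use `aws ec2 describe-instances` and `az vm get-instance-view` with a JMESPath filter; adjust them if your CLI versions report state differently.

```shell
# Poll until the node reports NotReady; give up after max_tries attempts.
wait_node_not_ready() {
    node="$1"; max_tries="${2:-60}"; delay="${3:-10}"
    tries=0
    while [ "$tries" -lt "$max_tries" ]; do
        status=$(oc get node "$node" --no-headers | awk '{ print $2 }')
        case "$status" in
            NotReady*) return 0 ;;   # also matches NotReady,SchedulingDisabled
        esac
        tries=$((tries + 1))
        sleep "$delay"
    done
    return 1
}

# Print the power state as reported by the cloud provider.
cloud_power_state() {
    case "$1" in
        AWS)
            aws ec2 describe-instances --instance-ids "$instanceId" \
                --query 'Reservations[].Instances[].State.Name' --output text ;;
        Azure)
            az vm get-instance-view \
                --resource-group "$resource_group" --name "$machine_name" \
                --query "instanceView.statuses[?starts_with(code, 'PowerState/')].displayStatus" \
                --output tsv ;;
        *)
            echo "unsupported provider: $1" >&2
            return 1 ;;
    esac
}

# Intended use (not executed here):
#   wait_node_not_ready "$node_name" && cloud_power_state AWS
```

On AWS the instance is expected to report a stopped state before its type can be changed; on Azure, check that the reported power state is no longer running.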

c. Change the instance type

  1. AWS | Azure - Change the size and validate the change:
    For AWS

    $ aws ec2 modify-instance-attribute --instance-id ${instanceId} --instance-type ${new_machine_type}
    $ aws ec2 describe-instance-attribute --instance-id ${instanceId} --attribute instanceType
    

    For Azure

    $ az vm resize --resource-group ${resource_group} --name ${machine_name} --size ${new_machine_type}
    $ az vm get-instance-view --resource-group ${resource_group} --name ${machine_name} --output json | jq -r '.hardwareProfile.vmSize'
    
  2. Power on the VM

  • Wait for the node to become Ready (STATUS=Ready)

    $ oc get node ${node_name} -w
    
  • Wait for the Machine API to reconcile and report the new instance type

    $ oc get machine ${machine_name} -n openshift-machine-api
    
  • Uncordon the node

    $ oc adm uncordon ${node_name}
    
  • Wait for the Cluster Operators to stabilize:

    $ oc get co -w
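Step 2 of this part (Power on the VM) does not list a command. One way to do it from the CLI, assuming the same variables as in part b, is sketched below. `power_on_vm` is a hypothetical wrapper; the underlying `aws ec2 start-instances` and `az vm start` commands are the standard CLI calls for starting a stopped instance, and the provider argument is the platform type printed in part a.

```shell
# Hypothetical wrapper: start the stopped VM on the given provider.
power_on_vm() {
    case "$1" in
        AWS)
            aws ec2 start-instances --instance-ids "$instanceId" ;;
        Azure)
            az vm start --resource-group "$resource_group" --name "$machine_name" ;;
        *)
            echo "unsupported provider: $1" >&2
            return 1 ;;
    esac
}

# Intended use (not executed here), followed by the waits listed above:
#   power_on_vm AWS
```

The VM can equally be started from the cloud provider console; the waits and the uncordon step are unchanged either way.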
    

d. Patch the Machine Object with the new size.

  • AWS | Azure - Do the patch and review the change:

    For AWS

    $ oc patch machine ${machine_name} -n openshift-machine-api --type=merge \
        -p "{\"spec\":{\"providerSpec\":{\"value\":{\"instanceType\":\"${new_machine_type}\"}}}}"

    $ oc get machines ${machine_name} -n openshift-machine-api -o json | jq -r '. | (
        "node_name: " + .status.nodeRef.name,
        "machine_name: " + .metadata.name,
        "instanceTypeSpec: " + .spec.providerSpec.value.instanceType,
        "instanceTypeMeta: " + .metadata.labels."machine.openshift.io/instance-type",
        "")'
    
    For Azure

    $ oc patch machine ${machine_name} -n openshift-machine-api --type=merge \
        -p "{\"spec\":{\"providerSpec\":{\"value\":{\"vmSize\":\"${new_machine_type}\"}}}}"

    $ oc get machines ${machine_name} -n openshift-machine-api -o json | jq -r '. | (
        "node_name: " + .status.nodeRef.name,
        "machine_name: " + .metadata.name,
        "instanceTypeSpec: " + .spec.providerSpec.value.vmSize,
        "instanceTypeMeta: " + .metadata.labels."machine.openshift.io/instance-type",
        "")'
    
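The two patches differ only in the field name (instanceType on AWS, vmSize on Azure). As an illustration, the merge patch can be generated instead of hand-escaped. `machine_size_patch` is a hypothetical helper; it assumes the provider argument is "AWS" or "Azure" and that the size string needs no JSON escaping, which holds for ordinary instance type names.

```shell
# Hypothetical helper: print the JSON merge patch for the given provider/size.
machine_size_patch() {
    case "$1" in
        AWS)   field=instanceType ;;
        Azure) field=vmSize ;;
        *)     echo "unsupported provider: $1" >&2; return 1 ;;
    esac
    printf '{"spec":{"providerSpec":{"value":{"%s":"%s"}}}}\n' "$field" "$2"
}

# Intended use (not executed here):
#   oc patch machine "$machine_name" -n openshift-machine-api --type=merge \
#       -p "$(machine_size_patch AWS "$new_machine_type")"
```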

    Follow these steps for one machine at a time. Proceed to the next node only when:

  • All cluster operators (COs) are healthy and not degraded

    $ oc get clusteroperators
    
  • The node is in Ready status and the machine is in Running status.

    $ oc get machines -n openshift-machine-api
    $ oc get nodes
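The cluster operator condition can be checked mechanically before moving on to the next machine. This is a rough sketch, not an official health check: `cluster_operators_stable` is a hypothetical helper that reads `oc get clusteroperators --no-headers` on stdin and assumes the default column order NAME, VERSION, AVAILABLE, PROGRESSING, DEGRADED.

```shell
# Hypothetical helper: succeed only if every operator row reports
# AVAILABLE=True, PROGRESSING=False and DEGRADED=False.
cluster_operators_stable() {
    awk '
        { seen = 1 }
        $3 != "True" || $4 != "False" || $5 != "False" {
            bad = 1
            print "unstable operator: " $1 > "/dev/stderr"
        }
        END { if (bad || !seen) exit 1 }
    '
}

# Intended use (not executed here):
#   oc get clusteroperators --no-headers | cluster_operators_stable
```

An operator row with an empty VERSION column would shift the fields and be reported as unstable, so treat a failure as a prompt to inspect `oc get clusteroperators` by hand.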
    