Resizing Master Nodes in IPI-Installed OCP 4.6-4.11 on AWS or Azure.
This article demonstrates how to resize control plane nodes in Red Hat OpenShift Container Platform versions 4.6 through 4.11 running on AWS or Azure. The steps may also work in later versions, up to 4.18; however, starting with version 4.12, Control Plane Machine Sets (the recommended method) automate all the steps required to safely resize control plane nodes.
NOTES:
- The steps were tested on master nodes that are not managed by Control Plane Machine Sets.
- This tutorial is split into 4 parts (a to d).
- This tutorial applies to installer-provisioned infrastructure (IPI) deployments only.
- This tutorial describes how to change the instance size of existing nodes; it does not cover cluster scaling.
Prerequisites
- The CLI for the respective cloud provider is installed and properly configured: `aws` for AWS, or `az` for Azure.
- The `jq` package is installed on the bastion host where the commands will run (see the jq documentation).
- The `oc` client matches the OpenShift version currently running (see the oc documentation):
```
$ oc version
```
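The tool prerequisites can be checked up front with a small shell function. This is a minimal sketch, assuming bash; `check_prereqs` is a hypothetical helper name. It only confirms that the tools are on `PATH`, so comparing the `oc` client and server versions still relies on reading the `oc version` output.

```shell
#!/usr/bin/env bash
# Hypothetical helper: confirm that oc, jq and the cloud provider CLI are on PATH.
# Usage: check_prereqs aws|azure
check_prereqs() {
  local provider_cli tool missing=0
  case "$1" in
    aws)   provider_cli=aws ;;
    azure) provider_cli=az ;;
    *)     echo "usage: check_prereqs aws|azure" >&2; return 2 ;;
  esac
  for tool in oc jq "$provider_cli"; do
    # command -v succeeds for executables, builtins and shell functions.
    if ! command -v "$tool" >/dev/null 2>&1; then
      echo "missing required tool: ${tool}" >&2
      missing=1
    fi
  done
  return "$missing"
}
```

For example, `check_prereqs aws && oc version` runs the version check only when all tools are present.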
Step-by-step
ATTENTION: Resize one machine at a time. Do not proceed to the next node until the cluster operators (COs) have stabilized and both the machine and the node are back in `Running`/`Ready` status.
a. Verify the health of the machines
- Check the provider:
```
$ oc get infrastructures -o jsonpath='{.items[*].status.platformStatus.type}'
```
- Check that all nodes are `Ready` and all machines are `Running`. Replace `master` with the intended role if needed:
```
$ oc get nodes -l kubernetes.io/os=linux,node-role.kubernetes.io/master=
$ oc get machines -n openshift-machine-api -l machine.openshift.io/cluster-api-machine-role=master
```
- AWS | Azure - Gather cloud provider information from the Machine objects.

**For AWS**
```
$ oc get machines \
    -n openshift-machine-api \
    -l machine.openshift.io/cluster-api-machine-role=master \
    -o json \
  | jq -r '.items[] | (
      "node_name: " + .status.nodeRef.name,
      "machine_name: " + .metadata.name,
      "instanceId: " + .status.providerStatus.instanceId,
      "instanceTypeSpec: " + .spec.providerSpec.value.instanceType,
      "instanceTypeMeta: " + .metadata.labels."machine.openshift.io/instance-type",
      "")'
```
**For Azure**
```
$ oc get machines \
    -n openshift-machine-api \
    -l machine.openshift.io/cluster-api-machine-role=master \
    -o json \
  | jq -r '.items[] | (
      "node_name: " + .status.nodeRef.name,
      "machine_name: " + .metadata.name,
      "instanceId: " + .status.providerStatus.vmId,
      "instanceTypeSpec: " + .spec.providerSpec.value.vmSize,
      "instanceTypeMeta: " + .metadata.labels."machine.openshift.io/instance-type",
      "")'
```
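The `Ready`/`Running` check above can also be scripted so that it fails outright instead of relying on reading the output. A minimal sketch, assuming bash and the same label selectors used above; `masters_healthy` is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# Hypothetical helper: succeed only if every node with the given role reports
# Ready=True and every Machine with that role is in the Running phase.
# Usage: masters_healthy [role]   (role defaults to "master")
masters_healthy() {
  local role="${1:-master}" ready phases
  # One line per node: the status of its Ready condition (True/False/Unknown).
  ready=$(oc get nodes -l "node-role.kubernetes.io/${role}=" \
    -o jsonpath='{range .items[*]}{.status.conditions[?(@.type=="Ready")].status}{"\n"}{end}')
  # One line per Machine: its phase (Running, Provisioning, Failed, ...).
  phases=$(oc get machines -n openshift-machine-api \
    -l "machine.openshift.io/cluster-api-machine-role=${role}" \
    -o jsonpath='{range .items[*]}{.status.phase}{"\n"}{end}')
  # Empty output means nothing matched the selector: treat that as unhealthy.
  [ -n "$ready" ] && [ -n "$phases" ] || return 1
  # Fail if any line differs from the healthy value.
  ! grep -qv '^True$' <<<"$ready" && ! grep -qv '^Running$' <<<"$phases"
}
```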
b. Prepare the machines and variables to be used
- Repeat the steps below for each machine that needs to be resized.
- Update the `machine_name` variable for each machine to resize.
- Choose the best machine to resize first.
To avoid unnecessary etcd leader elections, resize the master nodes that are not running the etcd leader first.
To find the etcd leader, run `etcdctl endpoint status` and check the `IS LEADER` column:
```
$ oc -n openshift-etcd exec \
    $(oc get pods -n openshift-etcd -l app=etcd -o jsonpath='{.items[0].metadata.name}') \
    -- etcdctl endpoint status -w table
```
- Set the `machine_name` variable value, for example:
```
$ machine_name=mrbaz01-2754r-master-0
```
- Set the new machine size:
```
$ new_machine_type="<cloud_provider_size>"
```
- AWS | Azure - Check which sizes are supported or available:

**For AWS** - Check the [supported machine types reference](https://docs.openshift.com/container-platform/4.8/installing/installing_aws/installing-aws-vpc.html#installation-supported-aws-machine-types_installing-aws-vpc), then set the desired value, for example:
```
$ new_machine_type="m5.xlarge"
```
**For Azure** - To list the VM sizes available for a specific VM, run the command below (the `resource_group` variable is discovered in the "Collect machine info" step):
```
$ az vm list-vm-resize-options --resource-group ${resource_group} --name ${machine_name} --output table
```
Then set the desired value, for example:
```
$ new_machine_type="Standard_D8s_v3"
```
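Reading the `IS LEADER` column by eye works, but the leader endpoint can also be extracted from the table output. A minimal sketch, assuming bash and awk, and assuming the table header contains `ENDPOINT` and `IS LEADER` columns; the exact column set varies between etcd versions, so the columns are located by name rather than position. `etcd_leader_endpoint` is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# Hypothetical helper: read `etcdctl endpoint status -w table` output on stdin
# and print the ENDPOINT of every row whose IS LEADER cell is "true".
etcd_leader_endpoint() {
  awk -F'|' '
    # Header row: find the ENDPOINT and IS LEADER columns by name.
    /IS LEADER/ {
      for (i = 1; i <= NF; i++) {
        col = $i; gsub(/^ +| +$/, "", col)
        if (col == "ENDPOINT") ep = i
        if (col == "IS LEADER") ld = i
      }
      next
    }
    # Data rows (border lines have no "|" and never match "true").
    ep && ld {
      leader = $ld; gsub(/^ +| +$/, "", leader)
      if (leader == "true") {
        endpoint = $ep; gsub(/^ +| +$/, "", endpoint)
        print endpoint
      }
    }
  '
}
```

For example, pipe the `endpoint status` command above into `etcd_leader_endpoint`, then match the returned IP address against the `INTERNAL-IP` column of `oc get nodes -o wide` to identify the master node to resize last.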
- Collect machine info
Do not change the commands below; run the ones that match your environment.
- AWS | Azure - Discover the variable values based on `${machine_name}`:

**For AWS**
```
$ instanceId=$(oc get machine ${machine_name} -n openshift-machine-api -o jsonpath='{.status.providerStatus.instanceId}')
$ node_name=$(oc get machine ${machine_name} -n openshift-machine-api -o jsonpath='{.status.nodeRef.name}')
```
**For Azure**
```
$ resource_group=$(oc get machine ${machine_name} -n openshift-machine-api -o jsonpath='{.spec.providerSpec.value.resourceGroup}')
$ instanceId=${machine_name}
$ node_name=$(oc get machine ${machine_name} -n openshift-machine-api -o jsonpath='{.status.nodeRef.name}')
```
- Make sure all variables are set:
```
$ echo "[${machine_name}] [${new_machine_type}] [${instanceId}] [${node_name}] ${resource_group:-}"
```
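Instead of inspecting the `echo` output, the variables can be checked so that a missing value stops the procedure before anything is changed. A minimal sketch, assuming bash (it relies on indirect expansion); `require_vars` is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# Hypothetical helper: fail if any named variable is unset or empty.
# Usage: require_vars machine_name instanceId node_name new_machine_type [resource_group]
require_vars() {
  local name missing=0
  for name in "$@"; do
    # ${!name} is bash indirect expansion: the value of the variable called $name.
    if [ -z "${!name:-}" ]; then
      echo "variable not set: ${name}" >&2
      missing=1
    fi
  done
  return "$missing"
}
```

For example, run `require_vars machine_name instanceId node_name new_machine_type` on AWS, and add `resource_group` on Azure.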
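- Graceful power off
- Cordon the node:
```
$ oc adm cordon ${node_name}
```
- Drain the node (on newer `oc` clients, `--delete-local-data` is deprecated in favor of `--delete-emptydir-data`):
```
$ oc adm drain ${node_name} --ignore-daemonsets --grace-period=60 --delete-local-data
```
- Shut down the node and wait for it to power off. `shutdown -h 1` schedules the power-off one minute later, which gives the debug session time to exit:
```
$ oc debug node/${node_name} -- chroot /host shutdown -h 1
```
Attention:
- Wait until the node reports `Status: NotReady`.
- Verify that the instance is stopped at the cloud provider level, for example with `aws ec2 describe-instances --instance-ids ${instanceId} --query 'Reservations[].Instances[].State.Name' --output text` on AWS (expect `stopped`), or `az vm get-instance-view --resource-group ${resource_group} --name ${machine_name} --query "instanceView.statuses[?starts_with(code, 'PowerState/')].displayStatus" --output tsv` on Azure (expect `VM stopped`).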
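Waiting for `NotReady` can be done with a polling loop on the node's `Ready` condition. After a shutdown, that condition usually becomes `Unknown` rather than `False`, because the kubelet stops reporting, so both values count as not ready here. A minimal sketch, assuming bash; `wait_node_state` and its timeout defaults are hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: poll a node's Ready condition until it matches the wanted state.
#   wait_node_state <node> ready      -> wait for Ready=True
#   wait_node_state <node> notready   -> wait for Ready=False or Ready=Unknown
# Optional 3rd/4th arguments: timeout and poll interval in seconds (defaults 900 / 10).
wait_node_state() {
  local node="$1" want="$2" timeout="${3:-900}" interval="${4:-10}" waited=0 status
  while :; do
    # An API error leaves status empty, which simply counts as "not yet".
    status=$(oc get node "$node" \
      -o jsonpath='{.status.conditions[?(@.type=="Ready")].status}') || status=""
    case "${want}:${status}" in
      ready:True) return 0 ;;
      notready:False|notready:Unknown) return 0 ;;
    esac
    if [ "$waited" -ge "$timeout" ]; then
      echo "timed out waiting for ${node} to be ${want} (Ready=${status})" >&2
      return 1
    fi
    sleep "$interval"
    waited=$((waited + interval))
  done
}
```

The same function can be reused after powering the VM back on, for example `wait_node_state "${node_name}" ready`.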
c. Change the instance Type
- AWS | Azure - Change the size and validate the change:

**For AWS**
```
$ aws ec2 modify-instance-attribute --instance-id ${instanceId} --instance-type ${new_machine_type}
$ aws ec2 describe-instance-attribute --instance-id ${instanceId} --attribute instanceType
```
**For Azure**
```
$ az vm resize --resource-group ${resource_group} --name ${machine_name} --size ${new_machine_type}
$ az vm get-instance-view --resource-group ${resource_group} --name ${machine_name} --output json | jq -r '.hardwareProfile.vmSize'
```
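- Power on the VM: `aws ec2 start-instances --instance-ids ${instanceId}` on AWS, or `az vm start --resource-group ${resource_group} --name ${machine_name}` on Azure.
- Wait for the node to be `Ready` (STATUS=Ready):
```
$ oc get node ${node_name} -w
```
- Wait for the Machine API to reconcile and show the new instance type:
```
$ oc get machine ${machine_name} -n openshift-machine-api
```
- Uncordon the node:
```
$ oc adm uncordon ${node_name}
```
- Wait for the cluster operators to stabilize:
```
$ oc get co -w
```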
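"Stabilized" can be expressed as: every ClusterOperator reports `Available=True`, `Progressing=False` and `Degraded=False`. A minimal sketch, assuming bash and awk; `cos_settled` is a hypothetical helper name, and it performs a single check, so it would be re-run (or called in a loop with a sleep) while the operators settle.

```shell
#!/usr/bin/env bash
# Hypothetical helper: succeed only when every ClusterOperator reports
# Available=True, Progressing=False and Degraded=False.
cos_settled() {
  local rows unsettled
  # One line per operator: "<name> <Available> <Progressing> <Degraded>".
  rows=$(oc get clusteroperators -o jsonpath='{range .items[*]}{.metadata.name}{" "}{.status.conditions[?(@.type=="Available")].status}{" "}{.status.conditions[?(@.type=="Progressing")].status}{" "}{.status.conditions[?(@.type=="Degraded")].status}{"\n"}{end}') || return 1
  [ -n "$rows" ] || return 1
  # Keep only the operators that are not in the settled state.
  unsettled=$(awk '$2 != "True" || $3 != "False" || $4 != "False"' <<<"$rows")
  if [ -n "$unsettled" ]; then
    printf 'not settled:\n%s\n' "$unsettled" >&2
    return 1
  fi
  return 0
}
```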
d. Patch the Machine object with the new size
This patch does not resize the instance itself; it keeps the Machine spec consistent with the instance type that is now running.
- AWS | Azure - Apply the patch and review the change:

**For AWS**
```
$ oc patch machine ${machine_name} -n openshift-machine-api --type=merge \
    -p "{\"spec\":{\"providerSpec\":{\"value\":{\"instanceType\":\"${new_machine_type}\"}}}}"
$ oc get machines ${machine_name} -n openshift-machine-api -o json \
  | jq -r '. | (
      "node_name: " + .status.nodeRef.name,
      "machine_name: " + .metadata.name,
      "instanceTypeSpec: " + .spec.providerSpec.value.instanceType,
      "instanceTypeMeta: " + .metadata.labels."machine.openshift.io/instance-type",
      "")'
```
**For Azure**
```
$ oc patch machine ${machine_name} -n openshift-machine-api --type=merge \
    -p "{\"spec\":{\"providerSpec\":{\"value\":{\"vmSize\":\"${new_machine_type}\"}}}}"
$ oc get machines ${machine_name} -n openshift-machine-api -o json \
  | jq -r '. | (
      "node_name: " + .status.nodeRef.name,
      "machine_name: " + .metadata.name,
      "instanceTypeSpec: " + .spec.providerSpec.value.vmSize,
      "instanceTypeMeta: " + .metadata.labels."machine.openshift.io/instance-type",
      "")'
```
Resize one machine at a time. Proceed to the next node only when:
- All cluster operators (COs) are healthy and not degraded:
```
$ oc get clusteroperators
```
- Both the machine and the node are in `Running`/`Ready` status:
```
$ oc get machines -n openshift-machine-api
$ oc get nodes
```
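The two patch commands in section d differ only in the `providerSpec` field that holds the size (`instanceType` on AWS, `vmSize` on Azure). A minimal sketch that selects the field from the platform type reported by `oc get infrastructures` (`AWS` or `Azure`), assuming bash; `patch_machine_size` is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# Hypothetical helper: merge-patch the Machine's providerSpec with the new size.
# AWS keeps the size in providerSpec.value.instanceType, Azure in providerSpec.value.vmSize.
# Usage: patch_machine_size <AWS|Azure> <machine_name> <new_machine_type>
patch_machine_size() {
  local platform="$1" machine="$2" size="$3" field
  case "$platform" in
    AWS)   field=instanceType ;;
    Azure) field=vmSize ;;
    *)     echo "unsupported platform: ${platform}" >&2; return 2 ;;
  esac
  # Same merge patch as in section d, with the field name filled in.
  oc patch machine "$machine" -n openshift-machine-api --type=merge \
    -p "{\"spec\":{\"providerSpec\":{\"value\":{\"${field}\":\"${size}\"}}}}"
}
```

For example: `platform=$(oc get infrastructures -o jsonpath='{.items[*].status.platformStatus.type}')`, followed by `patch_machine_size "${platform}" "${machine_name}" "${new_machine_type}"`.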