How to remove a worker node from Red Hat OpenShift Container Platform 4 UPI?

Solution Verified

Environment

  • Red Hat OpenShift Container Platform 4
  • UPI installation

Issue

  • How to remove a worker node from a UPI installation of Red Hat OpenShift Container Platform 4?
  • The documentation only explains how to scale down a MachineSet, but a bare-metal or any other UPI-installed cluster lacks MachineSets.
  • How to recover a worker node that has been deleted?

Resolution

NOTE: If you are running ODF on worker nodes, removing the worker nodes where ODF pods run can cause data loss. Do not remove nodes in that situation: scaling down an ODF cluster is not supported. If you want to remove worker nodes where ODF is running, please file a support case.

When you delete a node using the CLI, the node object is deleted in Kubernetes, but the pods that exist on the node are not deleted. Any bare pods not backed by a replication controller become inaccessible to OpenShift Container Platform. Pods backed by replication controllers are rescheduled to other available nodes. You must delete local manifest pods.

  • To delete the node from a UPI installation, first mark it unschedulable and then drain it prior to deletion:
$ oc adm cordon <node_name>
$ oc adm drain <node_name> --force --delete-local-data --ignore-daemonsets
  • Also ensure that no jobs or cronjobs are currently running on, or scheduled to, this specific node, as draining does not take them into consideration.
  • On Red Hat OpenShift Container Platform 4.7+, use the --delete-emptydir-data option instead: --delete-local-data is deprecated in favor of --delete-emptydir-data and may no longer work.
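The flag choice above can be scripted. The sketch below is a hypothetical helper (not an oc feature) that picks the drain flag from the cluster's minor version; in practice the version string would come from `oc version`, here it is passed as an argument:

```shell
#!/bin/bash
# Hypothetical helper: choose the drain flag for a given OCP version.
# On 4.7+, --delete-local-data is deprecated in favor of --delete-emptydir-data.
drain_flag() {
  local version="$1"   # e.g. "4.8.12", normally read from `oc version`
  local minor
  minor=$(echo "$version" | cut -d. -f2)
  if [ "$minor" -ge 7 ]; then
    echo "--delete-emptydir-data"
  else
    echo "--delete-local-data"
  fi
}

drain_flag "4.8.12"
```

It would then be used as, for example, `oc adm drain <node_name> --force --ignore-daemonsets "$(drain_flag 4.8.12)"`.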
  • Back up the node definition first, so that the node object can be re-created later if needed:
$ oc get node <node_name> -o yaml > backupnode.yaml
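If the backup is later fed back to `oc create -f`, server-assigned metadata such as resourceVersion (and, to be safe, uid) should be stripped first, since the API server rejects objects created with a resourceVersion set. A minimal sketch, using a small sample that stands in for the real backupnode.yaml:

```shell
# Sample standing in for the real backupnode.yaml.
cat > /tmp/backupnode-sample.yaml <<'EOF'
apiVersion: v1
kind: Node
metadata:
  name: worker-1
  resourceVersion: "123456"
  uid: 0f9c9c9c-aaaa-bbbb-cccc-121212121212
EOF

# Delete the server-assigned resourceVersion and uid lines.
sed '/^[[:space:]]*resourceVersion:/d; /^[[:space:]]*uid:/d' \
  /tmp/backupnode-sample.yaml > /tmp/backupnode-clean.yaml

cat /tmp/backupnode-clean.yaml
```

The cleaned file would then be used with `oc create -f /tmp/backupnode-clean.yaml`.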
  • To delete the node, run the command below.
$ oc delete node <node_name>
  • Although the node object is now deleted from the cluster, the node can still rejoin it after a reboot or a restart of the kubelet service. To permanently delete the node and all its data, you must decommission it once it has been shut down.
  • Apply this step only if node deletion is confirmed, since it is non-revertible. If the node is a virtual machine, simply power it off and then delete it from the hypervisor console. If it is a physical bare-metal machine, first shred its disk as follows and then power it off. Shredding the disk removes all data on the disk.
# nohup shred -n 25 -f -z /dev/[HDD]

This command repeatedly overwrites all data on /dev/[HDD] to make it harder for even very expensive hardware probing to recover the data. The -n 25 option rewrites the data 25 times (GNU shred defaults to 3 passes), and the -z option adds a final overwrite with zeros to hide the shredding. Consider running this command from a rescue CD.
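The effect of the -z pass can be observed safely by shredding a throwaway regular file instead of a disk device. The sketch below uses a single overwrite pass (-n 1) to keep the demonstration fast and then verifies the file ends up zero-filled:

```shell
# Safe demonstration of shred on a temporary file, not a disk device.
tmpfile=$(mktemp)
printf 'sensitive data' > "$tmpfile"

# One overwrite pass (-n 1) plus a final zeroing pass (-z).
shred -n 1 -z "$tmpfile"

# After -z, every byte is zero; compare against /dev/zero.
size=$(stat -c %s "$tmpfile")
cmp -n "$size" "$tmpfile" /dev/zero && echo "zero-filled"

rm -f "$tmpfile"
```

Note that shred may round a regular file's size up to the next full block, which is why the size is re-read after shredding.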

If the node needs to be rejoined back to the cluster:

  • Once the node is deleted from the OCP software layer, it is ready for a power-off activity. If it instead needs to rejoin the cluster, either restart the kubelet on the node or re-create the node object from the backup:
$ oc create -f backupnode.yaml
  • The node can also be brought back by restarting the kubelet on it and approving any node CSRs that are generated:
# systemctl restart kubelet
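Pending CSRs can be spotted with `oc get csr` and approved with `oc adm certificate approve <name>`. The snippet below filters Pending requests from output shaped like `oc get csr`; a captured sample stands in for the live command, and the exact columns may vary by version:

```shell
# Sample standing in for: oc get csr
csr_output='NAME        AGE   SIGNERNAME                      REQUESTOR              CONDITION
csr-8b2mt   10s   kubernetes.io/kubelet-serving   system:node:worker-1   Pending
csr-9xk4p   5m    kubernetes.io/kubelet-serving   system:node:worker-2   Approved,Issued'

# Pick the names of Pending requests (last column == Pending), skipping the header.
pending=$(echo "$csr_output" | awk 'NR > 1 && $NF == "Pending" {print $1}')
echo "$pending"
# Each name would then be approved with: oc adm certificate approve <name>
```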

Diagnostic Steps

To monitor the deletion of the node, tail the live kubelet logs:

$ oc adm node-logs <node-name> -u kubelet

This solution is part of Red Hat’s fast-track publication program, providing a huge library of solutions that Red Hat engineers have created while supporting our customers. To give you the knowledge you need the instant it becomes available, these articles may be presented in a raw and unedited form.