How to remove a worker node from Red Hat OpenShift Container Platform 4 UPI?
Environment
- Red Hat OpenShift Container Platform 4
- UPI installation
Issue
- How to remove a worker node from a UPI installation of Red Hat OpenShift Container Platform 4?
- Documentation only explains how to scale down a MachineSet, but a bare metal or any other UPI-installed cluster lacks MachineSets.
- How to recover a worker node that has been deleted?
Resolution
NOTE: If you are running ODF on worker nodes, removing the worker nodes where ODF pods run can cause data loss. Do not remove nodes in such a situation: scaling down an ODF cluster is not supported. If you want to remove worker nodes where ODF is running, please file a support case.
When you delete a node using the CLI, the node object is deleted in Kubernetes, but the pods that exist on the node are not deleted. Any bare pods not backed by a replication controller become inaccessible to OpenShift Container Platform. Pods backed by replication controllers are rescheduled to other available nodes. You must delete local manifest pods.
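As a minimal sketch of how to spot such bare pods before deleting a node (assuming the oc and jq CLIs are available; the node name worker-0 is a placeholder):

```shell
#!/bin/bash
# Hypothetical example: list pods on a node that have no ownerReferences
# ("bare" pods), since these are not rescheduled when the node is deleted.
NODE="${NODE:-worker-0}"   # assumed example node name, replace with yours

if command -v oc >/dev/null 2>&1 && command -v jq >/dev/null 2>&1; then
    oc get pods --all-namespaces --field-selector spec.nodeName="$NODE" -o json \
        | jq -r '.items[]
                 | select(.metadata.ownerReferences == null)
                 | "\(.metadata.namespace)/\(.metadata.name)"'
else
    echo "oc/jq not available; run from a cluster-connected shell"
fi
```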
- To delete the node from the UPI installation, the node must first be marked unschedulable and then drained prior to deleting it:
$ oc adm cordon <node_name>
$ oc adm drain <node_name> --force --delete-local-data --ignore-daemonsets
- Also ensure that there are no jobs/cronjobs currently running or scheduled on this specific node, as draining does not take them into consideration.
- For Red Hat OpenShift Container Platform 4.7+, use the --delete-emptydir-data option in case --delete-local-data doesn't work. The --delete-local-data option is deprecated in favor of --delete-emptydir-data.
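The cordon/drain steps above can be sketched as one small script (assuming the oc CLI; the node name worker-0 is a placeholder, and the --delete-emptydir-data flag assumes OCP 4.7+):

```shell
#!/bin/bash
# Sketch of the cordon/drain sequence; NODE is an assumed example name.
NODE="${NODE:-worker-0}"

if command -v oc >/dev/null 2>&1; then
    oc adm cordon "$NODE"
    # On OCP 4.7+ use --delete-emptydir-data; older releases use --delete-local-data
    oc adm drain "$NODE" --force --delete-emptydir-data --ignore-daemonsets
    # Drain does not consider Job/CronJob pods, so list anything still running
    oc get pods --all-namespaces \
        --field-selector spec.nodeName="$NODE",status.phase=Running
else
    echo "oc CLI not found; run from a cluster-connected shell"
fi
```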
- Take a backup of the node object so it can be recreated if the node needs to rejoin the cluster later:
$ oc get node <node_name> -o yaml > backupnode.yaml
- To delete the node, run the command below.
$ oc delete node <node_name>
- Although the node object is now deleted from the cluster, it can still rejoin the cluster after a reboot or if the kubelet service is restarted. To permanently delete the node and all its data, you must decommission the node once it is in shutdown mode.
- Apply this step only if node deletion is confirmed, since it is non-revertible. If the node is a virtual machine, simply power it off and then delete it from the hypervisor console. If it is a physical bare metal machine, first shred its disk as follows and then power it off. Shredding the disk removes all data on the disk.
# nohup shred -n 25 -f -z /dev/[HDD]
This command repeatedly overwrites all data on /dev/[HDD] to make it harder for even very expensive hardware probing to recover the data. Here the data is rewritten 25 times (set with -n [number]), and the -z option overwrites the device with zeros at the end of the cycle. Consider running this command from a rescue CD.
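The effect of the -n and -z options can be demonstrated safely on a scratch file instead of a real disk (fewer passes are used here to keep the run short):

```shell
#!/bin/bash
# Safe demonstration of the shred flags on a temporary file, not /dev/[HDD].
tmpfile=$(mktemp)
echo "sensitive data" > "$tmpfile"
# -n 3: three random overwrite passes; -z: final pass of zeros; -f: force perms
shred -n 3 -z -f "$tmpfile"
# After -z, the file contains only zero bytes
nonzero=$(tr -d '\0' < "$tmpfile" | wc -c)
echo "non-zero bytes remaining: $nonzero"
rm -f "$tmpfile"
```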
If the node needs to be rejoined back to the cluster:
- Once the node is deleted from the OCP software layer, it is ready for a power-off activity. If the node instead needs to rejoin the cluster, either restart the kubelet or recreate the node object from the backup YAML:
$ oc create -f backupnode.yaml
- To get the node back, you can also restart the kubelet and approve any node CSRs that are generated:
$ systemctl restart kubelet
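When the node comes back, its kubelet generates new CSRs that must be approved before the node turns Ready. A hedged sketch using a go-template filter for pending CSRs (assumes the oc CLI is available and logged in):

```shell
#!/bin/bash
# Approve any pending CSRs (those with no .status yet) so the node can rejoin.
if command -v oc >/dev/null 2>&1; then
    oc get csr -o go-template='{{range .items}}{{if not .status}}{{.metadata.name}}{{"\n"}}{{end}}{{end}}' \
        | xargs --no-run-if-empty oc adm certificate approve
    oc get nodes   # the node should reappear and eventually report Ready
else
    echo "oc CLI not found; run from a cluster-connected shell"
fi
```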
Diagnostic Steps
To monitor the deletion of the node, follow the live kubelet logs:
$ oc adm node-logs <node-name> -u kubelet
This solution is part of Red Hat’s fast-track publication program, providing a huge library of solutions that Red Hat engineers have created while supporting our customers. To give you the knowledge you need the instant it becomes available, these articles may be presented in a raw and unedited form.