Node is in a degraded state because of the use of a deleted machineconfig: machineconfig.machineconfiguration.openshift.io "rendered-$[custom-machine-config]" not found in OpenShift 4.x
Environment
- Red Hat OpenShift Container Platform (RHOCP)
- 4
Issue
- Assigning a machineConfigPool to a node and then accidentally deleting the machineConfig in use puts the node in a loop trying to render the deleted machineConfig. As a result, the existing machineConfigPool ends up in a degraded state.
- A rendered machineConfig that is in use should never be deleted, since doing so leads to a degraded status of the associated node.
- The only other reason to lose a rendered config is a drift during install between the bootstrap and master generation.
Resolution
The node annotation machineconfiguration.openshift.io/desiredConfig is generated by the machine-config-controller, and there is no way to "update" it other than having the controller re-render it manually (a rendered-config-xxx describes the state of a system). The machine-config-controller is not able to understand what the config needs to be, since the rendered-config is an aggregate of all machineConfigs assigned to that node selector.
- The non-existent rendered-worker machineConfig must be recovered by manually recreating it from the yaml used and resetting the machineConfig daemon. You may export a previous/existing rendered-machine-config to yaml, rename it to match the missing desiredConfig value that cannot be found, and then create the object:
$ oc get mc | grep rendered #look for newest creation date for your desired build
$ oc get mc/<rendered-config> -o yaml > rendered-mc-backup.yaml
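Finding the newest rendered config for a pool can be narrowed with a small filter. The following is a minimal sketch, not part of the original procedure: newest_rendered is a hypothetical helper name, and it assumes its input is sorted oldest-first, for example by oc get mc --sort-by=.metadata.creationTimestamp --no-headers.

```shell
# Hypothetical helper (not part of oc): print the last machineConfig name on
# stdin that starts with "rendered-<pool>-". With input sorted by creation
# time (oldest first), the last match is the newest rendered config.
newest_rendered() {
  pool="$1"
  awk -v prefix="rendered-${pool}-" '
    index($1, prefix) == 1 { name = $1 }   # remember the latest matching name
    END { if (name != "") print name }     # print nothing if no match was seen
  '
}

# Example (requires a logged-in oc session):
#   oc get mc --sort-by=.metadata.creationTimestamp --no-headers | newest_rendered worker
```

Always confirm that the selected config is the build you actually want before exporting it.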
- Edit rendered-mc-backup.yaml to remove superfluous fields (below, the lines to remove have been commented out with a #):
apiVersion: machineconfiguration.openshift.io/v1
kind: MachineConfig
metadata:
  annotations:
    machineconfiguration.openshift.io/generated-by-controller-version: f6d1fe753cbcecb3aa1c2d3d3edd4a5d04ffca54
  # creationTimestamp: "2020-04-30T09:54:38Z"
  # generation: 1
  name: rendered-worker-a61c08084de6ae14629347c8760daa5f #RENAME TO MISSING CONFIG
  ownerReferences:
  - apiVersion: machineconfiguration.openshift.io/v1
    blockOwnerDeletion: true
    controller: true
    kind: MachineConfigPool
    name: worker
    uid: bc1216ef-4f50-4c95-894a-24f22e3e08fb
  # resourceVersion: "341458"
  # selfLink: /apis/machineconfiguration.openshift.io/v1/machineconfigs/rendered-worker-a61c08084de6ae14629347c8760daa5f
  # uid: b8aff578-0fd2-454f-b014-92bd85b152af
spec:
  config:
    ...
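The manual edit above could also be scripted. This is only a sketch under stated assumptions: prepare_rendered_mc is a hypothetical helper, and it assumes the exported yaml uses the usual two-space indentation, so that top-level metadata fields start with exactly two spaces while the ownerReferences entries are indented further. Review the result before creating the object.

```shell
# Hypothetical helper: read an exported MachineConfig yaml on stdin, drop the
# server-populated metadata fields, and set metadata.name to the missing
# rendered config name given as $1.
# Assumption: metadata fields are indented by exactly two spaces, so the
# deeper-indented ownerReferences name/uid lines are left untouched.
prepare_rendered_mc() {
  missing_name="$1"
  sed -E \
    -e '/^  (creationTimestamp|generation|resourceVersion|selfLink|uid):/d' \
    -e "s/^  name: .*/  name: ${missing_name}/"
}

# Example:
#   prepare_rendered_mc rendered-worker-<missing-id> < rendered-mc-backup.yaml > rendered-mc-fixed.yaml
```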
- Now create the missing machine-config object and force the nodes to re-validate their configuration template:
$ oc create -f rendered-mc-backup.yaml
$ oc debug node/$node_name -- touch /host/run/machine-config-daemon-force
- When you run the above touch command, each node checks its machineconfiguration.openshift.io/desiredConfig value to determine which rendered-* config is applied next. The command effectively just forces the daemon pod running on the node to initiate a reboot into the template listed.
- You may observe which rendered-* machine-config build is going to be applied (or adjust it) by reviewing the node object yaml for each host node:
$ for i in $(oc get nodes --no-headers | awk '{print $1}'); do echo "$i"; oc get node/"$i" -o yaml | grep -Ei "currentConfig|desiredConfig"; done
- Apply the following in an automated way so that machineconfiguration.openshift.io/currentConfig and machineconfiguration.openshift.io/desiredConfig match:
$ oc patch node $node_name --type merge --patch "{\"metadata\": {\"annotations\": {\"machineconfiguration.openshift.io/currentConfig\": \"${new_value}\"}}}"
$ oc patch node $node_name --type merge --patch "{\"metadata\": {\"annotations\": {\"machineconfiguration.openshift.io/desiredConfig\": \"${new_value}\"}}}"
$ oc patch node $node_name --type merge --patch '{"metadata": {"annotations": {"machineconfiguration.openshift.io/reason": ""}}}'
$ oc patch node $node_name --type merge --patch '{"metadata": {"annotations": {"machineconfiguration.openshift.io/state": "Done"}}}'
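The four patches can be wrapped so that the same value is always applied to both annotations. This is a minimal sketch, assuming a logged-in oc session; annotation_patch and reconcile_node are hypothetical helper names, not part of oc.

```shell
# Build the JSON merge patch for one machineconfiguration.openshift.io
# annotation: $1 is the annotation suffix (e.g. currentConfig), $2 its value.
annotation_patch() {
  printf '{"metadata": {"annotations": {"machineconfiguration.openshift.io/%s": "%s"}}}' "$1" "$2"
}

# Hypothetical wrapper: apply the four patches from the step above to one
# node, stopping at the first failure.
reconcile_node() {
  node_name="$1"
  new_value="$2"
  oc patch node "$node_name" --type merge --patch "$(annotation_patch currentConfig "$new_value")" &&
  oc patch node "$node_name" --type merge --patch "$(annotation_patch desiredConfig "$new_value")" &&
  oc patch node "$node_name" --type merge --patch "$(annotation_patch reason "")" &&
  oc patch node "$node_name" --type merge --patch "$(annotation_patch state Done)"
}

# Example:
#   reconcile_node "$node_name" "$new_value"
```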
- Or manually edit the node's annotations so that machineconfiguration.openshift.io/currentConfig matches machineconfiguration.openshift.io/desiredConfig, as in the example below. Also set the annotation machineconfiguration.openshift.io/state to 'Done' and leave machineconfiguration.openshift.io/reason empty:
$ oc edit node/$node
Annotations:
machineconfiguration.openshift.io/currentConfig: rendered-worker-36f40ba3c1038c7ce5ce54f8a840a58f
machineconfiguration.openshift.io/desiredConfig: rendered-worker-36f40ba3c1038c7ce5ce54f8a840a58f
machineconfiguration.openshift.io/reason:
machineconfiguration.openshift.io/state: Done
A reboot will be triggered (if the paused spec is set to false, as per oc patch --type=merge --patch='{"spec":{"paused":false}}' machineconfigpool/$MCP_name) and the node will come back to Ready status.
If any label was set on the node, delete that label after the operation:
$ oc label node $worker-example.redhat.com node-role.kubernetes.io/$[label]-
Root Cause
The node annotations machineconfiguration.openshift.io/currentConfig or machineconfiguration.openshift.io/desiredConfig point to a machineConfig that no longer exists, and the machine-config-operator cannot process that request.
Diagnostic Steps
- The following message indicates that the rendered machine config in use cannot be found:
$ oc get mcp -n openshift-machine-config-operator
$ oc describe mcp $x -n openshift-machine-config-operator
....
Message: Node worker-2.example.com is reporting: "machineconfig.machineconfiguration.openshift.io \"rendered-test-$[ID]\" not found"
Reason: 1 nodes are reporting degraded status on sync
....
- Gather the following information about the machine-config-operator:
# oc describe clusteroperator machine-config
# oc describe machineconfig -n openshift-machine-config-operator
# oc get machineconfigpool -n openshift-machine-config-operator
# oc describe machineconfigpool the-failing-pool -n openshift-machine-config-operator
# oc describe node
- Gather the following logs:
# for POD in $(oc get po -n openshift-machine-config-operator -l k8s-app=machine-config-daemon -o name | awk -F '/' '{print $2 }'); do oc logs -n openshift-machine-config-operator -c machine-config-daemon $POD > $POD.log; done
The logs will print the following message:
namespaces/openshift-machine-config-operator/pods/machine-config-daemon-lrcsq/machine-config-daemon/machine-config-daemon/logs/current.log:2020-04-06T21:09:14.588375761-04:00 E0407 01:09:14.588272 3997140 writer.go:130] Marking Degraded due to: machineconfig.machineconfiguration.openshift.io "rendered-test-$[ID]" not found
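The name of the missing rendered config can be pulled out of these messages rather than copied by hand. The following is a sketch only: missing_mc_names is a hypothetical helper, and it assumes the message format shown above, with the name in plain or backslash-escaped double quotes.

```shell
# Hypothetical helper: read log or "oc describe mcp" text on stdin and print
# each distinct machineConfig name reported as "not found".
# Handles both  "rendered-..."  and  \"rendered-...\"  quoting.
missing_mc_names() {
  sed -n -E 's/.*machineconfig\.machineconfiguration\.openshift\.io \\?"([^"\\]+)\\?" not found.*/\1/p' |
    sort -u
}

# Example, using the log files gathered above:
#   cat machine-config-daemon-*.log | missing_mc_names
```

Each name printed is a candidate for the recreation step in the Resolution section; oc get mc <name> should confirm that the object is really absent.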
- Check the node annotations machineconfiguration.openshift.io/currentConfig and machineconfiguration.openshift.io/desiredConfig, as these need to be the same. The following output shows a working example:
$ oc describe node master-0.test.com | grep -i config
machineconfiguration.openshift.io/currentConfig: rendered-master-e92fca201accd77ecd32d72796a959a4
machineconfiguration.openshift.io/desiredConfig: rendered-master-e92fca201accd77ecd32d72796a959a4
- The following output shows the issue described in this solution, as machineconfiguration.openshift.io/currentConfig is not the same as machineconfiguration.openshift.io/desiredConfig:
$ oc describe node master-0.test.com | grep -i config
machineconfiguration.openshift.io/currentConfig: rendered-master-asd3451243ggs4543ecd3265754g49a
machineconfiguration.openshift.io/desiredConfig: rendered-master-e92fca201accd77ecd32d72796a959a4
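The comparison of the two annotations can be automated. This is a minimal sketch, assuming input in the format shown above; configs_match is a hypothetical helper name.

```shell
# Hypothetical helper: read "oc describe node" (or grep-filtered) output on
# stdin and succeed only if currentConfig and desiredConfig are both present
# and identical.
configs_match() {
  awk '
    /machineconfiguration.openshift.io\/currentConfig:/ { cur = $NF }
    /machineconfiguration.openshift.io\/desiredConfig:/ { des = $NF }
    END { if (cur != "" && cur == des) exit 0; exit 1 }
  '
}

# Example:
#   oc describe node master-0.test.com | configs_match || echo "config mismatch"
```

Note that matching annotations alone do not prove the node is healthy, since both may point to the same deleted machineConfig.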
- Check whether the request is failing by querying the machineConfigPool endpoint, as the machineConfigPool may have been deleted. In this case, query the one referenced under machineconfiguration.openshift.io/currentConfig:
$ curl -kv https://localhost:22623/config/$nameMCP
I0417 13:20:37.448036 1 api.go:97] Pool server requested by [::1]:32916
E0417 13:20:37.451292 1 api.go:103] couldn't get config for req: {server}, error: could not fetch pool. err: machineconfigpools.machineconfiguration.openshift.io "$nameMCP" not found