A node is in a degraded state because of the use of a deleted machineConfig: machineconfig.machineconfiguration.openshift.io "rendered-$[custom-machine-config]" not found in OpenShift 4.x

Environment

  • Red Hat OpenShift Container Platform (RHOCP)
    • 4

Issue

  • Assigning a machineConfigPool to a node and then accidentally deleting the machineConfig in use puts the node in a loop, repeatedly trying to render the deleted machineConfig. This causes discrepancies with the existing machineConfigPool, which ends up in a degraded state.
  • A rendered machineConfig that is in use should never be deleted, since doing so leaves the associated node in a degraded state.
  • The only other way to lose a rendered config is a drift between the bootstrap and master generation during installation.

Resolution

The node annotation machineconfiguration.openshift.io/desiredConfig is generated by the machine-config-controller, and there is no way to "update" it other than having the controller re-render it manually (a rendered-config-xxx describes the state of a system). The machine-config-controller is not able to determine what the config needs to be, since the rendered-config is an aggregate of all machineConfigs assigned to that node selector.

  • Recover the non-existent rendered-worker machineConfig by manually recreating it from the YAML originally used, then reset the machineConfig daemon. You may export a previous/existing rendered machineConfig to YAML, rename it to match the missing desiredConfig value that cannot be found, and then create the object:
$ oc get mc | grep rendered #look for newest creation date for your desired build
$ oc get mc/<rendered-config> -o yaml > rendered-mc-backup.yaml
  • Edit rendered-mc-backup.yaml to remove the superfluous fields (in the example below, the lines to remove are commented out with a #):
apiVersion: machineconfiguration.openshift.io/v1
kind: MachineConfig
metadata:
  annotations:
    machineconfiguration.openshift.io/generated-by-controller-version: f6d1fe753cbcecb3aa1c2d3d3edd4a5d04ffca54
#  creationTimestamp: "2020-04-30T09:54:38Z" 
#  generation: 1 
  name: rendered-worker-a61c08084de6ae14629347c8760daa5f #RENAME TO MISSING CONFIG
  ownerReferences:
  - apiVersion: machineconfiguration.openshift.io/v1
    blockOwnerDeletion: true
    controller: true
    kind: MachineConfigPool
    name: worker
    uid: bc1216ef-4f50-4c95-894a-24f22e3e08fb
#  resourceVersion: "341458"
#  selfLink: /apis/machineconfiguration.openshift.io/v1/machineconfigs/rendered-worker-a61c08084de6ae14629347c8760daa5f 
#  uid: b8aff578-0fd2-454f-b014-92bd85b152af 
spec:
  config:
...
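The commented-out fields above can also be stripped in one pass. A minimal sketch, assuming GNU sed and the field layout shown above (the heredoc is sample data standing in for your real export; verify the field names against your own file before applying):

```shell
# Sample export standing in for the real rendered-mc-backup.yaml
cat > rendered-mc-backup.yaml <<'EOF'
apiVersion: machineconfiguration.openshift.io/v1
kind: MachineConfig
metadata:
  creationTimestamp: "2020-04-30T09:54:38Z"
  generation: 1
  name: rendered-worker-a61c08084de6ae14629347c8760daa5f
  ownerReferences:
  - apiVersion: machineconfiguration.openshift.io/v1
    uid: bc1216ef-4f50-4c95-894a-24f22e3e08fb
  resourceVersion: "341458"
  uid: b8aff578-0fd2-454f-b014-92bd85b152af
EOF
# Delete only the two-space-indented server-generated metadata fields;
# the deeper-indented uid under ownerReferences is preserved because the
# patterns anchor on exactly two leading spaces.
sed -i \
  -e '/^  creationTimestamp:/d' \
  -e '/^  generation:/d' \
  -e '/^  resourceVersion:/d' \
  -e '/^  selfLink:/d' \
  -e '/^  uid:/d' \
  rendered-mc-backup.yaml
```

Remember to rename the config to the missing desiredConfig value before creating the object.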
  • Now create the missing machine-config object and force the nodes to re-validate their configuration template:
$ oc create -f rendered-mc-backup.yaml
$ oc debug node/$node_name -- touch /host/run/machine-config-daemon-force
  • When you run the touch command above, the node checks the machineconfiguration.openshift.io/desiredConfig annotation to determine which rendered-* config is applied next; the command effectively forces the machine-config-daemon pod running on the node to initiate a reboot into the listed template.

  • You may observe what rendered-* machine-config build is going to be applied (or adjust it) by reviewing the node object yaml for each host node:

for i in $(oc get nodes --no-headers | awk '{print $1}'); do echo $i; oc get node/$i -o yaml | grep -Ei "currentConfig|desiredConfig"; done

  • Apply the following patches in an automated way so that machineconfiguration.openshift.io/currentConfig and machineconfiguration.openshift.io/desiredConfig match:
$ oc patch node $node_name --type merge --patch "{\"metadata\": {\"annotations\": {\"machineconfiguration.openshift.io/currentConfig\": \"${new_value}\"}}}"
$ oc patch node $node_name  --type merge --patch "{\"metadata\": {\"annotations\": {\"machineconfiguration.openshift.io/desiredConfig\": \"${new_value}\"}}}"
$ oc patch node $node_name  --type merge --patch '{"metadata": {"annotations": {"machineconfiguration.openshift.io/reason": ""}}}'
$ oc patch node $node_name  --type merge --patch '{"metadata": {"annotations": {"machineconfiguration.openshift.io/state": "Done"}}}'
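The four patches above can also be issued as a single merge patch. A sketch, where the node and config names are placeholder assumptions to replace with your own values:

```shell
node_name=worker-2.example.com                               # assumption: your degraded node
new_value=rendered-worker-a61c08084de6ae14629347c8760daa5f   # assumption: the recreated config
# Build one merge patch covering all four annotations at once
patch=$(printf '{"metadata":{"annotations":{"machineconfiguration.openshift.io/currentConfig":"%s","machineconfiguration.openshift.io/desiredConfig":"%s","machineconfiguration.openshift.io/reason":"","machineconfiguration.openshift.io/state":"Done"}}}' "$new_value" "$new_value")
echo oc patch node "$node_name" --type merge --patch "$patch"   # remove 'echo' to apply
```

The echo prints the command for review; drop it to apply the patch against the cluster.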
  • Or manually edit the node's annotations by following the example: set machineconfiguration.openshift.io/currentConfig to match machineconfiguration.openshift.io/desiredConfig, set machineconfiguration.openshift.io/state to Done, and leave machineconfiguration.openshift.io/reason empty:
$ oc edit node/$node 
Annotations:        
                    machineconfiguration.openshift.io/currentConfig: rendered-worker-36f40ba3c1038c7ce5ce54f8a840a58f
                    machineconfiguration.openshift.io/desiredConfig: rendered-worker-36f40ba3c1038c7ce5ce54f8a840a58f
                    machineconfiguration.openshift.io/reason: 
                    machineconfiguration.openshift.io/state: Done

A reboot will be triggered (if the paused spec is set to false, as per oc patch --type=merge --patch='{"spec":{"paused":false}}' machineconfigpool/$MCP_name) and the node will come back to Ready status.

If any label was set on the node for this procedure, delete it after the operation:

$ oc label node worker-example.redhat.com node-role.kubernetes.io/$[label]-

Root Cause

The node annotations machineconfiguration.openshift.io/currentConfig or machineconfiguration.openshift.io/desiredConfig point to a machineConfig that no longer exists, and the machine-config-operator cannot process that request.

Diagnostic Steps

  • The following message indicates that the rendered machine config in use was not found:
$ oc get mcp -n openshift-machine-config-operator
$ oc describe mcp $x -n openshift-machine-config-operator
....
Message:               Node worker-2.example.com is reporting: "machineconfig.machineconfiguration.openshift.io \"rendered-test-$[ID]\" not found"
    Reason:                1 nodes are reporting degraded status on sync
....
  • Gather the following information about the machine-config-operator:
# oc describe clusteroperator machine-config
# oc describe machineconfig -n openshift-machine-config-operator
# oc get machineconfigpool -n openshift-machine-config-operator
# oc describe machineconfigpool the-failing-pool -n openshift-machine-config-operator
# oc describe node
  • Gather the following logs:
# for POD in $(oc get po -l k8s-app=machine-config-daemon -o name | awk -F '/' '{print $2 }'); do oc logs $POD > $POD.log; done

The logs will print the following message:

namespaces/openshift-machine-config-operator/pods/machine-config-daemon-lrcsq/machine-config-daemon/machine-config-daemon/logs/current.log:2020-04-06T21:09:14.588375761-04:00 E0407 01:09:14.588272 3997140 writer.go:130] Marking Degraded due to: machineconfig.machineconfiguration.openshift.io "rendered-test-$[ID]" not found
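Once the logs are collected, the missing rendered config name can be extracted from the Degraded message. A sketch, where the heredoc is sample log output standing in for the files gathered above (the config ID is hypothetical):

```shell
# Sample daemon log line standing in for the collected *.log files
cat > machine-config-daemon-lrcsq.log <<'EOF'
E0407 01:09:14.588272 3997140 writer.go:130] Marking Degraded due to: machineconfig.machineconfiguration.openshift.io "rendered-test-abc123" not found
EOF
# Pull the rendered config name out of every Degraded message
grep -h 'Marking Degraded due to' machine-config-daemon-*.log \
  | grep -o 'rendered-[A-Za-z0-9-]*'
```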
  • Check the node annotations machineconfiguration.openshift.io/currentConfig and machineconfiguration.openshift.io/desiredConfig, as those need to be the same. The following output shows a working example:
$ oc describe node master-0.test.com | grep -i config
                    machineconfiguration.openshift.io/currentConfig: rendered-master-e92fca201accd77ecd32d72796a959a4
                    machineconfiguration.openshift.io/desiredConfig: rendered-master-e92fca201accd77ecd32d72796a959a4
  • The following output illustrates the issue addressed in this solution: machineconfiguration.openshift.io/currentConfig does not match machineconfiguration.openshift.io/desiredConfig:
$ oc describe node master-0.test.com | grep -i config
                    machineconfiguration.openshift.io/currentConfig: rendered-master-asd3451243ggs4543ecd3265754g49a
                    machineconfiguration.openshift.io/desiredConfig: rendered-master-e92fca201accd77ecd32d72796a959a4
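The comparison above can be scripted against saved output. A sketch: with the grep output saved to a file (the heredoc reproduces the mismatched example), extract and compare the two annotation values:

```shell
# Saved output from 'oc describe node ... | grep -i config'
cat > node-config.txt <<'EOF'
machineconfiguration.openshift.io/currentConfig: rendered-master-asd3451243ggs4543ecd3265754g49a
machineconfiguration.openshift.io/desiredConfig: rendered-master-e92fca201accd77ecd32d72796a959a4
EOF
# Split each line on ': ' and keep the rendered config name
current=$(awk -F': ' '/currentConfig/{print $2}' node-config.txt)
desired=$(awk -F': ' '/desiredConfig/{print $2}' node-config.txt)
if [ "$current" = "$desired" ]; then
  echo "annotations match"
else
  echo "annotations differ: node stays degraded until currentConfig is corrected"
fi
```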
  • Check whether the machineConfigPool itself was deleted by querying the machineConfigPool endpoint, in this case the pool referenced under machineconfiguration.openshift.io/currentConfig:
$ curl -kv https://localhost:22623/config/$nameMCP

I0417 13:20:37.448036       1 api.go:97] Pool server requested by [::1]:32916
E0417 13:20:37.451292       1 api.go:103] couldn't get config for req: {server}, error: could not fetch pool. err: machineconfigpools.machineconfiguration.openshift.io "$nameMCP" not found
This solution is part of Red Hat’s fast-track publication program, providing a huge library of solutions that Red Hat engineers have created while supporting our customers. To give you the knowledge you need the instant it becomes available, these articles may be presented in a raw and unedited form.