Degraded machine-config Cluster Operator due to MachineConfigPool being paused in OpenShift 4

Environment

  • Red Hat OpenShift Container Platform (RHOCP)
    • 4
  • Machine Config Operator (MCO)
  • MachineConfigPool

Issue

  • MachineConfigPools are paused, preventing the Machine Config Operator from pushing out updates in OpenShift 4.

  • The machine-config ClusterOperator has messages like:

    timed out waiting for the condition during syncRequiredMachineConfigPools:
    
    pool master has not progressed to latest configuration: controller version mismatch
    

Resolution

Check the value of the paused field in each MachineConfigPool as shown in the Diagnostic Steps, and set it to false for every pool where it is true:

$ oc patch mcp [mcp_name] --type=merge -p '{"spec": {"paused": false}}'
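
When several pools are paused, the patch above can be applied in a loop. The following is a minimal sketch and not part of the original solution: the helper name paused_pools is hypothetical, and it assumes its input is the two-column (name, paused) output of an oc get mcp call with --no-headers and custom-columns, where an unset field is printed as <none>.

```shell
# paused_pools: hypothetical helper. Reads "NAME PAUSED" lines on stdin and
# prints the name of every pool whose second column is exactly "true".
paused_pools() {
  awk '$2 == "true" { print $1 }'
}

# Assumed usage on a live cluster (commented out because it changes cluster state):
# oc get mcp --no-headers -o custom-columns=NAME:.metadata.name,PAUSED:.spec.paused |
#   paused_pools |
#   while read -r pool; do oc patch mcp "$pool" --type=merge -p '{"spec": {"paused": false}}'; done
```

Review the list printed by paused_pools before patching, since a pool may have been paused deliberately by an administrator.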

There is an RFE (RFE-1993) to alert before starting an upgrade if the master MCP is paused.

Root Cause

The newly rendered configs cannot be applied to the MachineConfigPools because the pools are set to paused: true. This causes the controller version check to fail, because the machine config pools are still using the old rendered configs, which will never be generated by the newly installed Machine Config Operator.

This value is generally set by an administrator, possibly to avoid any updates occurring without their knowledge. The cluster itself does not set this value to true. Refer to Disable autoreboot after a change with the machine-config-operator in OCP 4 for additional information.

Diagnostic Steps

Check the upgrade status:

$ oc get clusterversion
NAME      VERSION   AVAILABLE   PROGRESSING   SINCE   STATUS
version   4.3.9     True        True          13h     Working towards 4.3.26: 84% complete

Check if the machine-config ClusterOperator is available or degraded, and check the status:

$ oc get co
NAME                                      VERSION  AVAILABLE  PROGRESSING  DEGRADED  SINCE
[...]
machine-config                            4.3.9   False      True         True       5h
[...]

$ oc get co machine-config -o yaml
[...]
message: 'Unable to apply 4.3.26: timed out waiting for the condition during syncRequiredMachineConfigPools:
      pool master has not progressed to latest configuration: controller version mismatch
[...]
lastSyncError: 'pool master has not progressed to latest configuration: controller
      version mismatch
[...]
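
On a cluster with many operators, the ClusterOperator listing can be filtered down to the ones that need attention. This is a minimal sketch under stated assumptions: the helper name unhealthy_operators is hypothetical, and it assumes the column order shown in the oc get co output above (NAME, VERSION, AVAILABLE, PROGRESSING, DEGRADED) with every column populated; an operator with an empty VERSION column would shift the fields and be misread.

```shell
# unhealthy_operators: hypothetical helper. Reads `oc get co --no-headers`
# style lines on stdin and prints the name of each operator whose
# AVAILABLE column (3rd) is not "True" or whose DEGRADED column (5th) is "True".
unhealthy_operators() {
  awk '$3 != "True" || $5 == "True" { print $1 }'
}

# Assumed usage on a live cluster:
# oc get co --no-headers | unhealthy_operators
```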

Check whether the desired and current rendered MachineConfig of each MachineConfigPool are the same, or whether any pool has a different desired and current configuration:

$ oc get mcp -o custom-columns=NAME:metadata.name,DESIRED:spec.configuration.name,CONFIG-STATUS:status.configuration.name
[...]
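
The comparison of the DESIRED and CONFIG-STATUS columns can also be done mechanically. This is a minimal sketch, assuming the three-column layout of the custom-columns command above with the header row suppressed; the helper name mismatched_pools is hypothetical.

```shell
# mismatched_pools: hypothetical helper. Reads "NAME DESIRED CURRENT" lines
# on stdin and prints each pool whose desired rendered config differs from
# the one currently reported in its status.
mismatched_pools() {
  awk '$2 != $3 { print $1 ": desired=" $2 " current=" $3 }'
}

# Assumed usage on a live cluster:
# oc get mcp --no-headers -o custom-columns=NAME:metadata.name,DESIRED:spec.configuration.name,CONFIG-STATUS:status.configuration.name | mismatched_pools
```

A pool listed here that is also paused will stay on its current rendered config until it is unpaused.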

Check the MachineConfigPool referenced in the machine-config ClusterOperator messages to verify whether it is paused:

$ oc get mcp [mcp_name] --template='{{.spec.paused}}'
$ oc get mcp [mcp_name] -o yaml
