machine-config-nodes-crd-cleanup pod in pending state during upgrade from 4.18 to 4.19 in RHOCP4

Solution Verified

Environment

  • Red Hat OpenShift Container Platform (RHOCP)
    • Upgrades from 4.18 to 4.19.12 or 4.19.13

Issue

  • RHOCP upgrade to 4.19 is stuck because the machine-config-nodes-crd-cleanup pod in the openshift-machine-config-operator namespace remains in Pending state:

    $ oc get clusterversion -o yaml 
    [...]
        - lastTransitionTime: "2025-09-29T09:30:03Z"
          message: 'Could not update customresourcedefinition "machineconfignodes.machineconfiguration.openshift.io"
            (785 of 924): the object is invalid, possibly due to local cluster configuration'
          reason: UpdatePayloadResourceInvalid
          status: "True"
    [...]
    

Resolution

The issue has been identified as a bug by the Red Hat engineering team. The fix can be tracked through the following bugs and errata:

Target Minor Release | Bug           | Fixed Version | Errata
4.20                 | OCPBUGS-62073 | 4.20.0        | RHSA-2025:9562
4.19                 | OCPBUGS-62114 | 4.19.14       | RHBA-2025:16693

In addition to the above, once OCPBUGS-62321 is fixed, control plane nodes will be automatically labeled with the required label when updating to an OpenShift 4.18 version that includes that fix.

Workaround

If the upgrade to one of the affected versions is already stuck, add the node-role.kubernetes.io/control-plane label to all control plane nodes as shown below so that the upgrade can proceed:

$ oc label node -l node-role.kubernetes.io/master node-role.kubernetes.io/control-plane=
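After applying the label, you can verify that it has landed on all control plane nodes; the following check is a sketch using the same label selector as the pod's nodeSelector:

```shell
# List the nodes that now carry the control-plane label; every control
# plane node should appear here after the workaround has been applied
oc get nodes -l node-role.kubernetes.io/control-plane
```

Once all control plane nodes carry the label, the machine-config-nodes-crd-cleanup pod should be scheduled and the upgrade should resume.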

Root Cause

This is observed on clusters that were originally installed prior to RHOCP 4.12, where the control plane nodes carry only the node-role.kubernetes.io/master label and not the node-role.kubernetes.io/control-plane label. As a result, the machine-config-nodes-crd-cleanup pod stays in Pending state because its nodeSelector requires the node-role.kubernetes.io/control-plane label. There is additional information about this in Inconsistency of node-role between newly created vs. long running OpenShift 4 clusters.

Diagnostic Steps

  • Verify the state of the pod machine-config-nodes-crd-cleanup in openshift-machine-config-operator namespace:

        $ oc get pods -n openshift-machine-config-operator -o wide | grep -i machine-config-nodes-crd-cleanup
    
        machine-config-nodes-crd-cleanup-xxxxxxxx-xxxxx                   0/1     Pending   0          3d    <none>         <none>                                       <none>           <none>
    
        $ oc get events -n openshift-machine-config-operator | grep -i machine-config-nodes-crd-cleanup
    
         3h          Warning   FailedScheduling            pod/machine-config-nodes-crd-cleanup-xxxxxxxx-xxxxx                   0/6 nodes are available: 3 node(s) didn't match Pod's node affinity/selector, 3 node(s) had untolerated taint {node.kubernetes.io/unreachable: }. preemption: 0/6 nodes are available: 6 Preemption is not helpful for scheduling.
        3h          Warning   FailedScheduling            pod/machine-config-nodes-crd-cleanup-xxxxxxxx-xxxxx                   0/6 nodes are available: 3 node(s) didn't match Pod's node affinity/selector, 3 node(s) had untolerated taint {node.kubernetes.io/not-ready: }. preemption: 0/6 nodes are available: 6 Preemption is not helpful for scheduling.
        3h          Warning   FailedScheduling            pod/machine-config-nodes-crd-cleanup-xxxxxxxx-xxxxx                   0/6 nodes are available: 2 node(s) had untolerated taint {node.kubernetes.io/not-ready: }, 4 node(s) didn't match Pod's node affinity/selector. preemption: 0/6 nodes are available: 6 Preemption is not helpful for scheduling.
        4m          Warning   FailedScheduling            pod/machine-config-nodes-crd-cleanup-xxxxxxxx-xxxxx                   0/6 nodes are available: 6 node(s) didn't match Pod's node affinity/selector. preemption: 0/6 nodes are available: 6 Preemption is not helpful for scheduling.
    
  • Check the nodeSelector for the pod in Pending state:

    $ oc get pod machine-config-nodes-crd-cleanup-xxxxxxxx-xxxxx -n openshift-machine-config-operator -o yaml
    [...]
          imagePullSecrets:
          - name: machine-config-operator-dockercfg-xxxxx
          nodeSelector:
            node-role.kubernetes.io/control-plane: ""       ## <- This label is missing on the Master / Control Plane Nodes
          preemptionPolicy: PreemptLowerPriority
    [...]
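The same information can be extracted without scrolling through the full manifest; a jsonpath query such as the following (the pod name is illustrative) prints only the nodeSelector:

```shell
# Show only the nodeSelector of the pending cleanup pod
oc get pod machine-config-nodes-crd-cleanup-xxxxxxxx-xxxxx \
  -n openshift-machine-config-operator \
  -o jsonpath='{.spec.nodeSelector}{"\n"}'
```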
    
  • Check the labels on the Master / Control Plane Nodes:

    $ oc get nodes -l node-role.kubernetes.io/master -o yaml | grep -i control-plane
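If the grep above returns nothing, the control-plane label is missing. Comparing the node counts under both role selectors makes the mismatch explicit; on an affected cluster the second count is lower (typically zero):

```shell
# Nodes carrying the legacy master role label
oc get nodes -l node-role.kubernetes.io/master --no-headers | wc -l
# Nodes carrying the control-plane role label; a smaller count here
# means the workaround label still needs to be applied
oc get nodes -l node-role.kubernetes.io/control-plane --no-headers | wc -l
```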
    

This solution is part of Red Hat’s fast-track publication program, providing a huge library of solutions that Red Hat engineers have created while supporting our customers. To give you the knowledge you need the instant it becomes available, these articles may be presented in a raw and unedited form.