Upgrade to OpenShift 4.18 fails due to OLM operator scheduling issue


Environment

  • Red Hat OpenShift Container Platform (RHOCP)
    • 4.18
  • Operator Lifecycle Manager (OLM)

Issue

  • An OpenShift cluster upgrade to version 4.18 fails because the OLM cluster operator pod is unable to schedule during the upgrade process:

    Warning   FailedScheduling    pod/cluster-olm-operator-xxxx    skip schedule deleting pod: openshift-cluster-olm-operator/cluster-olm-operator-xxxx
    

Resolution

This issue has been reported to Red Hat Engineering. It is tracked in OCPBUGS-48478 and fixed in OpenShift 4.18.12 by errata RHSA-2025:4427.

Workaround for versions that do not include the fix

  1. Add the following annotation to the openshift-cluster-olm-operator namespace:

    $ oc annotate namespace openshift-cluster-olm-operator openshift.io/node-selector=""
    
  2. Restart the cluster-olm-operator pod in the openshift-cluster-olm-operator namespace:

    $ oc delete pod cluster-olm-operator-xxxx -n openshift-cluster-olm-operator
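
To confirm the workaround took effect, check that the annotation key now exists on the namespace. A minimal sketch; the canned JSON below stands in for live cluster output, and on a real cluster you would populate ns_json with the commented oc command instead:

```shell
# Confirm the empty node-selector annotation is present on the namespace.
# On a live cluster, replace the canned JSON with (assumes an active oc login):
#   ns_json=$(oc get namespace openshift-cluster-olm-operator -o json)
ns_json='{"metadata":{"annotations":{"openshift.io/node-selector":""}}}'
if printf '%s' "$ns_json" | grep -q '"openshift.io/node-selector"'; then
  echo "annotation present"
else
  echo "annotation missing"
fi
```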
    

Root Cause

The openshift-cluster-olm-operator namespace does not have the openshift.io/node-selector: "" annotation.
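
After the workaround (or on a version that includes the fix), the namespace metadata should carry the empty node selector, along the lines of the following abbreviated sketch (only the relevant fields are shown):

```yaml
# Expected namespace metadata after applying the workaround (abbreviated).
apiVersion: v1
kind: Namespace
metadata:
  name: openshift-cluster-olm-operator
  annotations:
    openshift.io/node-selector: ""
```

The empty value overrides any cluster-wide defaultNodeSelector for pods in this namespace, which is what allows the cluster-olm-operator pod to schedule.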

Diagnostic Steps

  • In the pod YAML, the following error message is shown:

    $ oc get pod cluster-olm-operator-xxxx -oyaml
    
        message: '0/13 nodes are available: 13 node(s) didn''t match Pod''s node affinity/selector.
          preemption: 0/13 nodes are available: 13 Preemption is not helpful for scheduling.'
        reason: Unschedulable
        status: "False"
        type: PodScheduled
      phase: Pending
    
  • Also, the namespace events show a similar error message:

    $ oc get events -n openshift-cluster-olm-operator 
    13m         Warning   FailedScheduling    pod/cluster-olm-operator-xxxx    skip schedule deleting pod: openshift-cluster-olm-operator/cluster-olm-operator-xxxx
    13m         Normal    SuccessfulCreate    replicaset/cluster-olm-operator-xxxx  Created pod: cluster-olm-operator-xxxx
    13m         Normal    SuccessfulDelete    replicaset/cluster-olm-operator-xxx   Deleted pod: cluster-olm-operator-xxxx
    4h29m       Warning   FailedScheduling    pod/cluster-olm-operator-xxx     0/13 nodes are available: 13 node(s) didn't match Pod's node affinity/selector. preemption: 0/13 nodes are available: 13 Preemption is not helpful for scheduling
    
  • In the deployment YAML, check for the ReplicaSet timed-out error message:

    $ oc get deployment/cluster-olm-operator -oyaml
    
        message: ReplicaSet "cluster-olm-operator-xxx" has timed out progressing.
        reason: ProgressDeadlineExceeded
        status: "False"
        type: Progressing
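
Once the pod schedules, the deployment's Progressing condition should flip back to True. The condition reason can be pulled out of the deployment JSON as sketched below; the canned JSON mirrors the failing state above, and on a live cluster you would use the commented oc command instead:

```shell
# Extract the Progressing condition reason from deployment JSON.
# On a live cluster (assumes an active oc login):
#   deploy_json=$(oc get deployment/cluster-olm-operator \
#     -n openshift-cluster-olm-operator -o json)
deploy_json='{"status":{"conditions":[{"type":"Progressing","status":"False","reason":"ProgressDeadlineExceeded"}]}}'
printf '%s\n' "$deploy_json" | grep -o '"reason":"[^"]*"'
```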
    