Upgrade to OpenShift 4.18 fails due to OLM operator scheduling issue
Environment
- Red Hat OpenShift Container Platform (RHOCP)
- 4.18
- Operator Lifecycle Manager (OLM)
Issue
- An OpenShift cluster upgrade to version 4.18 fails because the OLM cluster operator pod cannot be scheduled during the upgrade:

      Warning FailedScheduling pod/cluster-olm-operator-xxxx skip schedule deleting pod: openshift-cluster-olm-operator/cluster-olm-operator-xxxx
Resolution
This issue was reported to Red Hat Engineering, tracked in OCPBUGS-48478, and fixed in OpenShift 4.18.12 by errata RHSA-2025:4427.
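To decide whether a cluster still needs the workaround, the target version can be compared against 4.18.12. The following is a minimal sketch, not an official tool: `version_has_fix` is a hypothetical helper name, and it only makes a statement about 4.18.z versions, because this article does not say which other streams carry the fix.

```shell
# Hypothetical helper: succeeds (exit 0) when a 4.18.z version is 4.18.12 or later.
# Returns 1 for earlier 4.18.z versions and 2 for anything outside the 4.18 stream,
# since this article only documents the fix for 4.18.
version_has_fix() {
  version="$1"
  case "$version" in
    4.18.*) ;;
    *) return 2 ;;
  esac
  patch="${version#4.18.}"      # keep only the z-stream part, e.g. "12" or "0-rc.1"
  patch="${patch%%[!0-9]*}"     # drop any suffix after the leading digits
  [ -n "$patch" ] && [ "$patch" -ge 12 ]
}
```

One possible use is to pass in the version the cluster is reconciling toward, for example `version_has_fix "$(oc get clusterversion version -o jsonpath='{.status.desired.version}')"`, and apply the workaround only when the helper returns 1.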
Workaround for versions that do not include the fix
- Add the following annotation to the openshift-cluster-olm-operator namespace:

      $ oc annotate namespace openshift-cluster-olm-operator openshift.io/node-selector=""

- Restart the cluster-olm-operator pod in the openshift-cluster-olm-operator namespace:

      $ oc delete pod cluster-olm-operator-xxxx -n openshift-cluster-olm-operator
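The two workaround steps above can be sketched as shell functions. This is an illustration under assumptions rather than a supported script: the function names and the `NS` variable are hypothetical, and because the pod name suffix is not known in advance, the sketch deletes every pod whose name starts with `cluster-olm-operator-`.

```shell
NS=openshift-cluster-olm-operator

# Step 1: add the empty node-selector annotation to the namespace,
# using the same command as the workaround.
annotate_olm_namespace() {
  oc annotate namespace "$NS" openshift.io/node-selector=""
}

# Step 2: delete the cluster-olm-operator pod(s) so that a new pod is created.
# `oc get pods -o name` prints names in the form pod/<name>.
restart_olm_pods() {
  oc get pods -n "$NS" -o name \
    | grep '^pod/cluster-olm-operator-' \
    | while read -r pod; do
        oc delete -n "$NS" "$pod"
      done
}
```

Running `annotate_olm_namespace` followed by `restart_olm_pods` while logged in as a cluster administrator performs both steps in order.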
Root Cause
The openshift-cluster-olm-operator namespace does not have the openshift.io/node-selector: "" annotation.
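To confirm whether the annotation is missing on a given cluster, the namespace YAML can be searched for the annotation key. The sketch below assumes the YAML is supplied on standard input; `has_node_selector_annotation` is a hypothetical helper name. It checks for the key itself rather than its value, because the expected value is an empty string.

```shell
# Hypothetical helper: reads namespace YAML on stdin and succeeds (exit 0)
# only if a line starts with the openshift.io/node-selector annotation key.
has_node_selector_annotation() {
  grep -q '^[[:space:]]*openshift\.io/node-selector:'
}
```

For example, `oc get namespace openshift-cluster-olm-operator -o yaml | has_node_selector_annotation && echo present || echo missing` reports whether the key is set.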
Diagnostic Steps
- The pod YAML shows the following scheduling error:

      $ oc get pod cluster-olm-operator-xxxx -n openshift-cluster-olm-operator -oyaml
          message: '0/13 nodes are available: 13 node(s) didn''t match Pod''s node affinity/selector. preemption: 0/13 nodes are available: 13 Preemption is not helpful for scheduling.'
          reason: Unschedulable
          status: "False"
          type: PodScheduled
        phase: Pending

- The namespace events show a similar error:

      $ oc get events -n openshift-cluster-olm-operator
      13m     Warning   FailedScheduling   pod/cluster-olm-operator-xxxx          skip schedule deleting pod: openshift-cluster-olm-operator/cluster-olm-operator-xxxx
      13m     Normal    SuccessfulCreate   replicaset/cluster-olm-operator-xxxx   Created pod: cluster-olm-operator-xxxx
      13m     Normal    SuccessfulDelete   replicaset/cluster-olm-operator-xxx    Deleted pod: cluster-olm-operator-xxxx
      4h29m   Warning   FailedScheduling   pod/cluster-olm-operator-xxx           0/13 nodes are available: 13 node(s) didn't match Pod's node affinity/selector. preemption: 0/13 nodes are available: 13 Preemption is not helpful for scheduling

- In the deployment YAML, check for the ReplicaSet timed out error message:

      $ oc get deployment/cluster-olm-operator -n openshift-cluster-olm-operator -oyaml
          message: ReplicaSet "cluster-olm-operator-xxx" has timed out progressing.
          reason: ProgressDeadlineExceeded
          status: "False"
          type: Progressing
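The three checks above can be narrowed to the relevant fields instead of reading the full YAML. The following is a rough sketch under assumptions: the function names and the `NS` variable are hypothetical, and the filters only extract what the diagnostic steps describe (Pending pods, FailedScheduling events, and the reason of the deployment's Progressing condition).

```shell
NS=openshift-cluster-olm-operator

# List pods in the namespace that are still in the Pending phase.
pending_pods() {
  oc get pods -n "$NS" --field-selector=status.phase=Pending
}

# Keep only FailedScheduling lines from `oc get events` output read on stdin.
failed_scheduling_events() {
  grep 'FailedScheduling'
}

# Print the reason of the deployment's Progressing condition,
# which is ProgressDeadlineExceeded when the ReplicaSet has timed out.
deployment_progress_reason() {
  oc get deployment cluster-olm-operator -n "$NS" \
    -o jsonpath='{.status.conditions[?(@.type=="Progressing")].reason}'
}
```

For example, `oc get events -n openshift-cluster-olm-operator | failed_scheduling_events` shows only the scheduling failures, and `deployment_progress_reason` can be compared against `ProgressDeadlineExceeded`.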