Red Hat OpenShift cert-manager operator 1.18.x version is incorrectly installed on OpenShift 4.17 or older clusters

Solution Verified - Updated

Environment

  • Red Hat OpenShift Container Platform (RHOCP) 4.14, 4.15, 4.16
  • Cert-manager operator (OpenShift cert-manager operator, e.g. from redhat-operators / openshift-cert-manager-operator) at an unsupported version (e.g. 1.18.x) or stuck at 1.17.0 on 4.14/4.15 after a failed 1.18.x upgrade.

Issue

  • During the incident window (see Red Hat Operator has version higher than the cluster version), clusters on OCP 4.12 - 4.17 could receive 4.18 catalog content. Clusters with automatic install plan approval may have upgraded the cert-manager operator to a version from the 4.18 catalog (e.g. 1.18.x).
  • Operator 1.18.x is not supported on OpenShift 4.14, 4.15, or 4.16. Those clusters are in an unsupported state and do not receive security updates for that operator version.
  • On 4.14 and 4.15, upgrade to 1.18.x may fail with RequirementsNotMet (OpenAPI v3 schema validation rule cost exceeds cluster budget); the operator may then remain at 1.17.0 instead of rolling out 1.18.0.
  • On 4.16, upgrading to 1.18.x may succeed, but 1.18.x is still unsupported on 4.16.

Resolution

Downgrade the cert-manager operator to a version that is supported on your OCP release. The following table lists the target operator version and channel for each OpenShift Container Platform (OCP) release:


| OCP version | Target operator  | Channel      |
|-------------|------------------|--------------|
| 4.14        | 1.16.2           | stable-v1.16 |
| 4.15        | 1.16.2           | stable-v1.16 |
| 4.16        | 1.17.0 (default) | stable-v1.17 |
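
The table above can also be expressed as a small lookup, which may help when the same check has to be repeated on several clusters. The following is only a sketch: the function name target_channel is hypothetical and is not part of the attached script.

```shell
# Hypothetical helper: print the target channel for a given OCP version,
# following the table above. Not part of the attached script.
target_channel() {
  # Keep only major.minor (for example 4.15.12 becomes 4.15).
  minor=$(printf '%s\n' "$1" | cut -d. -f1,2)
  case "$minor" in
    4.14|4.15) echo "stable-v1.16" ;;
    4.16)      echo "stable-v1.17" ;;
    *)         echo "no target channel listed for OCP $minor" >&2; return 1 ;;
  esac
}

# Usage against a live cluster (requires oc and a logged-in session):
#   target_channel "$(oc get clusterversion version -o jsonpath='{.status.desired.version}')"
```

Versions outside the table return a non-zero status instead of a guess, because this article only covers 4.14, 4.15 and 4.16.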

Try the automatic procedure using the script first. If it fails, follow the manual procedure.

Prerequisites

  • Back up cert-manager-related resources and fix any breaking operand API usage (e.g. signatureAlgorithm, JKS/PKCS12 only password) before changing the operator version:
$ oc get certmanagers.operator.openshift.io -o yaml > certmanagers.backup

$ oc get certificates.cert-manager.io -A -o yaml > certificates.backup

$ oc get issuers.cert-manager.io -A -o yaml > issuers.backup

$ oc get clusterissuers.cert-manager.io -o yaml > clusterissuers.backup

$ oc get certificaterequests.cert-manager.io -A -o yaml > certificaterequests.backup

These backups can be used to recreate the configuration if a problem appears later.

  • Run a pre-check with the script that is attached to this note:
$ check-downgrade-1.18-to-1.16.sh pre-check

The pre-check lists the CRs that must be modified before the downgrade. To edit the CRs, you can use the ‘oc edit’ command.
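
Before changing the operator, it can be worth confirming that the five backup files from the first prerequisite were actually written. The following is a minimal sketch, assuming the file names used in the commands above; verify_backups is a hypothetical helper and only checks that each file exists and is not empty, not that its content is complete.

```shell
# Hypothetical helper: check that every backup file from the commands above
# exists and is non-empty in the given directory (default: current directory).
verify_backups() {
  dir=${1:-.}
  missing=0
  for f in certmanagers certificates issuers clusterissuers certificaterequests; do
    if [ -s "$dir/$f.backup" ]; then
      echo "ok: $f.backup"
    else
      echo "missing or empty: $f.backup"
      missing=1
    fi
  done
  # Non-zero status if at least one file is missing or empty.
  return "$missing"
}

# Usage, from the directory where the backups were created:
#   verify_backups .
```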

Automatic Procedure using script

The attached script can be used to downgrade the operator. It has a few subcommands that must be run in order. The operator namespace is configurable via OPERATOR_NAMESPACE (defaults to cert-manager-operator); change it only if you do not use the default.

1- Run a pre-check if you have not done so yet; see the Prerequisites section above.

2- Run the downgrade-plan:

$ check-downgrade-1.18-to-1.16.sh downgrade-plan

This shows what the script is going to do. Review the plan and, once you agree with it, continue with the next step.

3- Run downgrade-operator to perform the downgrade:

$ check-downgrade-1.18-to-1.16.sh downgrade-operator

Keep all the output messages for reference.

4- Run the post-check to confirm that the downgrade was successful and that the configuration is correct:

$ check-downgrade-1.18-to-1.16.sh post-check
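
The ordered steps above could be chained in a small wrapper that stops at the first failure and pauses for review after downgrade-plan. This is only a sketch under two assumptions that the article does not confirm: that the attached script exits with a non-zero status when a subcommand fails, and that it is located at the path held in DOWNGRADE_SCRIPT (a hypothetical variable introduced here).

```shell
# Path to the attached script. DOWNGRADE_SCRIPT is a hypothetical variable
# introduced for this sketch; adjust it to where you saved the script.
DOWNGRADE_SCRIPT=${DOWNGRADE_SCRIPT:-./check-downgrade-1.18-to-1.16.sh}

# Run the subcommands in the documented order, stopping at the first failure.
# Assumption: the script exits non-zero when a subcommand fails.
run_downgrade() {
  for step in pre-check downgrade-plan downgrade-operator post-check; do
    echo "==> $step"
    "$DOWNGRADE_SCRIPT" "$step" || { echo "step '$step' failed" >&2; return 1; }
    if [ "$step" = "downgrade-plan" ]; then
      # Pause so the plan can be reviewed before anything is changed.
      printf 'Proceed with downgrade-operator? [y/N] '
      read -r answer
      [ "$answer" = "y" ] || { echo "aborted after downgrade-plan"; return 1; }
    fi
  done
}

# Usage (set OPERATOR_NAMESPACE first only if you do not use the default):
#   run_downgrade 2>&1 | tee downgrade.log
```

Piping through tee keeps a copy of the messages for later reference.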

Manual procedure

NOTE: Run this procedure only if the automatic (script) method fails.

Choose to do this using the cluster web interface or the oc command line.

Using web interface:

NOTE: If you don’t have access to the cluster web interface, follow the oc command line procedure in the next section.

  • Run a pre-check if you have not done so yet; see the Prerequisites section above.
  • Uninstall the operator from the ‘Installed Operators’ section in the web console. Do not select the “delete the operand resources” option.
    • Do not delete operand resources in the cert-manager namespace (Certificates, Issuers, ClusterIssuers, etc.) so that data is preserved.
  • Before reinstalling: Remove operatorframework.io/installed-alongside* annotations from the CRDs to avoid ConstraintsNotSatisfiable (ownership conflict) on reinstall:
$ for crd in certmanagers.operator.openshift.io orders.acme.cert-manager.io issuers.cert-manager.io clusterissuers.cert-manager.io challenges.acme.cert-manager.io certificates.cert-manager.io certificaterequests.cert-manager.io istiocsrs.operator.openshift.io; do
  keys=$(oc get crd "$crd" -o json 2>/dev/null | jq -r '
    (.metadata.annotations // {} | keys[]) as $k | select($k | startswith("operatorframework.io/installed-alongside")) | $k
  ')
  for key in $keys; do
    [[ -n "$key" ]] && oc annotate crd "$crd" "$key-" --overwrite
  done
done
  • Reinstall the operator from the correct channel for your OCP version (see table above) with Install Plan Approval set to Manual, then approve the InstallPlan.

Using oc command line:

  • Run a pre-check if you have not done so yet; see the Prerequisites section above.
  • Uninstall the current version:
    • Delete the Subscription, then delete all CSVs in the operator namespace (default cert-manager-operator). Do not delete the other objects in the namespace. For details, see the documentation for OCP 4.14, OCP 4.15 and OCP 4.16.
  • Before reinstalling: Remove operatorframework.io/installed-alongside* annotations from the CRDs to avoid ConstraintsNotSatisfiable (ownership conflict) on reinstall:
$ for crd in certmanagers.operator.openshift.io orders.acme.cert-manager.io issuers.cert-manager.io clusterissuers.cert-manager.io challenges.acme.cert-manager.io certificates.cert-manager.io certificaterequests.cert-manager.io istiocsrs.operator.openshift.io; do
  keys=$(oc get crd "$crd" -o json 2>/dev/null | jq -r '
    (.metadata.annotations // {} | keys[]) as $k | select($k | startswith("operatorframework.io/installed-alongside")) | $k
  ')
  for key in $keys; do
    [[ -n "$key" ]] && oc annotate crd "$crd" "$key-" --overwrite
  done
done
  • Then create a new Subscription with the target channel and Manual approval, and approve the InstallPlan. You can follow the documentation for OCP 4.14, OCP 4.15 and OCP 4.16.
  • If the operand controller does not get the leader lease, delete the lease in the cert-manager namespace:
$ oc delete lease.coordination.k8s.io cert-manager-controller -n cert-manager
  • Verify that the operator and operand pods are Running and Ready. You should see something similar to:
$ oc get pods -n cert-manager-operator
NAME                                                            READY   STATUS    RESTARTS   AGE
cert-manager-operator-controller-manager-6bb7896444-llwdn   1/1     Running   0          73s
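
For the reinstall step of the oc procedure above, the new Subscription could look like the manifest printed by the sketch below. Several values are assumptions rather than facts from this article: the catalog source redhat-operators in the openshift-marketplace namespace, and openshift-cert-manager-operator as both the Subscription and package name. The helper render_subscription is hypothetical. The sketch also assumes the existing OperatorGroup was left in the operator namespace, as the procedure asks you not to delete other objects there.

```shell
# Hypothetical helper: print a Subscription manifest for the given channel.
# Assumptions: catalog source redhat-operators in openshift-marketplace, and
# openshift-cert-manager-operator as the Subscription and package name.
render_subscription() {
  channel=$1
  namespace=${OPERATOR_NAMESPACE:-cert-manager-operator}
  cat <<EOF
apiVersion: operators.coreos.com/v1alpha1
kind: Subscription
metadata:
  name: openshift-cert-manager-operator
  namespace: $namespace
spec:
  channel: $channel
  name: openshift-cert-manager-operator
  source: redhat-operators
  sourceNamespace: openshift-marketplace
  installPlanApproval: Manual
EOF
}

# Usage against a live cluster (channel taken from the table in Resolution):
#   render_subscription stable-v1.16 | oc apply -f -
#   oc get installplan -n cert-manager-operator
#   oc patch installplan <install-plan-name> -n cert-manager-operator \
#     --type merge --patch '{"spec":{"approved":true}}'
```

With Manual approval, nothing is installed until the InstallPlan is approved, which gives you a chance to confirm that the CSV it lists is the expected target version.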

Root Cause

Refer to Red Hat Operator has version higher than the cluster version

Diagnostic Steps

Use these to confirm the cluster is affected (wrong cert-manager operator version for the OCP release or stuck after a failed 1.18.x upgrade).

  1. OpenShift version
  • Check the cluster’s OCP version (must be 4.14, 4.15, or 4.16 for this article):
$ oc get clusterversion version -o jsonpath='{.status.desired.version}'
  • Or in the web console: Administration → Cluster settings → Overview and note the Version.
  2. Cert-manager operator Subscription and channel
  • See which Subscription exists and which channel/CSV it’s using (replace namespace if different):
$ oc get subscriptions.operators.coreos.com -n cert-manager-operator -o wide
$ oc get subscriptions.operators.coreos.com openshift-cert-manager-operator -n cert-manager-operator -o yaml
  • From the YAML, note:
    • spec.channel (e.g. stable-v1, stable-v1.16, stable-v1.17)
    • spec.installPlanApproval (Automatic or Manual)
    • status.installedCSV or status.currentCSV (e.g. cert-manager-operator.v1.18.0, cert-manager-operator.v1.17.0).
  3. Installed operator (CSV) version
  • Confirm the installed cert-manager operator version and CSV phase:
$ oc get csv -n cert-manager-operator
$ oc get csv -n cert-manager-operator -o yaml | grep -E "name:|phase:|version:"
  • If the installed CSV is 1.18.x on a 4.14, 4.15, or 4.16 cluster, the cluster is in the unsupported state that this article addresses.
  • If the CSV is 1.17.0 on 4.14 or 4.15 and the channel is a 1.18 channel, the upgrade to 1.18.x may have failed (e.g. RequirementsNotMet); the cluster is still in an unsupported or at-risk state.
  4. Failed upgrade (RequirementsNotMet) on 4.14 / 4.15
  • If the Subscription points to a 1.18 channel but the installed CSV is still 1.17.0, check the CSV status for resolution/install failures:
$ oc get csv -n cert-manager-operator -o jsonpath='{range .items[*]}{.metadata.name}{"\t"}{.status.phase}{"\t"}{.status.message}{"\n"}{end}'
$ oc describe csv -n cert-manager-operator
  • Look for conditions or messages mentioning RequirementsNotMet, OpenAPI, x-kubernetes-validations, or an estimated rule cost that exceeds the budget (e.g. by 78.7x or 7.9x). That indicates the 1.18.x CSV could not be installed due to validation cost limits.
  5. Operand (cert-manager) workload
  • Check that the operand is running in the cert-manager namespace:
$ oc get pods -n cert-manager
$ oc get deployment -n cert-manager
  • Note any pods that are not Running / Ready or deployments that are not Available; these may need to be fixed or verified after the operator downgrade.
  6. Catalog source (for resolution issues)
  • If the Subscription fails to resolve (e.g. ResolutionFailed / ConstraintsNotSatisfiable), see which catalog the Subscription uses and whether the package is present:
$ oc get subscriptions.operators.coreos.com openshift-cert-manager-operator -n cert-manager-operator -o jsonpath='{.spec.source}{" "}{.spec.sourceNamespace}{"\n"}'
$ oc get packagemanifests openshift-cert-manager-operator -o yaml
  • Confirm the catalog in spec.source / spec.sourceNamespace lists the target channel (e.g. stable-v1.16 or stable-v1.17) and the expected CSV name.
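
The version checks from the diagnostic steps above can be combined into one decision. The sketch below is a hedged illustration only: classify_cluster is a hypothetical helper, its output strings are made up for this example, and it covers just the two cases this article describes (a 1.18.x CSV, or a 1.17.0 CSV on 4.14/4.15).

```shell
# Hypothetical helper: classify a cluster from its OCP version and the
# installed cert-manager operator CSV name, per the diagnostic steps above.
classify_cluster() {
  ocp_minor=$(printf '%s\n' "$1" | cut -d. -f1,2)
  csv=$2
  case "$ocp_minor" in
    4.14|4.15|4.16) ;;
    *) echo "out of scope for this article"; return 0 ;;
  esac
  case "$csv" in
    cert-manager-operator.v1.18.*)
      echo "unsupported: downgrade needed" ;;
    cert-manager-operator.v1.17.0)
      if [ "$ocp_minor" = "4.16" ]; then
        echo "matches the target version for 4.16"
      else
        echo "possible failed 1.18.x upgrade: check the CSV status for RequirementsNotMet"
      fi ;;
    *)
      echo "not covered by this sketch: review manually" ;;
  esac
}

# Usage against a live cluster:
#   classify_cluster \
#     "$(oc get clusterversion version -o jsonpath='{.status.desired.version}')" \
#     "$(oc get subscriptions.operators.coreos.com openshift-cert-manager-operator \
#          -n cert-manager-operator -o jsonpath='{.status.installedCSV}')"
```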

This solution is part of Red Hat’s fast-track publication program, providing a huge library of solutions that Red Hat engineers have created while supporting our customers. To give you the knowledge you need the instant it becomes available, these articles may be presented in a raw and unedited form.