How to renew/rotate the certificate for cluster operator operator-lifecycle-manager-packageserver in RHOCP4

Solution Verified - Updated

Environment

  • Red Hat OpenShift Container Platform (RHOCP)
    • 4

Issue

  • $ oc get clusterversion shows the following error message:

    Error while reconciling 4.8.39: the cluster operator operator-lifecycle-manager-packageserver has not yet successfully rolled out.
    
  • The cluster operator operator-lifecycle-manager-packageserver is not available:

    $ oc get co operator-lifecycle-manager-packageserver -o yaml
    ClusterServiceVersion openshift-operator-lifecycle-manager/packageserver observed in phase Failed with reason: APIServiceResourceIssue, message: serving cert not active
    

Resolution

This issue has been reported to Red Hat engineering. It is being tracked by the following bugs:

Target Minor Release | Bug           | Fixed Version | Errata
4.16                 | OCPBUGS-36138 | 4.16.2        | RHSA-2024:4316
4.15                 | OCPBUGS-36813 | 4.15.23       | RHSA-2024:4699
4.14                 | OCPBUGS-36949 | 4.14.35       | RHSA-2024:5433
4.13                 | OCPBUGS-38254 | 4.13.51       | RHSA-2024:6811
4.12                 | OCPBUGS-41881 | 4.12.67       | RHSA-2024:7590

For more information, please open a new support case with Red Hat Support.

Workaround

  1. Back up the service secrets associated with the cluster operator operator-lifecycle-manager-packageserver:

    $ oc get secret catalog-operator-serving-cert olm-operator-serving-cert packageserver-service-cert -n openshift-operator-lifecycle-manager -o yaml > olm-certs.yaml
    
  2. Delete the service secrets associated with the cluster operator operator-lifecycle-manager-packageserver so that they are recreated:

    $ oc delete secret catalog-operator-serving-cert olm-operator-serving-cert packageserver-service-cert -n openshift-operator-lifecycle-manager
    
  3. Delete the pods associated with the cluster operator operator-lifecycle-manager-packageserver so that they are recreated:

    $ oc delete pod -l 'app in (catalog-operator, olm-operator, packageserver, package-server-manager)' -n openshift-operator-lifecycle-manager
    
  4. Check that all pods come back up:

    $ oc get pods -n openshift-operator-lifecycle-manager
    
  5. Back up the existing packages APIService:

    $ oc get apiservice v1.packages.operators.coreos.com -o yaml > packages-Api.yaml
    
  6. Delete the existing packages APIService so that it is recreated:

    $ oc delete apiservice v1.packages.operators.coreos.com
    
  7. Verify that the certificate has been renewed:

    $ oc get apiservice v1.packages.operators.coreos.com -o jsonpath='{.spec.caBundle}' | base64 -d | openssl x509 -noout -text
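
Taken together, the workaround steps above can be sketched as a single helper function. This is an illustrative sketch, not a Red Hat-provided script; it assumes a logged-in `oc` session with cluster-admin privileges:

```shell
# Sketch of workaround steps 1-7; review the backups before invoking.
renew_olm_certs() {
  ns=openshift-operator-lifecycle-manager

  # Steps 1 and 5: back up the secrets and the packages APIService first.
  oc get secret catalog-operator-serving-cert olm-operator-serving-cert \
    packageserver-service-cert -n "$ns" -o yaml > olm-certs.yaml
  oc get apiservice v1.packages.operators.coreos.com -o yaml > packages-Api.yaml

  # Steps 2 and 3: delete the expired secrets and the OLM pods so both are recreated.
  oc delete secret catalog-operator-serving-cert olm-operator-serving-cert \
    packageserver-service-cert -n "$ns"
  oc delete pod -l 'app in (catalog-operator, olm-operator, packageserver, package-server-manager)' -n "$ns"

  # Step 6: delete the stale APIService so it comes back with a fresh caBundle.
  oc delete apiservice v1.packages.operators.coreos.com

  # Steps 4 and 7: confirm the pods return and the certificate is renewed.
  oc get pods -n "$ns"
  oc get apiservice v1.packages.operators.coreos.com -o jsonpath='{.spec.caBundle}' \
    | base64 -d | openssl x509 -noout -dates
}
```

The function is only defined, not called, so it can be sourced and reviewed before being run deliberately.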
    

Root Cause

The serving certificates for the operator-lifecycle-manager-packageserver cluster operator are not rotated automatically, so the operator fails once they expire.

Diagnostic Steps

The OLM packageserver APIService is not working, as the following error in the kube-apiserver logs shows:

$ oc project openshift-kube-apiserver
$ oc logs kube-apiserver-example.com -c kube-apiserver
loading OpenAPI spec for "v1.packages.operators.coreos.com" failed with: failed to retrieve openAPI spec, http error: ResponseCode: 503, Body: service unavailable

Verify whether the catalog-operator-serving-cert, olm-operator-serving-cert, and packageserver-service-cert certificates have expired:

$ oc get secret catalog-operator-serving-cert -o json -n openshift-operator-lifecycle-manager | jq -r '.data | .["tls.crt"]' | base64 -d | openssl x509 -noout -dates
$ oc get secret olm-operator-serving-cert -o json -n openshift-operator-lifecycle-manager | jq -r '.data | .["tls.crt"]' | base64 -d | openssl x509 -noout -dates
$ oc get secret packageserver-service-cert -o json -n openshift-operator-lifecycle-manager | jq -r '.data | .["tls.crt"]' | base64 -d | openssl x509 -noout -dates
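
The three expiry checks above can be combined into one loop. A sketch, assuming `jq` and `openssl` are installed on the workstation:

```shell
# Print the validity dates of each OLM serving-cert secret.
check_olm_cert_dates() {
  for s in catalog-operator-serving-cert olm-operator-serving-cert packageserver-service-cert; do
    echo "== $s =="
    oc get secret "$s" -n openshift-operator-lifecycle-manager -o json \
      | jq -r '.data["tls.crt"]' | base64 -d \
      | openssl x509 -noout -dates
  done
}
```

A certificate whose `notAfter` date is in the past confirms the expiry.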

Note that in at least one case this issue caused OVN-Kubernetes pods, specifically the ovnkube-controller container of the ovnkube-node pods, to consume a large amount of CPU. The EndpointSlices of the packageserver pods flap, and OVN gets stuck in a loop recalculating OVN flows each time an EndpointSlice is added and removed.

The following example logs, from kube-controller-manager and ovnkube-controller, illustrate this behavior:

Kube-controller-manager

2025-06-04T07:15:58.712323897+00:00 stderr F I0604 07:15:58.712261       1 garbagecollector.go:533] "Processing item" item="[discovery.k8s.io/v1/EndpointSlice, namespace: openshift-operator-lifecycle-manager, name: packageserver-service-t5cnn, uid: 90a691a1-1832-4b54-a34c-361ac2a5d37b]" virtual=false
2025-06-04T07:15:58.715843409+00:00 stderr F I0604 07:15:58.715814       1 garbagecollector.go:672] "Deleting item" item="[discovery.k8s.io/v1/EndpointSlice, namespace: openshift-operator-lifecycle-manager, name: packageserver-service-t5cnn, uid: 90a691a1-1832-4b54-a34c-361ac2a5d37b]" propagationPolicy=Backgroun

Ovnkube-controller

2025-06-04T07:15:39.664381086Z I0604 07:15:39.664317   10092 services_controller.go:582] Deleting service openshift-operator-lifecycle-manager/packageserver-service
2025-06-04T07:15:39.963172122Z I0604 07:15:39.963114   10092 services_controller.go:582] Deleting service openshift-operator-lifecycle-manager/packageserver-service

These components are flooded with such log messages every few seconds.
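
One way to confirm this symptom on a live cluster is to take two EndpointSlice snapshots a few seconds apart. This check is a suggestion, not part of the original diagnosis:

```shell
# Compare EndpointSlice UIDs ten seconds apart; churn on the
# packageserver-service-* objects indicates the flapping described above.
olm_endpointslice_churn() {
  ns=openshift-operator-lifecycle-manager
  first=$(oc get endpointslices -n "$ns" -o jsonpath='{range .items[*]}{.metadata.uid}{"\n"}{end}')
  sleep 10
  second=$(oc get endpointslices -n "$ns" -o jsonpath='{range .items[*]}{.metadata.uid}{"\n"}{end}')
  if [ "$first" != "$second" ]; then
    echo "EndpointSlice churn detected within 10s"
  else
    echo "No EndpointSlice churn observed"
  fi
}
```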


This solution is part of Red Hat’s fast-track publication program, providing a huge library of solutions that Red Hat engineers have created while supporting our customers. To give you the knowledge you need the instant it becomes available, these articles may be presented in a raw and unedited form.