How to renew/rotate the certificate for the cluster operator operator-lifecycle-manager-packageserver in RHOCP 4
Environment
- Red Hat OpenShift Container Platform (RHOCP) 4
Issue
- The cluster version reports the following error:
  $ oc get clusterversion
  Error while reconciling 4.8.39: the cluster operator operator-lifecycle-manager-packageserver has not yet successfully rolled out.
- The cluster operator operator-lifecycle-manager-packageserver is not available:
  $ oc get co operator-lifecycle-manager-packageserver -o yaml
  ClusterServiceVersion openshift-operator-lifecycle-manager/packageserver observed in phase Failed with reason: APIServiceResourceIssue, message: serving cert not active
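The Failed phase reported by the cluster operator can also be read directly from the packageserver ClusterServiceVersion. The following is a minimal POSIX shell sketch, assuming a logged-in oc session; packageserver_csv_ok is a hypothetical helper name, and it reads the standard CSV status fields .status.phase and .status.reason.

```shell
# Hypothetical helper: print the packageserver CSV phase and reason, and
# return non-zero unless the CSV is in the Succeeded phase.
packageserver_csv_ok() {
  local ns=openshift-operator-lifecycle-manager phase reason
  phase=$(oc get csv packageserver -n "$ns" -o jsonpath='{.status.phase}')
  reason=$(oc get csv packageserver -n "$ns" -o jsonpath='{.status.reason}')
  echo "packageserver CSV phase=${phase:-unknown} reason=${reason:-none}"
  # The function's exit status is the result of this comparison.
  [ "$phase" = "Succeeded" ]
}
```

On an affected cluster this is expected to print phase=Failed reason=APIServiceResourceIssue and return 1.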
Resolution
This issue has been reported to Red Hat engineering and is tracked by the following bugs. The fix is included in the versions listed below:
| Target Minor Release | Bug | Fixed Version | Errata |
|---|---|---|---|
| 4.16 | OCPBUGS-36138 | 4.16.2 | RHSA-2024:4316 |
| 4.15 | OCPBUGS-36813 | 4.15.23 | RHSA-2024:4699 |
| 4.14 | OCPBUGS-36949 | 4.14.35 | RHSA-2024:5433 |
| 4.13 | OCPBUGS-38254 | 4.13.51 | RHSA-2024:6811 |
| 4.12 | OCPBUGS-41881 | 4.12.67 | RHSA-2024:7590 |
For more information, open a new support case with Red Hat Support.
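To check whether a cluster already runs a release that includes the fix, its version can be compared with the Fixed Version column of the table above. This is a minimal POSIX shell sketch, assuming GNU sort (for -V version ordering); packageserver_fix_included is a hypothetical helper name, and versions from minor releases not listed in the table return 2 because the table does not cover them.

```shell
# Hypothetical helper: return 0 if the given OCP version is at or above the
# fixed z-stream from the table, 1 if it is older, 2 if its minor is not listed.
packageserver_fix_included() {
  local version=$1 fixed
  case "$version" in
    4.16.*) fixed=4.16.2 ;;
    4.15.*) fixed=4.15.23 ;;
    4.14.*) fixed=4.14.35 ;;
    4.13.*) fixed=4.13.51 ;;
    4.12.*) fixed=4.12.67 ;;
    *) return 2 ;;
  esac
  # The fix is included when the fixed release sorts first (or equal) under sort -V.
  [ "$(printf '%s\n%s\n' "$fixed" "$version" | sort -V | head -n 1)" = "$fixed" ]
}
```

Usage example: packageserver_fix_included "$(oc get clusterversion version -o jsonpath='{.status.desired.version}')" && echo "fix included"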
Workaround
- Back up the serving certificate secrets associated with the cluster operator operator-lifecycle-manager-packageserver:
  $ oc get secret catalog-operator-serving-cert olm-operator-serving-cert packageserver-service-cert -n openshift-operator-lifecycle-manager -o yaml > olm-certs.yaml
- Delete the secrets so that they are recreated:
  $ oc delete secret catalog-operator-serving-cert olm-operator-serving-cert packageserver-service-cert -n openshift-operator-lifecycle-manager
- Delete the pods associated with the cluster operator so that they are recreated with the new certificates:
  $ oc delete pod -l 'app in (catalog-operator, olm-operator, packageserver, package-server-manager)' -n openshift-operator-lifecycle-manager
- Check that all pods are running again:
  $ oc get pods -n openshift-operator-lifecycle-manager
- Back up the existing packages APIService:
  $ oc get apiservice v1.packages.operators.coreos.com -o yaml > packages-Api.yaml
- Delete the existing packages APIService so that it is recreated:
  $ oc delete apiservice v1.packages.operators.coreos.com
- Check that the certificate has been renewed by inspecting the Validity dates in the output:
  $ oc get apiservice v1.packages.operators.coreos.com -o jsonpath='{.spec.caBundle}' | base64 -d | openssl x509 -noout -text
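The workaround steps above can be sketched as a single shell function that stops at the first failing command. This is a minimal sketch, assuming a logged-in oc session with cluster-admin rights; rotate_packageserver_certs is a hypothetical name, and the oc wait call (with a 300s timeout chosen for illustration) replaces the manual oc get pods check.

```shell
# Hypothetical helper wrapping the workaround steps; stops at the first failing
# command. Writes the backups olm-certs.yaml and packages-Api.yaml to the
# current directory.
rotate_packageserver_certs() {
  local ns=openshift-operator-lifecycle-manager
  local apiservice=v1.packages.operators.coreos.com
  local selector='app in (catalog-operator, olm-operator, packageserver, package-server-manager)'

  # Back up the serving certificate secrets, then delete them so they are recreated.
  oc get secret catalog-operator-serving-cert olm-operator-serving-cert \
    packageserver-service-cert -n "$ns" -o yaml > olm-certs.yaml || return 1
  oc delete secret catalog-operator-serving-cert olm-operator-serving-cert \
    packageserver-service-cert -n "$ns" || return 1

  # Recreate the OLM pods and wait until the replacement pods report Ready.
  oc delete pod -l "$selector" -n "$ns" || return 1
  oc wait pod -l "$selector" -n "$ns" --for=condition=Ready --timeout=300s || return 1

  # Back up the packages APIService, then delete it so it is recreated.
  oc get apiservice "$apiservice" -o yaml > packages-Api.yaml || return 1
  oc delete apiservice "$apiservice" || return 1
}
```

Source the file and call rotate_packageserver_certs. If oc wait reports that no matching pods were found, rerun it once the pods have been recreated. After the APIService has been recreated, run the certificate check from the last workaround step.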
Root Cause
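The certificates used by the operator-lifecycle-manager-packageserver cluster operator are not rotated automatically, so they eventually expire and the packageserver APIService stops serving.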
Diagnostic Steps
The OLM packageserver APIService is unavailable, and the kube-apiserver logs show the following error:
$ oc project openshift-kube-apiserver
$ oc logs kube-apiserver-example.com -c kube-apiserver
loading OpenAPI spec for "v1.packages.operators.coreos.com" failed with: failed to retrieve openAPI spec, http error: ResponseCode: 503, Body: service unavailable
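The 503 error above means the aggregated API behind v1.packages.operators.coreos.com is not answering. Its state can also be read from the APIService Available condition. This is a minimal POSIX shell sketch, assuming a logged-in oc session; packages_apiservice_available is a hypothetical helper name.

```shell
# Hypothetical helper: print the Available condition of the packages
# APIService and return non-zero unless its status is "True".
packages_apiservice_available() {
  local name=v1.packages.operators.coreos.com status
  # JSONPath filter selects the condition whose type is Available.
  status=$(oc get apiservice "$name" \
    -o jsonpath='{.status.conditions[?(@.type=="Available")].status}')
  echo "$name Available=${status:-unknown}"
  [ "$status" = "True" ]
}
```

The same information appears in the AVAILABLE column of oc get apiservice v1.packages.operators.coreos.com.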
Check whether the catalog-operator-serving-cert, olm-operator-serving-cert and packageserver-service-cert certificates have expired by comparing the notAfter date with the current date:
$ oc get secret catalog-operator-serving-cert -o json -n openshift-operator-lifecycle-manager | jq -r '.data | .["tls.crt"]' | base64 -d | openssl x509 -noout -dates
$ oc get secret olm-operator-serving-cert -o json -n openshift-operator-lifecycle-manager | jq -r '.data | .["tls.crt"]' | base64 -d | openssl x509 -noout -dates
$ oc get secret packageserver-service-cert -o json -n openshift-operator-lifecycle-manager | jq -r '.data | .["tls.crt"]' | base64 -d | openssl x509 -noout -dates
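The three checks above can be combined into one pass that reports expired certificates directly instead of printing dates. This is a minimal POSIX shell sketch, assuming a logged-in oc session; check_olm_serving_certs is a hypothetical name, it uses oc's JSONPath output in place of jq, and it relies on openssl x509 -checkend 0, which exits non-zero when the certificate has already expired.

```shell
# Hypothetical helper: report each OLM serving certificate as valid or expired
# and return non-zero if any of them is expired or cannot be read.
check_olm_serving_certs() {
  local ns=openshift-operator-lifecycle-manager rc=0 s
  for s in catalog-operator-serving-cert olm-operator-serving-cert packageserver-service-cert; do
    # Decode tls.crt from the secret and ask openssl whether it is still valid.
    if oc get secret "$s" -n "$ns" -o jsonpath='{.data.tls\.crt}' \
        | base64 -d | openssl x509 -noout -checkend 0 >/dev/null; then
      echo "$s: not expired"
    else
      echo "$s: expired or unreadable"
      rc=1
    fi
  done
  return "$rc"
}
```

To flag certificates that are about to expire rather than already expired, raise the -checkend argument, for example to 2592000 seconds (30 days).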
In at least one case, this issue caused high CPU usage in the OVN-Kubernetes pods, specifically in the ovnkube-controller container of the ovnkube-node pods. The EndpointSlices of the packageserver pods were flapping, and OVN was stuck in a loop recalculating its flows each time an EndpointSlice was added or removed.
The following kube-controller-manager and ovnkube-controller logs show this behavior:
Kube-controller-manager
2025-06-04T07:15:58.712323897+00:00 stderr F I0604 07:15:58.712261 1 garbagecollector.go:533] "Processing item" item="[discovery.k8s.io/v1/EndpointSlice, namespace: openshift-operator-lifecycle-manager, name: packageserver-service-t5cnn, uid: 90a691a1-1832-4b54-a34c-361ac2a5d37b]" virtual=false
2025-06-04T07:15:58.715843409+00:00 stderr F I0604 07:15:58.715814 1 garbagecollector.go:672] "Deleting item" item="[discovery.k8s.io/v1/EndpointSlice, namespace: openshift-operator-lifecycle-manager, name: packageserver-service-t5cnn, uid: 90a691a1-1832-4b54-a34c-361ac2a5d37b]" propagationPolicy=Backgroun
Ovnkube-controller
2025-06-04T07:15:39.664381086Z I0604 07:15:39.664317 10092 services_controller.go:582] Deleting service openshift-operator-lifecycle-manager/packageserver-service
2025-06-04T07:15:39.963172122Z I0604 07:15:39.963114 10092 services_controller.go:582] Deleting service openshift-operator-lifecycle-manager/packageserver-service
Both components log these messages repeatedly, every few seconds.
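To gauge how often the flapping occurs, the delete events shown above can be counted over a fixed time window. This is a minimal POSIX shell sketch; count_packageserver_churn is a hypothetical helper name, and its pattern simply matches the kube-controller-manager and ovnkube-controller lines quoted above.

```shell
# Hypothetical helper: read log lines on stdin and print how many of them are
# delete events for the packageserver service or its EndpointSlices.
count_packageserver_churn() {
  # grep -c prints 0 and exits 1 when nothing matches; keep the exit status 0.
  grep -cE 'Deleting (service|item).*packageserver-service' || true
}
```

Usage example, where <ovnkube-node-pod> is a placeholder for one of the ovnkube-node pods: oc logs -n openshift-ovn-kubernetes <ovnkube-node-pod> -c ovnkube-controller --since=10m | count_packageserver_churn. A count that keeps rising across successive windows suggests the EndpointSlices are still flapping.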
This solution is part of Red Hat’s fast-track publication program, providing a huge library of solutions that Red Hat engineers have created while supporting our customers. To give you the knowledge you need the instant it becomes available, these articles may be presented in a raw and unedited form.