Image pushes and pulls to/from Quay fail with a 502 Bad Gateway caused by an SSL certificate error
Environment
- Red Hat Quay (Quay) 3.7
- Red Hat OpenShift Container Platform (RHOCP) 4
- Red Hat OpenShift Data Foundation (ODF)
Issue
Image push to Quay is failing with a 502 Bad Gateway status
$ podman push registry.example.com/project-name/imagename
Getting image source signatures
Copying blob 33e20b7ab3f3 [--------------------------------------] 8.0b / 20.0KiB
Copying blob 4234c9bfd6aa [--------------------------------------] 8.0b / 55.8MiB
Copying blob 9f9118003e6z [--------------------------------------] 8.0b / 183.4MiB
Copying blob j49dc9259670 [--------------------------------------] 8.0b / 216.6MiB
Error: writing blob: initiating layer upload to /v2/project-name/imagename/blobs/uploads/ in registry.example.com: received unexpected HTTP status: 502 Bad Gateway
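A quick way to confirm that the 502 is returned by Quay's own nginx (rather than an intermediate proxy or load balancer) is to grep the registry pod logs for 502 responses while reproducing the push. The pod label `quay-component=quay-app` is the one the Quay operator typically applies; adjust the selector and namespace if your deployment differs:

```shell
# Find a Quay application pod (label is the operator's usual one; adjust if needed)
QUAY_POD=$(oc get pods -l quay-component=quay-app -o jsonpath='{.items[0].metadata.name}')

# Look for nginx access-log lines where the blob upload got a 502
oc logs "$QUAY_POD" | grep -E '"POST /v2/.*/blobs/uploads/ HTTP/1\.[01]" 502 '
```

If matching lines appear, the failure is inside the Quay deployment and the resolution below applies.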
Resolution
Note: The two procedures outlined here are functionally identical; either one should resolve the problem described in this article. Applying both will leave the initial config bundle with two separate certificates for Noobaa. This is not an error in itself; it simply means both certificates will be added to the Quay certificate store on the next container startup.
Using the OpenShift console
1. Download the new certificate chain for the Noobaa endpoint:

   oc exec -it <quay-pod-name> -- openssl s_client -connect s3.openshift-storage.svc.cluster.local:443 -showcerts 2>/dev/null </dev/null | sed -ne '/-BEGIN CERTIFICATE-/,/-END CERTIFICATE-/p' >> extra_ca_cert_noobaa.crt

   Replace <quay-pod-name> with the name of any Quay pod.

2. Find the custom config bundle secret name that the operator is using to deploy Quay:

   oc get quayregistry name-of-registry -o yaml | grep -i configbundlesecret

3. Open the OpenShift console and locate the namespace where Quay is deployed. Click Workloads -> Secrets on the left side and find the custom config bundle secret. Open the secret and switch it to editing mode by clicking Actions -> Edit.

4. Scroll down to the end of the file and create a new key named extra_ca_cert_noobaa.crt. Paste the content of the extra_ca_cert_noobaa.crt file created earlier into the secret.

5. Save and let the operator reconcile the deployment. If reconciliation does not happen immediately, delete the Quay operator pod and let it restart:

   oc get pods -n openshift-operators
   oc delete pod quay-operator-xxxxx-xxxxxxxx -n openshift-operators

   Replace -n openshift-operators with another namespace if the operator is not installed in its default location.
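After reconciliation, it is worth confirming that the new certificate actually landed inside a restarted Quay pod. The mount path `/conf/stack/extra_ca_certs/` is where Quay typically places extra CA certificates from the config bundle; the pod name is a placeholder:

```shell
# List the extra CA certificates mounted into the Quay pod
oc exec <quay-pod-name> -- ls /conf/stack/extra_ca_certs/

# Inspect the new Noobaa certificate's subject and expiry date
oc exec <quay-pod-name> -- openssl x509 -noout -subject -enddate \
  -in /conf/stack/extra_ca_certs/extra_ca_cert_noobaa.crt
```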
Using the command line interface
1. Grab the new server signer certificate from the cluster store:

   $ oc get secret signing-key -n openshift-service-ca -o json | jq -r '.data."tls.crt"'
   LS0tLS1CRUdJTiBDRVJUSUZJQ0FURS0tLS0tCk1...

2. Check the config bundle secret name by inspecting the QuayRegistry custom resource:

   $ oc get quayregistry quay -o yaml | grep -i configbundle
     configBundleSecret: quay-quay-config-bundle-8nf6x

3. Check all keys in that config bundle secret; there should be a key named extra_ca_cert_service-ca.crt:

   $ oc get secret quay-quay-config-bundle-8nf6x -o json | jq '.data' | cut -d ':' -f1
   {
     "config.yaml"
     "extra_ca_cert_ca-bundle.crt"
     "extra_ca_cert_service-ca.crt"
     "ocp-cluster-wildcard.cert"
   }

4. Patch the secret with the new server signer certificate:

   $ oc patch secret quay-quay-config-bundle-8nf6x --type='json' -p='[{"op":"replace", "path":"/data/extra_ca_cert_service-ca.crt", "value": "LS0tLS1CRUdJTiBDRVJUSUZJQ0FURS0tLS0tCk1..."}]'

   The operator should automatically reconcile the deployment. If reconciliation does not happen immediately, delete the Quay operator pod and let it restart:

   oc get pods -n openshift-operators
   oc delete pod quay-operator-xxxxx-xxxxxxxx -n openshift-operators

   Replace -n openshift-operators with another namespace if the operator is not installed in its default location.
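Once the Quay pods have restarted with the patched bundle, TLS validation against the Noobaa endpoint can be verified from inside a pod before retrying the push. The pod name is a placeholder, and the CA file path assumes the usual `/conf/stack/extra_ca_certs/` mount used by Quay:

```shell
# Validate the Noobaa endpoint against the bundled CA; a successful chain
# validation prints "Verify return code: 0 (ok)"
oc exec <quay-pod-name> -- sh -c \
  'openssl s_client -connect s3.openshift-storage.svc.cluster.local:443 \
     -CAfile /conf/stack/extra_ca_certs/extra_ca_cert_service-ca.crt \
     </dev/null 2>/dev/null | grep "Verify return code"'

# Then retry the push that previously failed with a 502
podman push registry.example.com/project-name/imagename
```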
Root Cause
The issue is caused by either the Noobaa certificate rotation or the rotation of the cluster-wide service signing root CA. Although the operator should cover that scenario and update the certificates automatically, a known bug currently prevents it from doing so; the bug is tracked in JIRA.
At this time, the only workaround is to manually add the new certificate chain to Quay's deployment after it has rotated.
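To check whether the service signer has in fact rotated recently, the certificate in the `signing-key` secret can be decoded and inspected; its notBefore date is the time of the last rotation:

```shell
# Decode the current service signer certificate and show its validity window
oc get secret signing-key -n openshift-service-ca -o jsonpath='{.data.tls\.crt}' \
  | base64 -d | openssl x509 -noout -subject -startdate -enddate
```

If the notBefore date is more recent than the last update of the Quay config bundle secret, the bundle is carrying the stale certificate and needs the manual update described above.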
Diagnostic Steps
We observe the following error in Quay logs (with debugging enabled):
2023-06-14T20:14:35.568255077Z gunicorn-registry stdout | File "/usr/local/lib/python3.9/site-packages/botocore/retryhandler.py", line 233, in __call__
2023-06-14T20:14:35.568255077Z gunicorn-registry stdout | return self._check_caught_exception(
2023-06-14T20:14:35.568255077Z gunicorn-registry stdout | File "/usr/local/lib/python3.9/site-packages/botocore/retryhandler.py", line 376, in _check_caught_exception
2023-06-14T20:14:35.568255077Z gunicorn-registry stdout | raise caught_exception
2023-06-14T20:14:35.568255077Z gunicorn-registry stdout | File "/usr/local/lib/python3.9/site-packages/botocore/endpoint.py", line 249, in _do_get_response
2023-06-14T20:14:35.568255077Z gunicorn-registry stdout | http_response = self._send(request)
2023-06-14T20:14:35.568255077Z gunicorn-registry stdout | File "/usr/local/lib/python3.9/site-packages/botocore/endpoint.py", line 321, in _send
2023-06-14T20:14:35.568255077Z gunicorn-registry stdout | return self.http_session.send(request)
2023-06-14T20:14:35.568255077Z gunicorn-registry stdout | File "/usr/local/lib/python3.9/site-packages/botocore/httpsession.py", line 466, in send
2023-06-14T20:14:35.568255077Z gunicorn-registry stdout | raise SSLError(endpoint_url=request.url, error=e)
2023-06-14T20:14:35.568255077Z gunicorn-registry stdout | botocore.exceptions.SSLError: SSL validation failed for https://s3.openshift-storage.svc.cluster.local:443/quay-datastore-2de49366-4ce7-41d1-a4e4-243f047d9eb5 [SSL: CERTIFICATE_VERIFY_FAILED] certificate verify failed: self signed certificate in certificate chain (_ssl.c:1129)
2023-06-14T20:14:35.569267414Z nginx stdout | 2023/06/14 20:14:35 [error] 94#0: *173 upstream prematurely closed connection while reading response header from upstream, client: 10.155.4.1, server: _, request: "POST /v2/{NAMESPACE}/{REPOSITORY}/blobs/uploads/ HTTP/1.1", upstream: "http://unix:/tmp/gunicorn_registry.sock:/v2/{NAMESPACE}/{REPOSITORY}/blobs/uploads/", host: "QUAY_HOSTNAME"
2023-06-14T20:14:35.572281659Z nginx stdout | 10.155.4.1 (-) - - [14/Jun/2023:20:14:35 +0000] "POST /v2/{NAMESPACE}/{REPOSITORY}/blobs/uploads/ HTTP/1.1" 502 337 "-" "containers/5.22.1 (github.com/containers/image)" (7.269 1403 7.266 : 0.003)
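The CERTIFICATE_VERIFY_FAILED error that botocore reports can be reproduced directly from inside a Quay pod with openssl, without going through a push. Before the fix, validating against the pod's default trust store reports a non-zero verify code such as "self-signed certificate in certificate chain" (the pod name is a placeholder):

```shell
# Reproduce the SSL validation failure Quay's storage driver hits
oc exec <quay-pod-name> -- sh -c \
  'openssl s_client -connect s3.openshift-storage.svc.cluster.local:443 \
     </dev/null 2>/dev/null | grep "Verify return code"'
```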
This solution is part of Red Hat’s fast-track publication program, providing a huge library of solutions that Red Hat engineers have created while supporting our customers. To give you the knowledge you need the instant it becomes available, these articles may be presented in a raw and unedited form.