Image pull from Quay fails due to incorrect backend storage hostname

Solution Verified - Updated

Environment

  • Red Hat OpenShift Container Platform
    • 4.x
  • Red Hat Quay Operator
    • 3.x
  • Unmanaged Object Storage

Issue

  • Unable to pull images from Red Hat Quay registry. The RHOCP cluster is being deployed through ACM. The bootstrap node displays this error:

    Mar 23 22:34:59 localhost systemd[1]: bootkube.service: Main process exited, code=exited, status=125/n/a
    Mar 23 22:34:59 localhost systemd[1]: bootkube.service: Failed with result 'exit-code'.
    Mar 23 22:35:04 localhost systemd[1]: bootkube.service: Service RestartSec=5s expired, scheduling restart.
    Mar 23 22:35:04 localhost systemd[1]: bootkube.service: Scheduled restart job, restart counter is at 7.
    Mar 23 22:35:04 localhost systemd[1]: Stopped Bootstrap a Kubernetes cluster.
    Mar 23 22:35:04 localhost systemd[1]: Started Bootstrap a Kubernetes cluster.
    Mar 23 22:35:17 localhost bootkube.sh[21422]: Rendering CEO Manifests...
    Mar 23 22:35:33 localhost bootkube.sh[21422]: Error: Error parsing image configuration: Error fetching blob: invalid status code from registry 502 (Bad Gateway)
    
  • Manually pulling an image from Quay, where radosgw or NooBaa is used as unmanaged object storage, fails while parsing the image configuration with a 502 (Bad Gateway) error:

    $ podman pull quay.apps.xxx.com/repository/alpine:latest
    Trying to pull quay.apps.xxx.com/repository/alpine:latest...
    Error: parsing image configuration: Error fetching blob: invalid status code from registry 502 (Bad Gateway)
    

Resolution

  • In the Quay config.yaml file, review and correct the hostname in the DISTRIBUTED_STORAGE_CONFIG section. Use the Fully Qualified Domain Name (FQDN) of the backend S3 storage.
  1. For Ceph RADOS Gateway storage (RadosGWStorage):

    DISTRIBUTED_STORAGE_CONFIG:
      radosGWStorage:
        - RadosGWStorage
        - access_key: xxx
          secret_key: xxx
          bucket_name: xxx
          hostname: rook-ceph-rgw-ocs-storagecluster-cephobjectstore.openshift-storage.svc.cluster.local
          is_secure: true
          port: 443
          storage_path: /datastorage/registry
    DISTRIBUTED_STORAGE_PREFERENCE:
      - radosGWStorage

  2. For unmanaged NooBaa storage:

    DISTRIBUTED_STORAGE_CONFIG:
      default:
        - RHOCSStorage
        - access_key: xxx
          bucket_name: xxx
          hostname: s3.openshift-storage.svc.cluster.local
          is_secure: true
          port: "443"
          secret_key: xxx
          storage_path: /datastorage/registry
    DISTRIBUTED_STORAGE_DEFAULT_LOCATIONS: []
    DISTRIBUTED_STORAGE_PREFERENCE:
      - default
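
The hostname review described above can be sketched as a small check. This is a minimal illustration, not part of Quay: the function name short_service_hostnames is hypothetical, the configuration is assumed to be already loaded into a Python dict with the same layout as config.yaml, and the cluster domain is assumed to be the default cluster.local.

```python
from typing import Any

CLUSTER_DOMAIN = "cluster.local"  # assumption: default cluster domain


def short_service_hostnames(config: dict[str, Any]) -> list[tuple[str, str]]:
    """Return (location, hostname) pairs whose hostname stops at '.svc'."""
    flagged = []
    for location, entry in config.get("DISTRIBUTED_STORAGE_CONFIG", {}).items():
        # Each entry is a two-item list: [driver name, driver parameters].
        _driver, params = entry
        hostname = str(params.get("hostname", ""))
        if hostname.endswith(".svc"):
            flagged.append((location, hostname))
    return flagged


# Hypothetical config: one short hostname, one FQDN.
config = {
    "DISTRIBUTED_STORAGE_CONFIG": {
        "default": [
            "RHOCSStorage",
            {"hostname": "s3.openshift-storage.svc", "port": "443"},
        ],
        "radosGWStorage": [
            "RadosGWStorage",
            {
                "hostname": "rook-ceph-rgw-ocs-storagecluster-cephobjectstore"
                ".openshift-storage.svc.cluster.local",
                "port": 443,
            },
        ],
    }
}

for location, hostname in short_service_hostnames(config):
    # Suggest the FQDN form used in the Resolution section.
    print(f"{location}: {hostname} -> {hostname}.{CLUSTER_DOMAIN}")
```

Only the entry that ends in .svc is reported; the entry that already uses the FQDN is left alone.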

Root Cause

  • The short service name s3.openshift-storage.svc is usually not resolvable from the Quay container, but the FQDN s3.openshift-storage.svc.cluster.local is. Name resolution through dnsmasq inside the container does not handle the short form reliably. In addition, when an internal endpoint is used, FEATURE_PROXY_STORAGE must be turned on; otherwise the docker/podman client does not know where the images are pulled from.
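
The FEATURE_PROXY_STORAGE requirement can be expressed as a simple consistency check. The sketch below is illustrative only: proxy_storage_missing is a hypothetical helper, the configuration is assumed to be loaded into a Python dict, and an "internal endpoint" is assumed to mean a hostname ending in .svc or .svc.cluster.local.

```python
# Assumption: in-cluster service names end with one of these suffixes.
INTERNAL_SUFFIXES = (".svc", ".svc.cluster.local")


def proxy_storage_missing(config: dict) -> bool:
    """Return True when an in-cluster storage endpoint is configured
    but FEATURE_PROXY_STORAGE is not enabled."""
    uses_internal = False
    for entry in config.get("DISTRIBUTED_STORAGE_CONFIG", {}).values():
        # Each entry is a two-item list: [driver name, driver parameters].
        _driver, params = entry
        if str(params.get("hostname", "")).endswith(INTERNAL_SUFFIXES):
            uses_internal = True
    return uses_internal and config.get("FEATURE_PROXY_STORAGE") is not True


# Hypothetical config: internal endpoint with the proxy feature disabled.
config = {
    "FEATURE_PROXY_STORAGE": False,
    "DISTRIBUTED_STORAGE_CONFIG": {
        "default": [
            "RHOCSStorage",
            {"hostname": "s3.openshift-storage.svc.cluster.local"},
        ],
    },
}

print(proxy_storage_missing(config))
```

A True result means the configuration points at an in-cluster endpoint that clients outside the cluster cannot reach directly, so the storage proxy setting deserves a second look.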

Diagnostic Steps

  • After pushing an image to the Quay registry, look for log entries similar to the following in the quay-registry-app pod:

    nginx stdout | 2023/02/08 20:38:20 [error] 96#0: *122835 s3.openshift-storage.svc could not be resolved (3: Host not found) while sending to the client, client: 172.30.2.1, server: , request: "GET /_storage_proxy/ZXlKMGVYQ...yRVU1eVlkZw==/https/s3.openshift-storage.svc:443/quay-registry-datastore-53c86d7e-e18b-415f-8669-63ac360482fc/datastorage/registry/sha256/ac/ac23ce89fd89c309b302343da20619b7b5bb7d962a474135cf679433a16e291b?AWSAccessKeyId=xxx
    
  • Or for RADOS Gateway:

    nginx stdout | 2023/03/15 18:01:49 [error] 85#0: *3754 rook-ceph-rgw-ocs-storagecluster-cephobjectstore.openshift-storage.svc could not be resolved (3: Host not found) while sending to client, client: 172.18.0.2, server: , request: "GET /_storage_proxy/ZXlKMGVYQ...QwbzFqRW5R/https/rook-ceph-rgw-ocs-storagecluster-cephobjectstore.openshift-storage.svc:443/quay-registry-datastore-f4b5bf3a-66cb-4a6b-b69b-b6a95115aa2c/datastorage/registry/sha256/fb/fb8c19999d757e54126cc1840b6cdd439956383d5d01a0039234c963170
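
When the pod log is long, the unresolved hostnames can be pulled out with a short script. This is a minimal sketch that assumes the log text has already been saved or captured as a string and that the nginx message has the "could not be resolved" form shown above; unresolved_hosts is a hypothetical helper name.

```python
import re

# Matches the token immediately before "could not be resolved".
UNRESOLVED = re.compile(r"(\S+) could not be resolved")


def unresolved_hosts(log_text: str) -> set[str]:
    """Collect every hostname that nginx reported as unresolvable."""
    return {match.group(1) for match in UNRESOLVED.finditer(log_text)}


# Shortened copy of the first log line above.
sample = (
    "nginx stdout | 2023/02/08 20:38:20 [error] 96#0: *122835 "
    "s3.openshift-storage.svc could not be resolved (3: Host not found) "
    "while sending to the client, client: 172.30.2.1"
)

print(unresolved_hosts(sample))
```

Any hostname in the result that ends in .svc is a candidate for replacement with its .svc.cluster.local form in DISTRIBUTED_STORAGE_CONFIG.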
    

This solution is part of Red Hat’s fast-track publication program, providing a huge library of solutions that Red Hat engineers have created while supporting our customers. To give you the knowledge you need the instant it becomes available, these articles may be presented in a raw and unedited form.