Image pull is slow in Quay
Environment
- Red Hat Quay
- 3.x
- podman
- skopeo
Issue
- Pulling images from the Quay registry is extremely slow.
- Some nodes in the OpenShift Container Platform (OCP) cluster pull an image within 30 seconds, while others need 17 minutes.
- Running `skopeo copy` against the Quay instance is also drastically slower.
Troubleshooting Steps
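- Collect the podman output with the `date` command run before and after the pull, so the exact time window of the pull is known. In the example below, the pull takes just over 6 minutes (20:09:45 to 20:15:54):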
$ date; podman pull quay-registry.example.com/openshift/release:tags --tls-verify=false; date
Fri Dec 22 20:09:45 CST 2023
Trying to pull quay-registry.example.com/openshift/release:tags...
Getting image source signatures
Copying blob 6710e351bf02 done
Copying blob d8190195889e done
Copying blob 97da74cc6d8f done
Copying blob b4b8e80cc41e done
Copying blob 20975753b7ac done
Copying blob 1b55ae1c21ce done
Copying config 18f5b179d4 done
Writing manifest to image destination
18f5b179d492d41c25530de20211c0e9b497fxxxxx
Fri Dec 22 20:15:54 CST 2023
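The date-before-and-after approach can be wrapped in a reusable function. The sketch below is minimal, and the helper name `timed` is hypothetical. It prints UTC timestamps on the assumption that UTC makes it easier to match the pull window against Quay pod logs and a tcpdump capture. It also prints the elapsed seconds and returns the wrapped command's exit code.

```shell
#!/bin/sh
# timed: run a command and report UTC start/end timestamps plus elapsed
# seconds on stderr, so the pull window can be matched against Quay logs.
# Returns the exit code of the wrapped command.
timed() {
  _timed_start=$(date +%s)
  echo "start: $(date -u +%Y-%m-%dT%H:%M:%SZ)" >&2
  "$@"
  _timed_rc=$?
  _timed_end=$(date +%s)
  echo "end:   $(date -u +%Y-%m-%dT%H:%M:%SZ)" >&2
  echo "elapsed: $((_timed_end - _timed_start))s (exit code $_timed_rc)" >&2
  return "$_timed_rc"
}
```

Usage: `timed podman pull quay-registry.example.com/openshift/release:tags --tls-verify=false`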
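- A timed push to the same registry fails within about 3 seconds because the connection to quay-registry.example.com:8443 is refused. This points to a reachability problem on that port rather than slow transfers:
$ time podman push quay-registry.example.com/openshift/release:tags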
Getting image source signatures
WARN[0000] Failed, retrying in 1s ... (1/3). Error: trying to reuse blob sha256:53498d66ad83a29fcd7c7bcf4abbcc0def4fc912772aa8a4483b51exxxx at destination: pinging container registry quay-registry.example.com:8443: Get "quay-registry.example.com:8443/v2/": dial tcp 10.130.xx.xx:8443: connect: connection refused
Getting image source signatures
WARN[0001] Failed, retrying in 1s ... (2/3). Error: trying to reuse blob sha256:53498d66ad83a29fcd7c7bcf4abbcc0def4fc912772aa8a4483b51xxxx at destination: pinging container registry quay-registry.example.com:8443: Get "https://quay-registry.example.com:8443/v2/": dial tcp 10.130.xx.xx:8443: connect: connection refused
Getting image source signatures
WARN[0002] Failed, retrying in 1s ... (3/3). Error: trying to reuse blob sha256:53498d66ad83a29fcd7c7bcf4abbcc0def4fc912772aa8a4483b51xxxx at destination: pinging container registry quay-registry.example.com:8443: Get "https://quay-registry.example.com:8443/v2/": dial tcp 10.130.xx.xx:8443: connect: connection refused
Getting image source signatures
Error: trying to reuse blob sha256:53498d66ad83a29fcd7c7bcf4abbcc0def4fc912772aa8a4483b51xxxx at destination: pinging container registry quay-registry.example.com:8443: Get "https://quay-registry.example.com:8443/v2/": dial tcp 10.130.xx.xx:8443: connect: connection refused
real 0m3.053s
user 0m0.060s
sys 0m0.026s
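An error such as `connection refused` can be separated from genuine slowness with a direct request to the registry's `/v2/` base endpoint. A registry implementing the Docker Registry HTTP API V2 answers that endpoint with 200 or 401, and curl prints `000` as the status when no HTTP response arrives. The sketch below is minimal and assumes `curl` is installed on the client. The helper names `probe_registry` and `classify_v2_status` are hypothetical.

```shell
#!/bin/sh
# probe_registry HOST[:PORT]: request the /v2/ base endpoint and print the
# HTTP status followed by curl's timing breakdown in seconds.
probe_registry() {
  curl -k -s -o /dev/null \
    -w '%{http_code} dns=%{time_namelookup} connect=%{time_connect} tls=%{time_appconnect} ttfb=%{time_starttransfer} total=%{time_total}\n' \
    "https://$1/v2/"
}

# classify_v2_status CODE: interpret the status printed by probe_registry.
classify_v2_status() {
  case "$1" in
    200|401) echo "registry reachable" ;;
    000)     echo "no HTTP response (connection refused, timeout, DNS or TLS failure)" ;;
    *)       echo "unexpected status $1 (check proxies, load balancers and routes)" ;;
  esac
}
```

Usage: run `probe_registry quay-registry.example.com:8443` on a fast node and on a slow node. Comparing the `dns`, `connect` and `tls` values shows whether the delay is in name resolution, the TCP connection or the TLS handshake. `classify_v2_status "$(probe_registry quay-registry.example.com:8443 | cut -d' ' -f1)"` turns the status into a short verdict.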
- Check the Quay debug logs from all Quay pods/containers.
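On OpenShift, the logs of every Quay application pod can be collected in one pass. The sketch below is minimal and rests on two assumptions: `oc` is logged in to the cluster, and the operator-managed application pods have `quay-app` in their names. Adjust the filter if your deployment names pods differently. The function name `collect_quay_logs` and the default output directory are hypothetical.

```shell
#!/bin/sh
# collect_quay_logs NAMESPACE [OUTDIR]
# Saves the logs of every pod whose name contains "quay-app" (assumed naming
# of the Quay application pods) into OUTDIR, one file per pod, with
# timestamps so entries can be matched against the time window of a pull.
collect_quay_logs() {
  ns="$1"
  outdir="${2:-./quay-logs}"
  mkdir -p "$outdir" || return 1
  for pod in $(oc get pods -n "$ns" -o name | grep 'quay-app'); do
    name="${pod#pod/}"   # strip the "pod/" prefix printed by -o name
    oc logs -n "$ns" "$pod" --all-containers --timestamps > "$outdir/$name.log"
  done
}
```

For a standalone Quay deployment running under podman, `podman logs --timestamps <quay-container>` provides the same information.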
- Network configuration: Review the network configuration of the VM. Check for network issues such as high latency, packet loss, or congestion. If traffic goes through a VPN, the number of hops increases, and so does the delay in pulling images.
- Check whether there is a high network bandwidth load on the machine that exhibits slow pulls. Collect a complete tcpdump from the node where Quay is running while the issue is reproduced. For the capture, it is recommended to run a single Quay replica. To do this, turn off the HPA by setting the horizontalpodautoscaler component to managed: false, and override the number of Quay replicas to 1. Then delete the HPA of Quay's main deployment so that the desired number of pods is reached. The following spec does this and also enables debug logging:
spec:
  components:
    - kind: horizontalpodautoscaler
      managed: false
    - kind: quay
      managed: true
      overrides:
        replicas: 1
        env:
          - name: DEBUGLOG
            value: "true"
- When running tcpdump, include all relevant IP addresses:
  - the IP address of the host from which the pull is done
  - all service IPs involved in the transaction
  - the IP address of the node where Quay is deployed, along with the Quay pod IP
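The addresses listed above can be combined into a single tcpdump filter expression that matches traffic to or from any of them. The sketch below is minimal, and the helper name `build_host_filter` is hypothetical.

```shell
#!/bin/sh
# build_host_filter IP...: print a tcpdump (BPF) expression matching traffic
# to or from any of the given addresses, e.g. "host A or host B".
build_host_filter() {
  filter=""
  for ip in "$@"; do
    if [ -z "$filter" ]; then
      filter="host $ip"
    else
      filter="$filter or host $ip"
    fi
  done
  printf '%s\n' "$filter"
}
```

Usage, with the placeholders standing for the addresses collected above:
tcpdump -i any -s 0 -w /tmp/quay-pull.pcap "$(build_host_filter <client-ip> <service-ip> <node-ip> <quay-pod-ip>)"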
- Storage performance: The performance of the underlying storage system can affect the speed of image pulls from the registry. If the storage system is slow or overloaded, downloads become slow.
- Check whether the backend storage is tested and supported according to the Quay supportability matrix.
- Check storage health by executing a shell inside Quay's pod/container:
sh-4.2# curl -k https://localhost:8443/health/instance
{"data":{"services":{"auth":true,"database":true,"disk_space":true,"jwtproxy":true,"registry_gunicorn":true,"service_key":true,"verbs_gunicorn":true,"web_gunicorn":true}},"status_code":500}
sh-4.2# curl -k https://localhost:8443/health/endtoend
{"data":{"services":{"auth":true,"database":true,"redis":true,"storage":true}},"status_code":500}
- Perform the following test to measure the transfer speed between the storage and the machine in question. Push a file from the machine that exhibits slow pulls to the S3 storage bucket, then time downloading it back, using the commands below:
$ subscription-manager repos --enable=rhel-8-for-x86_64-highavailability-rpms
$ yum install awscli    # installs the `aws` binary used for the test
$ dd if=/dev/urandom of=/tmp/random-file count=10 bs=50M iflag=fullblock    # creates a 500 MiB file of random data in /tmp
$ export AWS_ACCESS_KEY_ID=xxxxxx
$ export AWS_SECRET_ACCESS_KEY=xxxxxx
$ aws s3 cp --no-verify-ssl --endpoint-url https://<storage-endpoint> /tmp/random-file s3://<bucket-name>/<file-name>    # upload the file to the bucket
$ time { aws s3 cp --no-verify-ssl --endpoint-url https://<storage-endpoint> s3://<bucket-name>/<file-name> /path/on/VM; }    # timed download
...
download: s3://upiquay/random-file to ../tmp/random-file
real 0m4.577s
user 0m2.841s
sys 0m2.230s
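The `dd` command writes 10 blocks of 50 MiB, which is 524288000 bytes. The `real` time of the download can therefore be converted into a transfer rate and compared with the image pull. The sketch below is a minimal awk-based version, and the helper name `throughput_mib` is hypothetical.

```shell
#!/bin/sh
# throughput_mib BYTES SECONDS: print the transfer rate in MiB/s.
# SECONDS is typically the "real" value reported by `time`.
throughput_mib() {
  awk -v bytes="$1" -v secs="$2" 'BEGIN {
    if (secs <= 0) exit 1            # reject zero or negative durations
    printf "%.1f MiB/s\n", bytes / secs / 1048576
  }'
}
```

For the example output, `throughput_mib 524288000 4.577` prints `109.2 MiB/s`. Dividing the compressed image size (the sum of the layer sizes from the manifest) by this rate gives a rough estimate of how long the pull should take from that machine.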
- Proxy or firewall: Check whether there are any proxies or firewalls between the client pulling the images and the Quay registry. These can introduce additional latency or restrict bandwidth. NOTE: If FEATURE_PROXY_STORAGE is not used, Quay returns a direct download link to the client (podman/skopeo/docker). The layer traffic then does not go through Quay. Instead, the client contacts the underlying storage directly and requests the image layer from it, so proxies and firewalls between the client and the storage endpoint must be checked as well.
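Like most Go programs, podman and skopeo read the standard proxy environment variables. A quick first check on the client is therefore whether any of these variables are set. The sketch below is minimal, and the helper name `show_proxy_env` is hypothetical. It prints each variable in both lowercase and uppercase.

```shell
#!/bin/sh
# show_proxy_env: print the proxy-related environment variables, marking
# unset or empty ones as <unset>.
show_proxy_env() {
  for var in http_proxy https_proxy no_proxy HTTP_PROXY HTTPS_PROXY NO_PROXY; do
    eval "val=\${$var-}"   # indirect lookup of the variable named in $var
    printf '%s=%s\n' "$var" "${val:-<unset>}"
  done
}
```

If HTTPS_PROXY is set and neither the Quay hostname nor the storage endpoint appears in NO_PROXY, both registry requests and direct storage downloads go through the proxy.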
- Size of the image: Check whether the image is large, or whether any particular blob/layer of the image is especially big. This can contribute to the overall time needed to pull the image.
- Get the image manifest:
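$ curl -X GET https://<registry-url>/v2/<image-name>/manifests/<tag> -H "Accept: application/vnd.docker.distribution.manifest.v2+json"
- Extract the image size from the response, which is a JSON document. The size property inside the config field is only the size of the image configuration blob, not of the whole image. The compressed size of the image, in bytes, is the sum of the size properties of the entries in the layers field. The largest entry shows whether a single layer dominates the pull:
$ curl -s -X GET https://<registry-url>/v2/<image-name>/manifests/<tag> -H "Accept: application/vnd.docker.distribution.manifest.v2+json" | jq '[.layers[].size] | add'
$ curl -s -X GET https://<registry-url>/v2/<image-name>/manifests/<tag> -H "Accept: application/vnd.docker.distribution.manifest.v2+json" | jq '.layers | max_by(.size) | {digest, size}'
- Antivirus software: Check whether antivirus software on the machine performing the pull interacts with the image pull and slows it down.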
- Resource allocation: Check the resource allocation of the machine hosting the Quay pod/container. Ensure it has sufficient CPU, memory, and network resources to handle the expected load. An under-provisioned machine results in slow performance.
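A quick resource snapshot of the Linux host running Quay can be taken from `/proc`. The sketch below is minimal and assumes a Linux host with `nproc` and a kernel that reports MemAvailable. The helper name `host_snapshot` is hypothetical.

```shell
#!/bin/sh
# host_snapshot: one-line summary of online CPUs, 1/5/15-minute load
# averages and available memory (MiB) on a Linux host.
host_snapshot() {
  cpus=$(nproc)
  load=$(cut -d' ' -f1-3 /proc/loadavg)
  mem_kb=$(awk '/^MemAvailable:/ {print $2}' /proc/meminfo)
  echo "cpus=$cpus load=$load mem_available_mib=$(( ${mem_kb:-0} / 1024 ))"
}
```

A load average consistently above the CPU count, or very little available memory, suggests the machine is under-provisioned. On OpenShift, `oc adm top node` and `oc adm top pods -n <namespace>` give the equivalent cluster-level view.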