Operator-managed Quay deployments fail to fully deploy or update due to resource demands
Environment
- Red Hat Quay operator
- 3.8+
Issue
- The Quay operator deploys the following pods with the default resource requests shown below. These requests can be too large for smaller clusters and may cause problems during rolling updates or even the initial rollout.
[root@sbhavsar]# oc get pods
NAME                                          READY   STATUS      RESTARTS   AGE
quay-operator.v3.6.2-d88c4f74b-7s8t7          1/1     Running     0          4m22s
subquay-clair-app-79f96d69dc-j7dzh            1/1     Running     0          2m35s
subquay-clair-app-79f96d69dc-n9svj            1/1     Running     0          2m3s
subquay-clair-postgres-cc4fdf4b7-hjv9m        1/1     Running     0          2m51s
subquay-quay-app-766f64b84d-grkqv             1/1     Running     0          2m35s
subquay-quay-app-766f64b84d-m4bps             1/1     Running     0          2m35s
subquay-quay-app-upgrade-wp9vd                0/1     Completed   0          2m44s
subquay-quay-config-editor-6c84649df8-v2zhz   1/1     Running     0          2m35s
subquay-quay-database-78bf9dd579-gjfvm        1/1     Running     0          2m33s
subquay-quay-mirror-b9c7657b6-7tptr           1/1     Running     0          2m11s
subquay-quay-mirror-b9c7657b6-phcfh           1/1     Running     0          2m11s
subquay-quay-postgres-init-lp8fv              0/1     Completed   0          2m36s
subquay-quay-redis-6c65bdc497-hsgfg           1/1     Running     0          3m31s
1. clair-app x 2 (instances):
   Requests:
     cpu: 2
     memory: 2Gi
2. clair-postgres:
   Requests:
     cpu: 500m
     memory: 2Gi
3. quay-app x 2 (instances):
   Requests:
     cpu: 2
     memory: 8Gi
4. quay-database:
   Requests:
     cpu: 500m
     memory: 2Gi
5. quay-mirror x 2 (instances):
   Requests:
     cpu: 500m
     memory: 512Mi
6. redis:
   Requests:
     cpu: 500m
     memory: 1Gi
- Can one optimize the CPU and memory requests?
- Quay's default CPU and memory requests are high. Is it possible to lower them to custom values, and if so, how?
Resolution
- Disable the horizontalpodautoscaler component in the QuayRegistry Custom Resource and use the override feature to set the replica count to 1, as shown in the documentation for scaling down a Quay deployment to lower the required resources. Note that a single replica is prone to registry outages because the pod may be restarted during updates, Quay configuration changes, node maintenance events, or unexpected node downtime.
- The resource limits and requests cannot be lowered. The operator does not have that functionality as of now.
- There is a JIRA, PROJQUAY-2877, whose goal is to rationalize resource usage to a reasonable level, but it is still a long way from being implemented.
- In the meantime, another JIRA, PROJQUAY-3060, is open to add the ability to edit resources.
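The first resolution item can be sketched as a QuayRegistry Custom Resource along the following lines. This is a minimal example, not a complete or authoritative manifest: the registry name subquay is an assumption inferred from the pod names in the Issue section, and any components not listed are left at the operator's defaults. Verify the field names against the documentation for the Quay operator version in use.

```yaml
apiVersion: quay.redhat.com/v1
kind: QuayRegistry
metadata:
  # Assumed name, inferred from the "subquay-" pod prefix above.
  name: subquay
spec:
  components:
    # Stop the operator from managing autoscaling so that the
    # replica overrides below are not scaled back up.
    - kind: horizontalpodautoscaler
      managed: false
    # Run a single instance of each component that defaults to two.
    - kind: quay
      managed: true
      overrides:
        replicas: 1
    - kind: clair
      managed: true
      overrides:
        replicas: 1
    - kind: mirror
      managed: true
      overrides:
        replicas: 1
```

Saved to a file, a manifest like this could be applied with oc apply -f in the namespace that holds the registry. It reduces only the number of pods; the per-pod requests listed in the Issue section stay the same.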
Root Cause
- Quay is an enterprise registry and consists of many different components. When bootstrapping, Quay starts many workers that handle registry operations such as garbage collection, authentication, internet access (Nginx), and many others. Because of that, Quay requires at least 2 vCPUs and at least 4 GB of memory per pod to function properly.
- The sizing of the other components has been determined to ensure registry availability and performance in all cases.
- Additional resources may be required to run the database and Clair, assuming Quay is deployed using an operator and all components are set to the managed state.
This solution is part of Red Hat’s fast-track publication program, providing a huge library of solutions that Red Hat engineers have created while supporting our customers. To give you the knowledge you need the instant it becomes available, these articles may be presented in a raw and unedited form.