clair-postgres PVC is full
Environment
- Red Hat Quay Operator
- 3.8+
Issue
- Clair pods are in CrashLoopBackOff and the clair-postgres pod logs show that the PVC is full.
sfit-registry-clair-app-6cf647c65-l24br 0/1 CrashLoopBackOff 1594 (4m25s ago) 5d15h
sfit-registry-clair-app-6ff4844c45-dkqs4 0/1 CrashLoopBackOff 1594 (4m46s ago) 5d16h
sfit-registry-clair-postgres-5dcd6c4899-w7cxr 0/1 Running 1099 (5m11s ago) 3d23h
$ oc logs dr-clair-postgres
waiting for server to start....2023-12-13 03:40:37.875 GMT [23] FATAL: could not write lock file "postmaster.pid": No space left on device
stopped waiting
pg_ctl: could not start server
Examine the log output.
Resolution
1. Truncate the uo_vuln table. TRUNCATE clears all rows from a table without removing the table structure. The uo_vuln table contains information about update operations on Clair's database, and it is safe to truncate.
$ oc get pods | grep clair
$ oc rsh <clair-postgres-pod-name>
sh-4.4$ psql <clairdb-name>
psql (10.21)
Type "help" for help.
<clairdb-name>=# TRUNCATE TABLE uo_vuln;
2. Clair regularly updates its own CVE database with new definitions. The update frequency is defined by the matcher.period value in Clair's config.yaml file. If this value does not exist, Clair uses the default value, which is 30 minutes. The longer the period, the less frequently Clair updates the vulnerability database.
matcher.update_retention sets the number of updates kept in the database; the rest are garbage collected. On new versions of Clair, the default value (if this field doesn't exist in Clair's config.yaml file) is 2. Setting it to 1 slows the growth of the uo_vuln table.
...
matcher:
  update_retention: 1
  period: 12h
...
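To get a feel for what a given matcher.period means in practice, the period can be converted into update runs per day. This is only a minimal sketch: the helper name updates_per_day is hypothetical, and it handles only whole-minute ("30m") or whole-hour ("12h") values, not compound durations.

```shell
# Hypothetical helper: converts a period such as "30m" or "12h" into the
# number of update runs per day. Only plain m and h values are supported.
updates_per_day() {
  case "$1" in
    *h) echo $(( 24 / ${1%h} )) ;;
    *m) echo $(( 1440 / ${1%m} )) ;;
    *)  echo "unsupported period: $1" >&2; return 1 ;;
  esac
}

updates_per_day 30m   # the default period described above
updates_per_day 12h   # the period used in the example config
```

With the 30 minute default this gives 48 runs per day, while a 12h period gives 2.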
3. Increase the size of the PV/PVC.
The volume size of the clairpostgres component cannot be overridden. This is a known issue and will be fixed in a future version of Red Hat Quay (PROJQUAY-4301).
4. Redeploy Clair, because when the Clair database is full the pod goes into CrashLoopBackOff. When the Clair database is dropped, the information about already indexed images is dropped as well. To get the security reports back for those images, Quay must send them to Clair again to be re-indexed. This can take a very long time if many images are stored in the registry. Clair always recreates the database schema and populates it with CVE data automatically if it is connected to the internet, but security reports will not be available until the indexing process is complete.
- Add a sleep command to the container through the deployment as follows:
$ oc edit deployment <clair-postgres-deployment>
...
spec:
  containers:
  - command: ["/bin/sh"]
    args: ["-c", "while true; do echo hello; sleep 10;done"]
- Once done, restart the clair-postgres pod.
$ oc delete pod <pod-name>
- Downscale Quay operator to 0 pods.
$ oc scale --replicas=0 deployment quay-operator.v3.8.x -n <quay-operator-ns>
- Open a psql shell in the Clair database pod.
$ oc exec -it <clair-postgres> -- psql
- List the databases to see which user owns the Clair database.
\l+
- Drop the current Clair database.
drop database <clair_database>;
- Create a new Clair database with the same owner as the previous database. For example, if \l+ showed:
Name | Owner | Encoding | Collate |
-----------+-------+----------+-------------
clair | clair | UTF8 | en_US.utf8 |
then run:
create database clair owner clair;
- Connect to the new database and create the "uuid-ossp" extension in it.
\c clair
CREATE EXTENSION "uuid-ossp";
- Scale the Quay operator back to 1 pod. It will reconcile the whole deployment.
$ oc scale --replicas=1 deployment quay-operator.v3.8.x -n <quay-operator-ns>
Diagnostic Steps
- Check detailed application logs as follows:
$ oc exec -it <clair-database-pod> -- cat /var/lib/pgsql/data/userdata/log/postgresql-xxx.log > clairpostgres.op
- Check which files under userdata consume the most space for PostgreSQL.
$ oc exec -it <database-pod> -- /bin/bash
sh-4.4$ psql
postgres=# \c postgres;
You are now connected to database "postgres" as user "postgres".
postgres=# \q
sh-4.4$ cd /var/lib/pgsql/data/userdata
sh-4.4$ du -sh *
4.0K PG_VERSION
16G base
4.0K current_logfiles
568K global
92K log
4.0K pg_commit_ts
4.0K pg_dynshmem
8.0K pg_hba.conf
4.0K pg_ident.conf
16K pg_logical
28K pg_multixact
4.0K pg_notify
4.0K pg_replslot
4.0K pg_serial
4.0K pg_snapshots
4.0K pg_stat
52K pg_stat_tmp
128K pg_subtrans
4.0K pg_tblspc
4.0K pg_twophase
1.1G pg_wal
48K pg_xact
4.0K postgresql.auto.conf
4.0K postgresql.conf
4.0K postmaster.opts
4.0K postmaster.pid
sh-4.4$
sh-4.4$ ls -l base/
total 260
drwxrws---. 2 1000690000 1000690000 4096 Nov 21 04:52 1
drwxrws---. 2 1000690000 1000690000 4096 Nov 21 04:52 13435
drwxrws---. 2 1000690000 1000690000 12288 Nov 23 00:49 13436
drwx--S---. 2 1000690000 1000690000 241664 Nov 22 23:13 pgsql_tmp
sh-4.4$
sh-4.4$ cd /var/lib/pgsql/data/userdata/base/13436
sh-4.4$ du -sh *|grep G
1.1G 16943
1.1G 16943.1
1.1G 16943.2
1.1G 16943.3
1.1G 16943.4
1.1G 16943.5
1.1G 16943.6
1.1G 16943.7
1.1G 16948
1.1G 16956
1.1G 16959
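The numeric file names in the listing above are relation filenodes, and files with a .N suffix are further segments of the same relation (PostgreSQL splits large relations into 1 GB segment files). To find out which table or index each one belongs to, the listing can be turned into lookup queries. The following is a minimal sketch, assuming the du output format shown above; the helper name filenode_queries is hypothetical, while pg_filenode_relation() is a standard PostgreSQL function in which tablespace 0 means the database's default tablespace.

```shell
# Hypothetical helper: reads "du -sh *" lines (size, then file name) on stdin
# and prints one lookup query per distinct relation filenode. Segment suffixes
# such as ".1" are stripped, and non-numeric file names are skipped.
filenode_queries() {
  awk '{
    f = $2
    sub(/\.[0-9]+$/, "", f)
    if (f ~ /^[0-9]+$/ && !seen[f]++)
      printf "SELECT %s AS filenode, pg_filenode_relation(0, %s) AS relation;\n", f, f
  }'
}

# Example with a few lines from the listing above:
printf '1.1G\t16943\n1.1G\t16943.1\n1.1G\t16948\n' | filenode_queries
```

The printed queries need to be run in psql while connected to the database whose directory was inspected, because filenodes are only meaningful within their own database.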
- Check whether Clair is managed by the operator, and check the PVC details to see whether the volume size is the default (50GB) or has been changed.
$ oc get quayregistry quayregistryname -oyaml
...
spec:
  components:
  - kind: clair
    managed: true
  - kind: clairpostgres
    managed: true
...
$ oc get pvc <pvc_name> -o yaml
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  annotations:
    pv.kubernetes.io/bind-completed: "yes"
    pv.kubernetes.io/bound-by-controller: "yes"
    quay-buildmanager-hostname: ""
    quay-component: clair-postgres
    quay-operator-service-endpoint: http://quay.example.com:port-number
    quay-registry-hostname: quay.example.com
    volume.beta.kubernetes.io/storage-provisioner: kubernetes.io/aws-ebs
    volume.kubernetes.io/selected-node: ip-10-21-xx-xx.ap-southeast-2.compute.internal
    volume.kubernetes.io/storage-resizer: kubernetes.io/aws-ebs
  finalizers:
  - kubernetes.io/pvc-protection
  labels:
    quay-component: clair-postgres
    quay-operator/quayregistry: quay
  name: quay-clair-postgres
  namespace: quay-project
  ownerReferences:
  - apiVersion: quay.redhat.com/v1
    kind: QuayRegistry
    name: quayregistryname
spec:
  accessModes:
  - ReadWriteOnce
  resources:
    requests:
      storage: 100Gi
  storageClassName: gp2
  volumeMode: Filesystem
  volumeName: pvc-xxxx
status:
  accessModes:
  - ReadWriteOnce
  capacity:
    storage: 100Gi
  phase: Bound
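The PVC capacity above only shows how large the volume is, not how full it is. One way to check actual usage is to run df against the data directory inside the database pod and read the percentage column. This is a minimal sketch, assuming df is available in the container and the volume is mounted at /var/lib/pgsql/data as in the paths above; the helper name pvc_percent_used and the sample df line are made up for illustration.

```shell
# Hypothetical helper: reads "df -P <path>" output on stdin and prints the
# Capacity column (percent used) of the first filesystem line, without "%".
pvc_percent_used() {
  awk 'NR == 2 { gsub(/%/, "", $5); print $5 }'
}

# Assumed usage against the database pod (placeholder pod name):
#   oc exec <clair-postgres-pod> -- df -P /var/lib/pgsql/data | pvc_percent_used

# Example with a made-up sample of df -P output:
printf '%s\n' \
  'Filesystem 1024-blocks Used Available Capacity Mounted on' \
  '/dev/example 103080888 103064504 0 100% /var/lib/pgsql/data' \
  | pvc_percent_used
```

A value at or near 100 matches the "No space left on device" error shown in the Issue section.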
- Check which tables are consuming the most space.
$ oc exec -it quay-clair-postgres -- /bin/bash
bash-4.4$ psql
psql (10.21)
Type "help" for help.
postgres=# \l
List of databases
Name | Owner | Encoding | Collate | Ctype | Access privileges
-----------+----------+----------+------------+------------+-----------------------
postgres | postgres | UTF8 | en_US.utf8 | en_US.utf8 |
template0 | postgres | UTF8 | en_US.utf8 | en_US.utf8 | =c/postgres +
| | | | | postgres=CTc/postgres
template1 | postgres | UTF8 | en_US.utf8 | en_US.utf8 | =c/postgres +
| | | | | postgres=CTc/postgres
(3 rows)
postgres=# \c postgres
You are now connected to database "postgres" as user "postgres".
postgres=# \d+
List of relations
Schema | Name | Type | Owner | Size | Description
--------+---------------------------+----------+----------+------------+-------------
public | dist | table | postgres | 16 kB |
public | dist_id_seq | sequence | postgres | 8192 bytes |
public | dist_scanartifact | table | postgres | 16 kB |
public | enrichment | table | postgres | 63 MB |
public | enrichment_id_seq | sequence | postgres | 8192 bytes |
public | indexreport | table | postgres | 249 MB |
public | key | table | postgres | 8192 bytes |
public | latest_vuln | view | postgres | 0 bytes |
public | layer | table | postgres | 1640 kB |
public | layer_id_seq | sequence | postgres | 8192 bytes |
public | libindex_migrations | table | postgres | 8192 bytes |
public | libvuln_migrations | table | postgres | 8192 bytes |
public | manifest | table | postgres | 1544 kB |
public | manifest_id_seq | sequence | postgres | 8192 bytes |
public | manifest_index | table | postgres | 854 MB |
public | manifest_index_id_seq | sequence | postgres | 8192 bytes |
public | manifest_layer | table | postgres | 47 MB |
public | notification | table | postgres | 3216 kB |
public | notification_body | table | postgres | 926 MB |
public | notifier_migrations | table | postgres | 8192 bytes |
public | notifier_update_operation | table | postgres | 1272 kB |
public | package | table | postgres | 1216 kB |
public | package_id_seq | sequence | postgres | 8192 bytes |
public | package_scanartifact | table | postgres | 136 MB |
public | receipt | table | postgres | 1048 kB |
public | repo | table | postgres | 16 kB |
public | repo_id_seq | sequence | postgres | 8192 bytes |
public | repo_scanartifact | table | postgres | 1008 kB |
public | scanned_layer | table | postgres | 7760 kB |
public | scanned_manifest | table | postgres | 7152 kB |
public | scanner | table | postgres | 16 kB |
public | scanner_id_seq | sequence | postgres | 8192 bytes |
public | scannerlist | table | postgres | 8192 bytes |
public | scannerlist_id_seq | sequence | postgres | 8192 bytes |
public | uo_enrich | table | postgres | 109 MB |
public | uo_vuln | table | postgres | 49 GB |
public | update_operation | table | postgres | 920 kB |
public | update_operation_id_seq | sequence | postgres | 8192 bytes |
public | updater_status | table | postgres | 864 kB |
public | vuln | table | postgres | 12 GB |
public | vuln_id_seq | sequence | postgres | 8192 bytes |
(41 rows)
- Check the size of the largest relations, in particular the manifest* tables and their indexes, using the psql query below, and investigate further.
clairv4=> SELECT nspname || '.' || relname AS "relation",
pg_size_pretty(pg_relation_size(C.oid)) AS "size"
FROM pg_class C
LEFT JOIN pg_namespace N ON (N.oid = C.relnamespace)
WHERE nspname NOT IN ('pg_catalog', 'information_schema')
ORDER BY pg_relation_size(C.oid) DESC
LIMIT 20;
relation | size
------------------------------------------------------------------+---------
public.manifest_index_unique | 9071 MB
public.manifest_index | 6827 MB
public.manifest_index_manifest_id_package_id_dist_id_repo_id_idx | 6819 MB
public.vuln | 6250 MB
public.uo_vuln | 3512 MB
public.uo_vuln_pkey | 2690 MB
public.manifest_index_pkey | 2340 MB
pg_toast.pg_toast_826417 | 1129 MB
public.package_scanartifact_pkey | 1112 MB
public.uo_vuln_vuln_idx | 913 MB
pg_toast.pg_toast_826828 | 883 MB
public.package_scanartifact | 807 MB
public.uo_vuln_uo_idx | 525 MB
public.vuln_hash_kind_hash_key | 316 MB
public.vuln_lookup_idx | 300 MB
public.vuln_pkey | 176 MB
public.scanned_layer | 82 MB
public.vuln_updater_idx | 64 MB
public.scanned_layer_pkey | 63 MB
pg_toast.pg_toast_826417_index | 51 MB
(20 rows)
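Because pg_relation_size() reports a table and each of its indexes as separate rows, the total footprint of one table has to be added up from several lines of the output above. The following is a minimal sketch for doing that from saved psql output; the helper name sum_relation_mb is hypothetical, and it only counts rows whose size is reported in MB.

```shell
# Hypothetical helper: reads psql rows of the form "relation | NNNN MB" on
# stdin and prints the summed size in MB of all relations whose name starts
# with the given prefix. Header, separator and non-MB rows are ignored.
sum_relation_mb() {
  awk -F'|' -v p="$1" '
    {
      name = $1
      gsub(/[ \t]/, "", name)
      n = split($2, s, " ")
    }
    n == 2 && s[2] == "MB" && index(name, p) == 1 { total += s[1] }
    END { print total + 0 }'
}

# Example with a few rows from the output above:
printf '%s\n' \
  'public.manifest_index_unique | 9071 MB' \
  'public.manifest_index | 6827 MB' \
  'public.vuln | 6250 MB' \
  'public.manifest_index_pkey | 2340 MB' \
  | sum_relation_mb public.manifest_index
```

Applied to the full listing above with the prefix public.manifest_index, the manifest_index table and its three indexes add up to 25057 MB, which is why the manifest* relations deserve a closer look in that case.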
This solution is part of Red Hat’s fast-track publication program, providing a huge library of solutions that Red Hat engineers have created while supporting our customers. To give you the knowledge you need the instant it becomes available, these articles may be presented in a raw and unedited form.