clair-postgres PVC is full

Solution Verified - Updated

Environment

  • Red Hat Quay Operator
    • 3.8+

Issue

  • Clair pods are in CrashLoopBackOff, and the clair-postgres pod logs show that its PVC is full:
  $ oc get pods | grep clair
  sfit-registry-clair-app-6cf647c65-l24br             0/1     CrashLoopBackOff   1594 (4m25s ago)   5d15h
  sfit-registry-clair-app-6ff4844c45-dkqs4            0/1     CrashLoopBackOff   1594 (4m46s ago)   5d16h
  sfit-registry-clair-postgres-5dcd6c4899-w7cxr       0/1     Running            1099 (5m11s ago)   3d23h
  $ oc logs dr-clair-postgres
  waiting for server to start....2023-12-13 03:40:37.875 GMT [23] FATAL:  could not write lock file "postmaster.pid": No space left on device
  stopped waiting
  pg_ctl: could not start server
  Examine the log output.
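When many pods are listed, both symptoms can be checked from the command line. The following is a minimal POSIX sh sketch; the function names are hypothetical, and it assumes the oc get pods column order shown above (NAME, READY, STATUS, ...).

```sh
#!/bin/sh
# Print Clair pods whose STATUS is CrashLoopBackOff, reading `oc get pods`
# output on stdin (column 1 = NAME, column 3 = STATUS).
clair_crashlooping_pods() {
  awk '$1 ~ /clair/ && $3 == "CrashLoopBackOff" { print $1 }'
}

# Succeed (exit status 0) if the log text on stdin reports a full disk.
pg_log_reports_disk_full() {
  grep -q 'No space left on device'
}

# Example:
#   oc get pods | clair_crashlooping_pods
#   oc logs <clair-postgres-pod> | pg_log_reports_disk_full && echo "volume is full"
```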

Resolution

1. Truncate the uo_vuln table. TRUNCATE removes all rows from a table without removing the table structure. The uo_vuln table links Clair's update operations to the vulnerabilities they produced, and it is safe to truncate it.

$ oc get pods | grep clair
$ oc rsh <clair-postgres-pod-name>
sh-4.4$ psql -d <clairdb-name>
psql (10.21)
Type "help" for help.

<clairdb-name>=# TRUNCATE TABLE uo_vuln;
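Step 1 can also be run non-interactively. TRUNCATE returns the table's disk space to the operating system immediately, whereas DELETE only marks rows as dead. The sketch below is a hypothetical helper that assumes PostgreSQL in the pod is running and accepting connections (which is no longer the case once the volume is completely full, as in the logs above). It prints the total size of uo_vuln, including indexes and TOAST data, before and after truncating it; setting OC=echo prints the commands instead of running them.

```sh
#!/bin/sh
# Hypothetical helper: show the total size of uo_vuln (table, indexes and TOAST),
# truncate it, then show the size again.
# $1 = clair-postgres pod, $2 = Clair database name. Set OC=echo for a dry run.
truncate_uo_vuln() (
  oc_cmd=${OC:-oc}
  size_sql="SELECT pg_size_pretty(pg_total_relation_size('uo_vuln'));"
  "$oc_cmd" exec "$1" -- psql -d "$2" -At -c "$size_sql" || exit 1
  "$oc_cmd" exec "$1" -- psql -d "$2" -c "TRUNCATE TABLE uo_vuln;" || exit 1
  "$oc_cmd" exec "$1" -- psql -d "$2" -At -c "$size_sql"
)

# Example: truncate_uo_vuln <clair-postgres-pod> <clairdb-name>
```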

2. Clair regularly updates its own CVE database to pick up new vulnerability definitions. The update frequency is defined by the matcher.period value in Clair's config.yaml file. If this value is not set, Clair uses the default of 30 minutes. The higher the value, the less frequently Clair updates its vulnerability database.

matcher.update_retention sets the number of update operations kept in the database; the rest are garbage collected. In newer versions of Clair, the default value (used when this field is not present in Clair's config.yaml file) is 2. Setting it to 1 slows the growth of the uo_vuln table.

...
matcher:
    update_retention: 1
    period: 12h
...
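To see what a given period value means in practice, the arithmetic can be sketched as below. This is an illustration only: it accepts a single-unit duration (a whole number followed by h, m or s, such as 30m or 12h) and rejects compound durations such as 1h30m. With the 30-minute default, Clair checks for updates 48 times a day; with period: 12h, twice a day.

```sh
#!/bin/sh
# Illustration: update runs per day implied by a matcher `period` value.
# Only single-unit durations (a whole number followed by h, m or s) are handled;
# compound durations such as 1h30m are rejected. Result is rounded down.
updates_per_day() (
  value=${1%?}            # everything except the last character, e.g. 12
  unit=${1#"$value"}      # the last character, e.g. h
  case $value in
    ''|*[!0-9]*) echo "unsupported period: $1" >&2; exit 1 ;;
  esac
  # Drop leading zeros so that e.g. 08 is not parsed as an octal constant.
  value=$(printf '%s\n' "$value" | sed 's/^0*//')
  [ -n "$value" ] || value=0
  case $unit in
    h) seconds=$((value * 3600)) ;;
    m) seconds=$((value * 60)) ;;
    s) seconds=$value ;;
    *) echo "unsupported period: $1" >&2; exit 1 ;;
  esac
  [ "$seconds" -gt 0 ] || { echo "period must be positive: $1" >&2; exit 1; }
  echo $((86400 / seconds))
)

# Example: updates_per_day 30m
#          updates_per_day 12h
```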

3. Increase the size of the PV/PVC.
Note that the volume size of the clairpostgres component cannot be overridden. This is a known issue and will be fixed in a future version of Red Hat Quay (PROJQUAY-4301).

4. Redeploy Clair. When the Clair database is full, the clair-postgres pod goes into CrashLoopBackOff. When the Clair database is dropped, the information about already indexed images is dropped with it. To get the security reports back for those images, Quay must send them to Clair again to be re-indexed, which can take a long time if many images are stored in the registry. Clair always recreates the database schema and, if it is connected to the internet, populates it with CVE data automatically. However, security reports are not available until indexing has completed.

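- Override the command of the clair-postgres container with a sleep loop, so that the pod stays running instead of crash-looping, by editing the clair-postgres deployment as follows: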

$ oc edit deployment <clair-postgres-deployment>

...
    spec:
      containers:
      - name: <existing-container-name>   # add command and args to the existing container entry
        command: ["/bin/sh"]
        args: ["-c", "while true; do echo hello; sleep 10; done"]

- Once done, restart the clair-postgres pod:
$ oc delete pod <clair-postgres-pod-name>

- Scale the Quay operator down to 0 replicas so that it does not reconcile the deployment while you work on the database:
$ oc scale --replicas=0 deployment quay-operator.v3.8.x -n <quay-operator-ns>

- Open a psql shell in the Clair database pod:
$ oc exec -it <clair-postgres> -- psql
- List the databases to see which user owns the Clair database:
\l+

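- Drop the current Clair database. PostgreSQL cannot drop the database you are connected to, so connect to a different one (for example template1) first:
\c template1
drop database <clair_database>;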

- Create a new Clair database with the same owner as the previous database. For example, if \l+ showed:
   Name    | Owner | Encoding |  Collate   |
-----------+-------+----------+------------+
 clair     | clair | UTF8     | en_US.utf8 |

create database clair owner clair;

- Connect to the new database and create the "uuid-ossp" extension in it:
\c clair
CREATE EXTENSION "uuid-ossp";

- Scale the Quay operator back to 1 replica. It will reconcile the whole deployment:
$ oc scale --replicas=1 deployment quay-operator.v3.8.x -n <quay-operator-ns>
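The operator scaling and database recreation in step 4 can also be scripted. The sketch below uses hypothetical function names and assumes PostgreSQL is running and accepting connections in the clair-postgres pod; OC can be overridden for a dry run. It connects to template1 because a database cannot be dropped while a session is connected to it, quotes the names as identifiers, and passes -v ON_ERROR_STOP=1 so that psql stops at the first failing statement. The here-document is sent to psql in the pod through oc exec -i.

```sh
#!/bin/sh
# Hypothetical helpers for step 4. Set OC to override the oc binary (dry run).

# Scale the Quay operator deployment.
# $1 = deployment name (e.g. quay-operator.v3.8.x), $2 = namespace, $3 = replicas.
scale_quay_operator() {
  "${OC:-oc}" scale --replicas="$3" deployment "$1" -n "$2"
}

# Drop and recreate the Clair database with the same owner, then create uuid-ossp.
# $1 = clair-postgres pod, $2 = Clair database name, $3 = owner role.
# The unquoted here-document expands $2/$3; "\\c" becomes the psql meta-command \c.
recreate_clair_db() {
  "${OC:-oc}" exec -i "$1" -- psql -v ON_ERROR_STOP=1 -d template1 <<EOF
DROP DATABASE "$2";
CREATE DATABASE "$2" OWNER "$3";
\\c "$2"
CREATE EXTENSION IF NOT EXISTS "uuid-ossp";
EOF
}

# Example:
#   scale_quay_operator quay-operator.v3.8.x <quay-operator-ns> 0
#   recreate_clair_db <clair-postgres-pod> clair clair
#   scale_quay_operator quay-operator.v3.8.x <quay-operator-ns> 1
```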

Diagnostic Steps

  • Check the detailed PostgreSQL logs:
$ oc exec <clair-database-pod> -- cat /var/lib/pgsql/data/userdata/log/postgresql-xxx.log > clairpostgres.op
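Because the log file name varies, a small helper can pick the newest file in that log directory. This is a sketch: the function name and the default of 500 lines are arbitrary choices. The script runs in the pod's shell, and the line count is passed as a positional argument so that it is not expanded on the local machine.

```sh
#!/bin/sh
# Hypothetical helper: print the last N lines (default 500) of the newest
# PostgreSQL log file in the clair-postgres pod.
# $1 = clair-postgres pod, $2 = number of lines. Set OC=echo for a dry run.
latest_clair_pg_log() {
  # Inside the single quotes, "$1" is the remote shell's first argument (the line count).
  "${OC:-oc}" exec "$1" -- sh -c \
    'tail -n "$1" "$(ls -t /var/lib/pgsql/data/userdata/log/*.log | head -n 1)"' \
    sh "${2:-500}"
}

# Example: latest_clair_pg_log <clair-database-pod> 1000 > clairpostgres.op
```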
  • Check which files under userdata consume the most space:
$ oc exec -it <database-pod> -- /bin/bash
sh-4.4$ psql
postgres=# \c postgres
You are now connected to database "postgres" as user "postgres".
postgres=# \q

sh-4.4$ cd /var/lib/pgsql/data/userdata
sh-4.4$ du -sh *
4.0K    PG_VERSION
16G    base  
4.0K    current_logfiles
568K    global
92K    log
4.0K    pg_commit_ts
4.0K    pg_dynshmem
8.0K    pg_hba.conf
4.0K    pg_ident.conf
16K    pg_logical
28K    pg_multixact
4.0K    pg_notify
4.0K    pg_replslot
4.0K    pg_serial
4.0K    pg_snapshots
4.0K    pg_stat
52K    pg_stat_tmp
128K    pg_subtrans
4.0K    pg_tblspc
4.0K    pg_twophase
1.1G    pg_wal 
48K    pg_xact
4.0K    postgresql.auto.conf
4.0K    postgresql.conf
4.0K    postmaster.opts
4.0K    postmaster.pid
sh-4.4$ 

sh-4.4$ ls -l base/
total 260
drwxrws---. 2 1000690000 1000690000   4096 Nov 21 04:52 1
drwxrws---. 2 1000690000 1000690000   4096 Nov 21 04:52 13435
drwxrws---. 2 1000690000 1000690000  12288 Nov 23 00:49 13436
drwx--S---. 2 1000690000 1000690000 241664 Nov 22 23:13 pgsql_tmp
sh-4.4$ 

sh-4.4$ cd /var/lib/pgsql/data/userdata/base/13436
sh-4.4$ du -sh *|grep G
1.1G    16943
1.1G    16943.1
1.1G    16943.2
1.1G    16943.3
1.1G    16943.4
1.1G    16943.5
1.1G    16943.6
1.1G    16943.7
1.1G    16948
1.1G    16956
1.1G    16959
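Two small helpers for the checks above, as a sketch with hypothetical names. The first ranks du -sh output by size without relying on GNU sort -h. The second maps a directory under base/ (a database OID, such as 13436) and a data file name (a relfilenode, such as 16943) back to a database and table name using pg_database and pg_filenode_relation. Files such as 16943.1 to 16943.7 are additional 1 GB segments of the same relation, so the suffix after the dot is ignored. OC can be overridden for testing.

```sh
#!/bin/sh
# Rank `du -sh` output (stdin) largest first; print the top N lines (default 5).
largest_du_entries() {
  awk '{
    n = $1 + 0                               # numeric prefix, e.g. 1.1 from 1.1G
    unit = substr($1, length($1), 1)         # trailing unit letter
    if (unit == "K") n *= 1024
    else if (unit == "M") n *= 1024 ^ 2
    else if (unit == "G") n *= 1024 ^ 3
    else if (unit == "T") n *= 1024 ^ 4
    printf "%.0f\t%s\n", n, $0
  }' | sort -rn -k1,1 | head -n "${1:-5}" | cut -f2-
}

# Map base/<database OID>/<relfilenode>[.segment] back to names.
# $1 = clair-postgres pod, $2 = database OID (e.g. 13436),
# $3 = data file name (e.g. 16943.3).
filenode_to_relation() (
  oc_cmd=${OC:-oc}
  filenode=${3%%.*}   # 16943.3 -> 16943: segments share the relation's filenode
  db=$("$oc_cmd" exec "$1" -- psql -At \
        -c "SELECT datname FROM pg_database WHERE oid = $2;") || exit 1
  # Tablespace 0 means the database's default tablespace.
  "$oc_cmd" exec "$1" -- psql -At -d "$db" \
    -c "SELECT pg_filenode_relation(0, $filenode);"
)

# Example:
#   du -sh * | largest_du_entries 5        (inside the pod)
#   filenode_to_relation <clair-postgres-pod> 13436 16943
```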
  • Check whether Clair is managed by the operator, and check the PVC details to see whether the volume size is the default (50GB) or has been changed.

$ oc get quayregistry quayregistryname -oyaml
...
spec:
  components:
    - kind: clair
      managed: true
    - kind: clairpostgres
      managed: true
...
$ oc get pvc <pvc_name> -o yaml
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  annotations:
    pv.kubernetes.io/bind-completed: "yes"
    pv.kubernetes.io/bound-by-controller: "yes"
    quay-buildmanager-hostname: ""
    quay-component: clair-postgres
    quay-operator-service-endpoint: http://quay.example.com:port-number
    quay-registry-hostname: quay.example.com
    volume.beta.kubernetes.io/storage-provisioner: kubernetes.io/aws-ebs
    volume.kubernetes.io/selected-node: ip-10-21-xx-xx.ap-southeast-2.compute.internal
    volume.kubernetes.io/storage-resizer: kubernetes.io/aws-ebs
  finalizers:
  - kubernetes.io/pvc-protection
  labels:
    quay-component: clair-postgres
    quay-operator/quayregistry: quay
  name: quay-clair-postgres
  namespace: quay-project
  ownerReferences:
  - apiVersion: quay.redhat.com/v1
    kind: QuayRegistry
    name: quayregistryname
spec:
  accessModes:
  - ReadWriteOnce
  resources:
    requests:
      storage: 100Gi
  storageClassName: gp2
  volumeMode: Filesystem
  volumeName: pvc-xxxx
status:
  accessModes:
  - ReadWriteOnce
  capacity:
    storage: 100Gi
  phase: Bound
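The same information can be pulled with jsonpath queries instead of reading the full YAML. A sketch with a hypothetical function name: it prints each QuayRegistry component with its managed flag, then the requested and provisioned size of the clair-postgres PVC. OC can be overridden for a dry run.

```sh
#!/bin/sh
# Hypothetical helper. $1 = QuayRegistry name, $2 = clair-postgres PVC name,
# $3 = namespace. Set OC=echo for a dry run.
clair_storage_summary() {
  # One line per component: "<kind> managed=<true|false>"
  "${OC:-oc}" get quayregistry "$1" -n "$3" \
    -o jsonpath='{range .spec.components[*]}{.kind}{" managed="}{.managed}{"\n"}{end}'
  # Requested size, then the capacity actually provisioned.
  "${OC:-oc}" get pvc "$2" -n "$3" \
    -o jsonpath='requested={.spec.resources.requests.storage} capacity={.status.capacity.storage}{"\n"}'
}

# Example: clair_storage_summary quayregistryname quay-clair-postgres quay-project
```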
  • Check which tables are consuming the space:
$ oc exec -it <clair-postgres-pod> -- /bin/bash
bash-4.4$ psql
psql (10.21)
Type "help" for help.

postgres=# \l
                                 List of databases
   Name    |  Owner   | Encoding |  Collate   |   Ctype    |   Access privileges
-----------+----------+----------+------------+------------+-----------------------
 postgres  | postgres | UTF8     | en_US.utf8 | en_US.utf8 |
 template0 | postgres | UTF8     | en_US.utf8 | en_US.utf8 | =c/postgres          +
           |          |          |            |            | postgres=CTc/postgres
 template1 | postgres | UTF8     | en_US.utf8 | en_US.utf8 | =c/postgres          +
           |          |          |            |            | postgres=CTc/postgres
(3 rows)

postgres=# \c postgres
You are now connected to database "postgres" as user "postgres".
postgres=# \d+
                                  List of relations
 Schema |           Name            |   Type   |  Owner   |    Size    | Description
--------+---------------------------+----------+----------+------------+-------------
 public | dist                      | table    | postgres | 16 kB      |
 public | dist_id_seq               | sequence | postgres | 8192 bytes |
 public | dist_scanartifact         | table    | postgres | 16 kB      |
 public | enrichment                | table    | postgres | 63 MB      |
 public | enrichment_id_seq         | sequence | postgres | 8192 bytes |
 public | indexreport               | table    | postgres | 249 MB     |
 public | key                       | table    | postgres | 8192 bytes |
 public | latest_vuln               | view     | postgres | 0 bytes    |
 public | layer                     | table    | postgres | 1640 kB    |
 public | layer_id_seq              | sequence | postgres | 8192 bytes |
 public | libindex_migrations       | table    | postgres | 8192 bytes |
 public | libvuln_migrations        | table    | postgres | 8192 bytes |
 public | manifest                  | table    | postgres | 1544 kB    |
 public | manifest_id_seq           | sequence | postgres | 8192 bytes |
 public | manifest_index            | table    | postgres | 854 MB     |
 public | manifest_index_id_seq     | sequence | postgres | 8192 bytes |
 public | manifest_layer            | table    | postgres | 47 MB      |
 public | notification              | table    | postgres | 3216 kB    |
 public | notification_body         | table    | postgres | 926 MB     |
 public | notifier_migrations       | table    | postgres | 8192 bytes |
 public | notifier_update_operation | table    | postgres | 1272 kB    |
 public | package                   | table    | postgres | 1216 kB    |
 public | package_id_seq            | sequence | postgres | 8192 bytes |
 public | package_scanartifact      | table    | postgres | 136 MB     |
 public | receipt                   | table    | postgres | 1048 kB    |
 public | repo                      | table    | postgres | 16 kB      |
 public | repo_id_seq               | sequence | postgres | 8192 bytes |
 public | repo_scanartifact         | table    | postgres | 1008 kB    |
 public | scanned_layer             | table    | postgres | 7760 kB    |
 public | scanned_manifest          | table    | postgres | 7152 kB    |
 public | scanner                   | table    | postgres | 16 kB      |
 public | scanner_id_seq            | sequence | postgres | 8192 bytes |
 public | scannerlist               | table    | postgres | 8192 bytes |
 public | scannerlist_id_seq        | sequence | postgres | 8192 bytes |
 public | uo_enrich                 | table    | postgres | 109 MB     |
 public | uo_vuln                   | table    | postgres | 49 GB      |
 public | update_operation          | table    | postgres | 920 kB     |
 public | update_operation_id_seq   | sequence | postgres | 8192 bytes |
 public | updater_status            | table    | postgres | 864 kB     |
 public | vuln                      | table    | postgres | 12 GB      |
 public | vuln_id_seq               | sequence | postgres | 8192 bytes |
(41 rows)
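With 41 relations, sorting by size by eye is error-prone. The following sketch (hypothetical function name) ranks \d+ output by its Size column, converting the bytes, kB, MB, GB and TB units that psql prints into bytes (1 kB = 1024 bytes, as with pg_size_pretty). Note that the \d+ size of a table does not include its indexes.

```sh
#!/bin/sh
# Rank psql \d+ output (stdin) by its Size column, largest first, and print
# "name<TAB>size" for the top N rows (default 5).
largest_relations_from_dplus() {
  awk -F'|' 'NF >= 5 {
    name = $2; size = $5
    gsub(/^ +| +$/, "", name)
    gsub(/^ +| +$/, "", size)
    split(size, p, " ")                      # e.g. p[1] = 49, p[2] = GB
    n = p[1] + 0
    if (p[2] == "kB") n *= 1024
    else if (p[2] == "MB") n *= 1024 ^ 2
    else if (p[2] == "GB") n *= 1024 ^ 3
    else if (p[2] == "TB") n *= 1024 ^ 4
    else if (p[2] != "bytes") next           # header row and anything unexpected
    printf "%.0f\t%s\t%s\n", n, name, size
  }' | sort -rn -k1,1 | head -n "${1:-5}" | cut -f2-
}

# Example: oc exec <clair-postgres-pod> -- psql -c '\d+' | largest_relations_from_dplus 5
```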

  • Check the largest relations (tables, indexes and TOAST tables) with the query below. In this example the manifest_index* relations are the largest, so investigate them further:
  clairv4=> SELECT nspname || '.' || relname AS "relation",
    pg_size_pretty(pg_relation_size(C.oid)) AS "size"
  FROM pg_class C
  LEFT JOIN pg_namespace N ON (N.oid = C.relnamespace)
  WHERE nspname NOT IN ('pg_catalog', 'information_schema')
  ORDER BY pg_relation_size(C.oid) DESC
  LIMIT 20;
                             relation                             |  size   
------------------------------------------------------------------+---------
 public.manifest_index_unique                                     | 9071 MB
 public.manifest_index                                            | 6827 MB
 public.manifest_index_manifest_id_package_id_dist_id_repo_id_idx | 6819 MB
 public.vuln                                                      | 6250 MB
 public.uo_vuln                                                   | 3512 MB
 public.uo_vuln_pkey                                              | 2690 MB
 public.manifest_index_pkey                                       | 2340 MB
 pg_toast.pg_toast_826417                                         | 1129 MB
 public.package_scanartifact_pkey                                 | 1112 MB
 public.uo_vuln_vuln_idx                                          | 913 MB
 pg_toast.pg_toast_826828                                         | 883 MB
 public.package_scanartifact                                      | 807 MB
 public.uo_vuln_uo_idx                                            | 525 MB
 public.vuln_hash_kind_hash_key                                   | 316 MB
 public.vuln_lookup_idx                                           | 300 MB
 public.vuln_pkey                                                 | 176 MB
 public.scanned_layer                                             | 82 MB
 public.vuln_updater_idx                                          | 64 MB
 public.scanned_layer_pkey                                        | 63 MB
 pg_toast.pg_toast_826417_index                                   | 51 MB
(20 rows)
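The query above ranks individual relations, so a table and each of its indexes appear separately, and it does not show row counts. As a sketch (hypothetical function name), the following groups each table with its indexes and TOAST data via pg_total_relation_size and adds pg_class.reltuples, which is the planner's row estimate refreshed by VACUUM and ANALYZE. An exact figure requires SELECT count(*), which can be slow on a table as large as uo_vuln.

```sh
#!/bin/sh
# Hypothetical helper: largest tables by total size (table + indexes + TOAST)
# with estimated row counts. $1 = clair-postgres pod, $2 = Clair database name.
# The quoted here-document is passed verbatim to psql in the pod.
clair_table_sizes() {
  "${OC:-oc}" exec -i "$1" -- psql -d "$2" <<'EOF'
SELECT n.nspname || '.' || c.relname                  AS relation,
       c.reltuples::bigint                            AS estimated_rows,
       pg_size_pretty(pg_total_relation_size(c.oid))  AS total_size
FROM pg_class c
JOIN pg_namespace n ON n.oid = c.relnamespace
WHERE c.relkind = 'r'
  AND n.nspname NOT IN ('pg_catalog', 'information_schema')
ORDER BY pg_total_relation_size(c.oid) DESC
LIMIT 20;
EOF
}

# Example: clair_table_sizes <clair-postgres-pod> postgres
```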

This solution is part of Red Hat’s fast-track publication program, providing a huge library of solutions that Red Hat engineers have created while supporting our customers. To give you the knowledge you need the instant it becomes available, these articles may be presented in a raw and unedited form.