Expanding the db-noobaa-db-pg-0 PVC - OpenShift Data Foundation (ODF) v4.19+
Environment
Red Hat OpenShift Container Platform (RHOCP) v4.x
Red Hat OpenShift Data Foundation (ODF) v4.19+
Red Hat Quay (RHQ) v3.x
Issue
The noobaa-db-pg-cluster-<1|2> PVC has become full, preventing the PostgreSQL server from starting.
Related Articles:
Expanding the db-noobaa-db-pg-0 PVC - OpenShift Data Foundation (ODF) v4.18 and Below
Change the Multi-Cloud Object Gateway Database's Collation Locale to C
How to Check the Size/Consumption of the PostgreSQL Database in the db-noobaa-db-pg-0 PVC
Resolution
Before starting, ensure adequate space is available in the ocs-storagecluster-cephblockpool (via ceph df) and that Ceph is reporting HEALTH_OK with all PGs reporting active+clean (via ceph status).
NOTE: If Ceph is NOT reporting HEALTH_OK, or not all PGs are reporting active+clean, open a support case for further investigation.
$ oc exec -it $(oc get pod -n openshift-storage -l app=rook-ceph-operator -o name) -n openshift-storage -- ceph status -c /var/lib/rook/openshift-storage/openshift-storage.config
health: HEALTH_OK <---------------------- HEALTH_OK
services:
mon: 3 daemons, quorum a,b,c (age 41m)
mgr: a(active, since 41m)
mds: 1/1 daemons up, 1 hot standby
osd: 3 osds: 3 up (since 41m), 3 in (since 41m)
data:
volumes: 1/1 healthy
pools: 4 pools, 97 pgs
objects: 92 objects, 138 MiB
usage: 277 MiB used, 300 GiB / 300 GiB avail
pgs: 97 active+clean <---------------------- active+clean
io:
client: 1.2 KiB/s rd, 9.0 KiB/s wr, 2 op/s rd, 1 op/s wr
$ oc exec -it $(oc get pod -n openshift-storage -l app=rook-ceph-operator -o name) -n openshift-storage -- ceph df -c /var/lib/rook/openshift-storage/openshift-storage.config
--- RAW STORAGE ---
CLASS SIZE AVAIL USED RAW USED %RAW USED
ssd 6 TiB 6.0 TiB 3.6 GiB 3.6 GiB 0.06
TOTAL 6 TiB 6.0 TiB 3.6 GiB 3.6 GiB 0.06
--- POOLS ---
POOL ID PGS STORED OBJECTS USED %USED MAX AVAIL
ocs-storagecluster-cephblockpool 1 32 340 MiB 177 1021 MiB 0.02 1.7 TiB <----- cephblockpool
Procedures:
- In the event that this is a standalone NooBaa deployment where the db-noobaa-db-pg-0 PVC is NOT backed by the storageclass ocs-storagecluster-ceph-rbd, validate that volumeExpansion is supported. For example:
$ oc get sc <storageclass-name> -o yaml | grep -i expansion
allowVolumeExpansion: true <-----
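If the storage class reports allowVolumeExpansion: false, expansion can be enabled by patching the StorageClass, provided the underlying CSI driver supports resize. A minimal sketch; <storageclass-name> is a placeholder, and the offline check below simply mirrors the grep above against a sample value:

```shell
# On a live cluster (hypothetical storageclass name), enable expansion with:
#   oc patch sc <storageclass-name> -p '{"allowVolumeExpansion": true}'
# Offline check of the rendered field, mirroring the grep above:
sc_yaml="allowVolumeExpansion: true"
echo "$sc_yaml" | grep -qi 'allowvolumeexpansion: true' && echo "expansion supported"
```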
- Make note of the noobaa-db-pg-cluster-1 and noobaa-db-pg-cluster-2 PVC capacities:
$ oc -n openshift-storage get clusters.postgresql.cnpg.noobaa.io noobaa-db-pg-cluster -o jsonpath={.spec.storage.size}
50Gi
$ oc get pvc -n openshift-storage
NAME STATUS VOLUME CAPACITY
noobaa-db-pg-cluster-1 Bound pvc-6191c66e-18fa-4792-8ccd-4838ded0fd03 50Gi
noobaa-db-pg-cluster-2 Bound pvc-41af5072-163a-4dca-8827-ba574103209a 50Gi
- Scale down the NooBaa services:
$ oc -n openshift-storage scale deployment noobaa-operator noobaa-endpoint --replicas=0
$ oc -n openshift-storage scale sts noobaa-core --replicas=0
Example: Expanding volume from 50Gi to 100Gi.
WARNING: ONLY VOLUME EXPANSION IS ALLOWED. ROLLING BACK TO A SMALLER VOLUME SIZE IS NOT SUPPORTED. TAKE EXTRA PRECAUTION TO ENSURE THE DESIRED SIZE IS CORRECT.
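Because rolling back to a smaller size is not supported, it can help to guard the requested size before patching. A minimal sketch; the sizes are the illustrative values from this example:

```shell
# Refuse to proceed unless the requested size is strictly larger than the current one.
current_gi=50   # from: oc get clusters.postgresql.cnpg.noobaa.io ... -o jsonpath={.spec.storage.size}
desired_gi=100  # the new size you intend to patch in
if [ "$desired_gi" -le "$current_gi" ]; then
    echo "ERROR: ${desired_gi}Gi is not larger than ${current_gi}Gi; shrinking is not supported" >&2
    exit 1
fi
echo "OK to expand from ${current_gi}Gi to ${desired_gi}Gi"
```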
- After the NooBaa service pods have terminated (this takes time; do not force delete them), patch the StorageCluster resource using the command below, updating the storage size to the desired value. The example below uses 100Gi.
$ oc patch -n openshift-storage storagecluster ocs-storagecluster --type merge --patch '{"spec": {"resources": {"noobaa-db-vol":{"requests":{"storage":"100Gi"}}}}}'
Validate:
- For the Ceph RBD storageclass, it should automatically resize after ~3 minutes. Once confirmed, no further action is needed.
$ oc get storagecluster -n openshift-storage -o yaml | grep -A2 noobaa-db-vol
noobaa-db-vol:
requests:
storage: 100Gi
$ oc get pvc -n openshift-storage
NAME STATUS VOLUME CAPACITY
noobaa-db-pg-cluster-1 Bound pvc-6191c66e-18fa-4792-8ccd-4838ded0fd03 100Gi
noobaa-db-pg-cluster-2 Bound pvc-41af5072-163a-4dca-8827-ba574103209a 100Gi
NOTE: In most instances the PVC resize will be successful. However, if the resize appears unsuccessful, inspect the PVC; the event below may appear and indicates the next step:
$ oc get pvc/db-noobaa-db-pg-<pvc-name> -n openshift-storage -o yaml
...
lastTransitionTime: "2025-05-30T20:24:05Z"
message: Waiting for user to (re-)start a pod to finish file system resize of volume on node.
- If the NooBaa PVC expansion does not occur after some time, restart the Cloud Native PostgreSQL manager pod:
$ oc delete pod -n openshift-storage -l app.kubernetes.io/name=cloudnative-pg
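Rather than re-running oc get pvc by hand, a small polling helper can wait for the new capacity to appear. A sketch, assuming the 100Gi target from the example above; the commented oc invocation is illustrative:

```shell
# Poll a command until it succeeds or a timeout (in seconds) expires.
wait_for() {
    timeout=$1; shift
    elapsed=0
    until "$@"; do
        sleep 1
        elapsed=$((elapsed + 1))
        [ "$elapsed" -ge "$timeout" ] && return 1
    done
    return 0
}

# On a live cluster (hypothetical usage), wait up to 10 minutes for the resize:
# wait_for 600 sh -c \
#   'oc get pvc noobaa-db-pg-cluster-1 -n openshift-storage \
#      -o jsonpath="{.status.capacity.storage}" | grep -q "^100Gi$"'
```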
- Scale up the NooBaa services:
$ oc -n openshift-storage scale deployment noobaa-operator noobaa-endpoint --replicas=1 <--- Modify to your HPA minimum pod specification
$ oc -n openshift-storage scale sts noobaa-core --replicas=1
- Once all pods have been in a Running state for ~3 minutes, validate that NooBaa is in a Ready phase:
$ oc get noobaa -n openshift-storage
NAME S3-ENDPOINTS STS-ENDPOINTS IMAGE PHASE AGE
noobaa ["https://<omitted>"] ["https://<omitted>"] registry.redhat.io/<omitted> Ready 46h
$ oc get backingstore -n openshift-storage
NAME TYPE PHASE AGE
noobaa-default-backing-store <omitted> Ready <---- 35h
NOTE: Occasionally, NooBaa may remain in a Connecting phase and/or may not reach a Ready state. If this is observed after performing the steps above, follow the steps in section 12.1, Restoring the Multicloud Object Gateway, of the product documentation. Perform one final restart of the pods in the order shown, which will bring NooBaa back to a Ready phase.
Root Cause
While troubleshooting the noobaa-db resource, the noobaa-db-pg-cluster-<1|2> PVC may become full, preventing the PostgreSQL server from starting. Expanding the noobaa-db-pg-cluster-1 and noobaa-db-pg-cluster-2 PVCs allows the PostgreSQL server to start again so that troubleshooting can be completed.
Diagnostic Steps
Review the pod logs for noobaa-db-pg-cluster-1-<pod-name> and noobaa-db-pg-cluster-2-<pod-name>:
$ oc logs noobaa-db-pg-cluster-1-<pod-name>
waiting for server to start....2022-08-25 19:48:38.185 UTC [22] FATAL: could not write lock file "postmaster.pid": No space left on device
stopped waiting
pg_ctl: could not start server
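To confirm the PVC is actually full before expanding, filesystem usage can be checked inside the database pod. A sketch; the mount path /var/lib/postgresql/data is an assumption for the CloudNativePG image, and the parsing below runs against a simulated df line:

```shell
# On a live cluster (path is an assumption for the CNPG data volume):
#   oc exec -n openshift-storage noobaa-db-pg-cluster-1-<pod-name> -- df -h /var/lib/postgresql/data
# Parse the "Use%" column from a df line to flag a nearly full volume:
df_line="/dev/rbd0  50G  50G  0  100% /var/lib/postgresql/data"
pct=$(echo "$df_line" | awk '{print $5}' | tr -d '%')
if [ "$pct" -ge 90 ]; then
    echo "WARNING: database volume is ${pct}% full"
fi
```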
This solution is part of Red Hat’s fast-track publication program, providing a huge library of solutions that Red Hat engineers have created while supporting our customers. To give you the knowledge you need the instant it becomes available, these articles may be presented in a raw and unedited form.