How to expand the "db-noobaa-db-pg-0" PVC on OpenShift Data Foundation (ODF) 4.18 and below
Environment
- Red Hat OpenShift Container Platform (OCP)
- 4.x
- Red Hat OpenShift Data Foundation (ODF)
- 4.x
- Red Hat Quay (RHQ)
- 3.x
Issue
- The db-noobaa-db-pg-0 PVC has become full, preventing the Postgres (OCS/ODF 4.7+) or Mongo (OCS 4.6 and below) server from starting.
- The following related articles cover similar issues:
Expanding the db-noobaa-db-pg PVC - OpenShift Data Foundation (ODF) v4.19+
Change the Multi-Cloud Object Gateway Database's Collation Locale to C
How to Check the Size/Consumption of the PostgreSQL Database in the db-noobaa-db-pg-0 PVC
Resolution
Before starting, ensure that adequate space is available in the ocs-storagecluster-cephblockpool via ceph df, and that Ceph is reporting HEALTH_OK with all PGs reporting active+clean via ceph status.
NOTE: If Ceph is not reporting HEALTH_OK, or if not all PGs are reporting active+clean, please open a support case for further investigation.
$ NAMESPACE=openshift-storage;ROOK_POD=$(oc -n ${NAMESPACE} get pod -l app=rook-ceph-operator -o jsonpath='{.items[0].metadata.name}');oc exec -it ${ROOK_POD} -n ${NAMESPACE} -- ceph -s --cluster=${NAMESPACE} --conf=/var/lib/rook/${NAMESPACE}/${NAMESPACE}.config --keyring=/var/lib/rook/${NAMESPACE}/client.admin.keyring
health: HEALTH_OK <---------------------- HEALTH_OK
services:
mon: 3 daemons, quorum a,b,c (age 41m)
mgr: a(active, since 41m)
mds: 1/1 daemons up, 1 hot standby
osd: 3 osds: 3 up (since 41m), 3 in (since 41m)
data:
volumes: 1/1 healthy
pools: 4 pools, 97 pgs
objects: 92 objects, 138 MiB
usage: 277 MiB used, 300 GiB / 300 GiB avail
pgs: 97 active+clean <---------------------- active+clean
io:
client: 1.2 KiB/s rd, 9.0 KiB/s wr, 2 op/s rd, 1 op/s wr
$ NAMESPACE=openshift-storage;ROOK_POD=$(oc -n ${NAMESPACE} get pod -l app=rook-ceph-operator -o jsonpath='{.items[0].metadata.name}');oc exec -it ${ROOK_POD} -n ${NAMESPACE} -- ceph df --cluster=${NAMESPACE} --conf=/var/lib/rook/${NAMESPACE}/${NAMESPACE}.config --keyring=/var/lib/rook/${NAMESPACE}/client.admin.keyring
--- RAW STORAGE ---
CLASS SIZE AVAIL USED RAW USED %RAW USED
ssd 6 TiB 6.0 TiB 3.6 GiB 3.6 GiB 0.06
TOTAL 6 TiB 6.0 TiB 3.6 GiB 3.6 GiB 0.06
--- POOLS ---
POOL ID PGS STORED OBJECTS USED %USED MAX AVAIL
ocs-storagecluster-cephblockpool 1 32 340 MiB 177 1021 MiB 0.02 1.7 TiB <----- cephblockpool
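The two health conditions checked above (HEALTH_OK, and every PG in active+clean) can also be evaluated by a small script. The following is only a sketch: ceph_ready_for_expansion is a hypothetical helper name, and it assumes the ceph status text has the same layout as the sample output shown above. It is deliberately conservative and reports "not ready" whenever the counts cannot be parsed or do not match.

```shell
# Sketch only: ceph_ready_for_expansion is a hypothetical helper, not part of ODF or Ceph.
# It reads `ceph status` text on stdin and succeeds only if the cluster reports
# HEALTH_OK and the active+clean PG count equals the total PG count.
ceph_ready_for_expansion() {
  status_text="$(cat)"

  # The health line must report HEALTH_OK.
  printf '%s\n' "$status_text" | grep -Eq 'health:[[:space:]]+HEALTH_OK' || return 1

  # Total PG count, taken from the "pools: N pools, M pgs" line.
  total_pgs="$(printf '%s\n' "$status_text" \
    | sed -n 's/.*pools:.*, \([0-9][0-9]*\) pgs.*/\1/p' | head -n 1)"

  # active+clean PG count, taken from the "pgs: M active+clean" line.
  clean_pgs="$(printf '%s\n' "$status_text" \
    | sed -n 's/.*pgs:[[:space:]]*\([0-9][0-9]*\) active+clean.*/\1/p' | head -n 1)"

  # Fail if either count is missing or if they differ.
  [ -n "$total_pgs" ] && [ "$total_pgs" = "$clean_pgs" ]
}

# Example usage (assumes NAMESPACE and ROOK_POD are set as in the commands above;
# -it is dropped because the output is piped):
#   oc exec ${ROOK_POD} -n ${NAMESPACE} -- ceph -s --cluster=${NAMESPACE} \
#     --conf=/var/lib/rook/${NAMESPACE}/${NAMESPACE}.config \
#     --keyring=/var/lib/rook/${NAMESPACE}/client.admin.keyring \
#     | ceph_ready_for_expansion && echo "Ceph is healthy"
```

Because the helper only compares the first PG state line against the total, benign mixed states such as scrubbing are also reported as "not ready"; in that case, review the ceph status output manually.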
After confirming that Ceph is healthy and has adequate free space per the checks above, proceed with the following steps:
- In the event this is a standalone NooBaa deployment where the db-noobaa-db-pg-0 PVC is not backed by the storageclass ocs-storagecluster-ceph-rbd, validate that volumeExpansion is supported:
  $ oc get sc -n <namespace> <storageclass-name> -o yaml | grep -i expansion
  allowVolumeExpansion: true <-----
- Make note of the db-noobaa-db-pg-0 PVC capacity:
  $ oc get pvc -n openshift-storage
  NAME STATUS VOLUME CAPACITY ACCESS MODES STORAGECLASS
  db-noobaa-db-pg-0 Bound pvc-xxxxx-xxxx-xxxx-xxxxxxxxx 50Gi RWO ocs-storagecluster-ceph-rbd
- Scale down the NooBaa services:
  $ oc -n openshift-storage scale deployment noobaa-operator noobaa-endpoint --replicas=0
  $ oc -n openshift-storage scale sts noobaa-core noobaa-db-pg --replicas=0
- After the pods have terminated (this takes time, so there is no need to force deletion), edit the PVC in the spec -> resources -> requests -> storage section only. The capacity in the status section will update once the volume expansion is successful, and will then be reflected in the $ oc get pvc -n openshift-storage | grep noobaa output.
  NOTE: OCS 4.6 and below uses MongoDB, not Postgres. Therefore, the statefulset will be named noobaa-db, the pod will be named noobaa-db-0 and the PVC will be named db-noobaa-db-0.
  To expand a volume from 50Gi to 100Gi, for example, do the following:
  WARNING: Only volume expansion is allowed. Rolling back to a smaller volume size is not supported. Take extra precaution prior to saving the YAML to ensure the desired size is correct.
  $ oc edit pvc/db-noobaa-db-pg-0 -n openshift-storage
  spec:
    accessModes:
    - ReadWriteOnce
    resources:
      requests:
        storage: 50Gi <----------------------------- Change to 100Gi
    storageClassName: ocs-storagecluster-ceph-rbd
    volumeMode: Filesystem
    volumeName: pvc-80a61324-2a56-4ed0-89ec-ba7d85d4f19a
  status:
    accessModes:
    - ReadWriteOnce
    capacity:
      storage: 50Gi <------------------------------ Do not touch, this will change after noobaa-db pod is started
    phase: Bound
  Now check that the associated PV pvc-80a61324-2a56-4ed0-89ec-ba7d85d4f19a has been expanded. The PVC is not expanded yet, because that requires the filesystem to be expanded, which happens when the noobaa-db-pg pod is started:
  $ oc get pv | grep db-noobaa-db-pg-0
  pvc-80a61324-2a56-4ed0-89ec-ba7d85d4f19a 100Gi RWO Delete Bound openshift-storage/db-noobaa-db-pg-0 ocs-storagecluster-ceph-rbd <unset> 144d
- After completing the previous step, scale up the NooBaa services:
  $ oc -n openshift-storage scale sts noobaa-core noobaa-db-pg --replicas=1
  $ oc -n openshift-storage scale deployment noobaa-operator noobaa-endpoint --replicas=1
- The new capacity is reflected in the following output:
  $ oc get pvc -n openshift-storage | grep noobaa
  NAME STATUS VOLUME CAPACITY ACCESS MODES STORAGECLASS
  db-noobaa-db-pg-0 Bound pvc-<omitted> 100Gi RWO ocs-storagecluster-ceph-rbd
- Once all pods have been in a Running state for at least 3 minutes, validate that NooBaa is in a Ready phase:
  $ oc get noobaa -n openshift-storage
  NAME S3-ENDPOINTS STS-ENDPOINTS IMAGE PHASE AGE
  noobaa ["https://<omitted>"] ["https://<omitted>"] registry.redhat.io/<omitted> Ready 46h
  $ oc get backingstore -n openshift-storage
  NAME TYPE PHASE AGE
  noobaa-default-backing-store <omitted> Ready 35h <----
  NOTE: Occasionally, NooBaa may still be in a Connecting phase and/or may not come to a Ready state. If this is observed after the above has been performed, please follow the steps in section 12.1. Restoring the Multicloud Object Gateway of the product documentation. Perform one final restart of the pods in the order shown, which will bring NooBaa back to a Ready phase.
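Because shrinking is not supported, it can help to validate the requested size before changing the PVC. The sketch below is not the procedure from this article: it swaps the interactive oc edit for a non-interactive oc patch, and gi_value, is_valid_expansion and expand_noobaa_db_pvc are hypothetical helper names. It assumes both sizes are whole numbers expressed in Gi and rejects anything else.

```shell
# Sketch only: all function names here are hypothetical helpers.

# Print the integer part of a size given in whole Gi (for example "50Gi" prints 50).
# Any other format (Ti, Mi, fractions, empty) is rejected with a non-zero status.
gi_value() {
  case "$1" in
    ''|Gi|*[!0-9]*Gi) return 1 ;;
    *Gi) printf '%s\n' "${1%Gi}" ;;
    *) return 1 ;;
  esac
}

# Succeed only when the new size is strictly larger than the current size.
# Returns 2 when either value cannot be parsed.
is_valid_expansion() {
  current_gi="$(gi_value "$1")" || return 2
  new_gi="$(gi_value "$2")" || return 2
  [ "$new_gi" -gt "$current_gi" ]
}

# Patch the PVC request after validating the size. $1 is the new size, e.g. 100Gi.
# Assumes the NooBaa services were already scaled down as described above.
expand_noobaa_db_pvc() {
  current_size="$(oc get pvc db-noobaa-db-pg-0 -n openshift-storage \
    -o jsonpath='{.spec.resources.requests.storage}')" || return 1
  if ! is_valid_expansion "$current_size" "$1"; then
    echo "Refusing to change ${current_size} to $1: only expansion in whole Gi is handled" >&2
    return 1
  fi
  oc patch pvc db-noobaa-db-pg-0 -n openshift-storage --type merge \
    -p "{\"spec\":{\"resources\":{\"requests\":{\"storage\":\"$1\"}}}}"
}

# Example usage:
#   expand_noobaa_db_pvc 100Gi
```

The patch touches only spec.resources.requests.storage, matching the instruction to leave the status section alone.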
If events like the following are reported:
Warning DBVolumeResourcesIsImmutable noobaa-operator spec.dbVolumeResources is immutable and cannot be updated for volume "db" in existing StatefulSet "noobaa-core" since it requires volume recreate and migrate which is unsupported by the operator
Apply these additional changes to clear them:
- Run oc edit storagecluster, then on the storagecluster ocs-storagecluster add these lines under spec.resources:
  noobaa-db-vol:
    requests:
      storage: 100Gi <<-- new size
  NOTE: Make sure spec.multiCloudGateway.reconcileStrategy is NOT set to unmanaged (by default this value is not configured, as it is hard coded to managed).
  NOTE: This change will trigger an update of noobaa.spec.dbVolumeResources.requests.storage on the noobaa CR, but the STS noobaa-db-pg still retains the older value, so it must be deleted in order to be recreated by the operator.
- Scale down the noobaa pods:
  $ oc -n openshift-storage scale deployment noobaa-operator noobaa-endpoint --replicas=0
  $ oc -n openshift-storage scale sts noobaa-core noobaa-db-pg --replicas=0
- Make a YAML copy of the sts:
  $ oc get sts noobaa-db-pg -n openshift-storage -oyaml > noobaa-db-pg.bkp.yaml
- Now delete the sts noobaa-db-pg:
  $ oc -n openshift-storage delete sts noobaa-db-pg
- Once the sts noobaa-db-pg is gone, scale up everything:
  $ oc -n openshift-storage scale deployment noobaa-operator noobaa-endpoint --replicas=1
  $ oc -n openshift-storage scale sts noobaa-core noobaa-db-pg --replicas=1
  noobaa-operator will recreate the statefulset noobaa-db-pg, this time with the correct storage value.
At this point the storagecluster CR, the noobaa CR, the sts noobaa-db-pg and the pvc db-noobaa-db-pg-0 should all have the same size. If the statefulset is not updated with the correct value, the DBVolumeResourcesIsImmutable event shown above will keep recurring.
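The final consistency check across the four objects can be sketched as follows. all_sizes_match and noobaa_db_sizes_consistent are hypothetical helper names, and the resource names (ocs-storagecluster, noobaa, noobaa-db-pg, db-noobaa-db-pg-0) are the ones used in this article. The jsonpath for the statefulset assumes the DB volume is the first entry in volumeClaimTemplates.

```shell
# Sketch only: both function names are hypothetical helpers.

# Succeed when at least one value is given and all values are identical strings.
all_sizes_match() {
  first_size="$1"
  [ -n "$first_size" ] || return 1
  for size in "$@"; do
    [ "$size" = "$first_size" ] || return 1
  done
}

# Collect the DB volume size from the four objects and compare them.
noobaa_db_sizes_consistent() {
  ns=openshift-storage
  sc_size="$(oc get storagecluster ocs-storagecluster -n "$ns" \
    -o jsonpath='{.spec.resources.noobaa-db-vol.requests.storage}')"
  cr_size="$(oc get noobaa noobaa -n "$ns" \
    -o jsonpath='{.spec.dbVolumeResources.requests.storage}')"
  # Assumption: the DB volume is the first volumeClaimTemplate of the statefulset.
  sts_size="$(oc get sts noobaa-db-pg -n "$ns" \
    -o jsonpath='{.spec.volumeClaimTemplates[0].spec.resources.requests.storage}')"
  pvc_size="$(oc get pvc db-noobaa-db-pg-0 -n "$ns" \
    -o jsonpath='{.status.capacity.storage}')"

  echo "storagecluster=${sc_size} noobaa=${cr_size} sts=${sts_size} pvc=${pvc_size}"
  all_sizes_match "$sc_size" "$cr_size" "$sts_size" "$pvc_size"
}

# Example usage:
#   noobaa_db_sizes_consistent || echo "Sizes differ, review the steps above"
```

The comparison is a plain string match, so values that are equal but written in different units (for example 1Ti and 1024Gi) are reported as different.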
Root Cause
When troubleshooting noobaa-db-pg, the db-noobaa-db-pg-0 PVC may become full, preventing the Postgres server from starting. Expanding the db-noobaa-db-pg-0 PVC allows the Postgres server to start again so that troubleshooting can be completed.
Diagnostic Steps
- Review the pod logs for noobaa-db-pg-0:
$ oc logs noobaa-db-pg-0
waiting for server to start....2022-08-25 19:48:38.185 UTC [22] FATAL: could not write lock file "postmaster.pid": No space left on device
stopped waiting
pg_ctl: could not start server
- The file ocs-operator/controllers/defaults/resources.go contains the "default resource requirements" for CPU and memory, and also for the NooBaa DB PVC size. The noobaa-db-vol entry is the only one that uses corev1.ResourceStorage:
  "noobaa-db-vol": {
      Requests: corev1.ResourceList{
          corev1.ResourceStorage: resource.MustParse("50Gi"), <<---
      },
  },
  The default values for memory and CPU can be overridden by editing the StorageCluster, as described in the Performance tuning guide for Multicloud Object Gateway (NooBaa). The same can be done for the noobaa-db-vol resource.
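As an illustration of overriding the noobaa-db-vol default, the StorageCluster change can be expressed as a JSON merge patch instead of an interactive edit. This is a sketch under assumptions: noobaa_db_vol_patch is a hypothetical helper name, and the StorageCluster is assumed to be named ocs-storagecluster in the openshift-storage namespace, as elsewhere in this article.

```shell
# Sketch only: noobaa_db_vol_patch is a hypothetical helper.
# Print a JSON merge patch that sets spec.resources.noobaa-db-vol.requests.storage.
# $1 is the desired size, e.g. 100Gi.
noobaa_db_vol_patch() {
  printf '{"spec":{"resources":{"noobaa-db-vol":{"requests":{"storage":"%s"}}}}}\n' "$1"
}

# Example usage (review the printed patch before applying it):
#   noobaa_db_vol_patch 100Gi
#   oc patch storagecluster ocs-storagecluster -n openshift-storage \
#     --type merge -p "$(noobaa_db_vol_patch 100Gi)"
```

A merge patch leaves any existing CPU and memory overrides under spec.resources in place, since it only sets the noobaa-db-vol key.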
This solution is part of Red Hat’s fast-track publication program, providing a huge library of solutions that Red Hat engineers have created while supporting our customers. To give you the knowledge you need the instant it becomes available, these articles may be presented in a raw and unedited form.