How to expand the "db-noobaa-db-pg-0" PVC on OpenShift Data Foundation (ODF) 4.18 and below

Solution Verified - Updated

Environment

  • Red Hat OpenShift Container Platform (OCP)
    • 4.x
  • Red Hat OpenShift Data Foundation (ODF)
    • 4.x
  • Red Hat Quay (RHQ)
    • 3.x

Issue

Resolution

Before starting, ensure that adequate space is available in the ocs-storagecluster-cephblockpool via ceph df, and that Ceph is reporting HEALTH_OK with all PGs reporting active+clean via ceph status.

NOTE: If Ceph is not reporting HEALTH_OK, or if not all PGs are reporting active+clean, please open a support case for further investigation.

$ NAMESPACE=openshift-storage;ROOK_POD=$(oc -n ${NAMESPACE} get pod -l app=rook-ceph-operator -o jsonpath='{.items[0].metadata.name}');oc exec -it ${ROOK_POD} -n ${NAMESPACE} -- ceph -s  --cluster=${NAMESPACE} --conf=/var/lib/rook/${NAMESPACE}/${NAMESPACE}.config --keyring=/var/lib/rook/${NAMESPACE}/client.admin.keyring

    health: HEALTH_OK            <---------------------- HEALTH_OK
 
  services:
    mon: 3 daemons, quorum a,b,c (age 41m)
    mgr: a(active, since 41m)
    mds: 1/1 daemons up, 1 hot standby
    osd: 3 osds: 3 up (since 41m), 3 in (since 41m)
 
  data:
    volumes: 1/1 healthy
    pools:   4 pools, 97 pgs
    objects: 92 objects, 138 MiB
    usage:   277 MiB used, 300 GiB / 300 GiB avail
    pgs:     97 active+clean      <---------------------- active+clean
 
  io:
    client:   1.2 KiB/s rd, 9.0 KiB/s wr, 2 op/s rd, 1 op/s wr


$ NAMESPACE=openshift-storage;ROOK_POD=$(oc -n ${NAMESPACE} get pod -l app=rook-ceph-operator -o jsonpath='{.items[0].metadata.name}');oc exec -it ${ROOK_POD} -n ${NAMESPACE} -- ceph df  --cluster=${NAMESPACE} --conf=/var/lib/rook/${NAMESPACE}/${NAMESPACE}.config --keyring=/var/lib/rook/${NAMESPACE}/client.admin.keyring

--- RAW STORAGE ---
CLASS   SIZE    AVAIL     USED  RAW USED  %RAW USED
ssd    6 TiB  6.0 TiB  3.6 GiB   3.6 GiB       0.06
TOTAL  6 TiB  6.0 TiB  3.6 GiB   3.6 GiB       0.06
--- POOLS ---
POOL                                        ID  PGS   STORED  OBJECTS      USED  %USED  MAX AVAIL
ocs-storagecluster-cephblockpool             1   32  340 MiB      177  1021 MiB   0.02    1.7 TiB <----- cephblockpool
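
Alternatively, if the rook-ceph-tools toolbox pod has been enabled in the openshift-storage namespace, the same health checks can be run from it without the long --cluster/--conf/--keyring flags. This is a sketch; the app=rook-ceph-tools label assumes the default toolbox deployment:

```shell
# Run the health checks from the rook-ceph-tools pod, if it is deployed.
TOOLS_POD=$(oc -n openshift-storage get pod -l app=rook-ceph-tools -o jsonpath='{.items[0].metadata.name}')
oc -n openshift-storage exec -it ${TOOLS_POD} -- ceph status
oc -n openshift-storage exec -it ${TOOLS_POD} -- ceph df
```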

After confirming everything is healthy per the checks above, proceed with the following steps:

  1. In the event this is a standalone NooBaa deployment where the db-noobaa-db-pg-0 PVC is not backed by the ocs-storagecluster-ceph-rbd storage class, validate that volume expansion is supported:

    $ oc get sc <storageclass-name> -o yaml | grep -i expansion
    allowVolumeExpansion: true <-----
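
    If allowVolumeExpansion is missing or false, it can be enabled on the storage class; this is a sketch (verify first that the underlying CSI driver actually supports volume expansion, and substitute the real storage class name for the placeholder):

```shell
# Enable volume expansion on the storage class backing the PVC.
# <storageclass-name> is a placeholder; substitute the actual name.
oc patch sc <storageclass-name> --type merge -p '{"allowVolumeExpansion": true}'
```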
    
  2. Make note of the db-noobaa-db-pg-0 PVC capacity:

       $ oc get pvc  -n openshift-storage
    
       NAME                      STATUS   VOLUME                         CAPACITY   ACCESS MODES   STORAGECLASS 
       db-noobaa-db-pg-0         Bound    pvc-xxxxx-xxxx-xxxx-xxxxxxxxx   50Gi       RWO            ocs-storagecluster-ceph-rbd 
    
  3. Scale down the NooBaa services:

    $ oc -n openshift-storage scale deployment noobaa-operator noobaa-endpoint --replicas=0
    $ oc -n openshift-storage scale sts noobaa-core noobaa-db-pg --replicas=0
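
    The scale-down can be confirmed before moving on, for example with oc wait (a sketch, assuming the default pod names):

```shell
# Block until the NooBaa core and database pods have fully terminated.
oc -n openshift-storage wait pod/noobaa-core-0 pod/noobaa-db-pg-0 --for=delete --timeout=300s
```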
    
  4. After the pods have terminated (this takes time, so there is no need to force deletion), edit only the spec -> resources -> requests -> storage field of the PVC. The capacity in the status section will update once the volume expansion is successful, which will be reflected in the $ oc get pvc -n openshift-storage | grep noobaa output.

    NOTE: ODF 4.6 and below uses MongoDB, not Postgres. Therefore, the statefulset will be named noobaa-db, the pod will be named noobaa-db-0 and the PVC will be named db-noobaa-db-0.

    To expand a volume from 50Gi to 100Gi, for example, do the following:

    WARNING: Only volume expansion is allowed. Rolling back to a smaller volume size is not supported. Take extra precaution prior to saving the YAML to ensure the desired size is correct.

    $ oc edit pvc/db-noobaa-db-pg-0 -n openshift-storage
    
    spec:
      accessModes:
      - ReadWriteOnce
      resources:
        requests:
          storage: 50Gi <----------------------------- Change to 100Gi
      storageClassName: ocs-storagecluster-ceph-rbd
      volumeMode: Filesystem
      volumeName: pvc-80a61324-2a56-4ed0-89ec-ba7d85d4f19a
    status:
      accessModes:
      - ReadWriteOnce
      capacity:
        storage: 50Gi <------------------------------ Do not touch, this will change after noobaa-db pod is started
      phase: Bound
    

    Check that the associated PV pvc-80a61324-2a56-4ed0-89ec-ba7d85d4f19a has been expanded. The PVC will not yet show the new size, because the filesystem still has to be expanded; that happens when the noobaa-db-pg pod starts:

    $ oc get pv |grep db-noobaa-db-pg-0
    
    pvc-80a61324-2a56-4ed0-89ec-ba7d85d4f19a   100Gi      RWO            Delete           Bound    openshift-storage/db-noobaa-db-pg-0                           ocs-storagecluster-ceph-rbd   <unset>                          144d
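
    As an alternative to the interactive edit, the same change can be applied non-interactively with oc patch (a sketch; 100Gi is the example target size from above):

```shell
# Patch only spec.resources.requests.storage; the status section is left alone.
oc -n openshift-storage patch pvc db-noobaa-db-pg-0 --type merge \
  -p '{"spec":{"resources":{"requests":{"storage":"100Gi"}}}}'
```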
    
  5. After finishing the last step, scale the NooBaa services back up:

    $ oc -n openshift-storage scale sts noobaa-core noobaa-db-pg --replicas=1
    $ oc -n openshift-storage scale deployment noobaa-operator noobaa-endpoint --replicas=1
    
  6. The new capacity is reflected in the following output:

    $ oc get pvc -n openshift-storage | grep noobaa
    NAME                 STATUS   VOLUME         CAPACITY   ACCESS MODES   STORAGECLASS   
    db-noobaa-db-pg-0    Bound    pvc-<omitted>  100Gi      RWO            ocs-storagecluster-ceph-rbd   
    
  7. Once all pods have been in a Running state for at least 3 minutes, validate that NooBaa is in a Ready phase:

    $ oc get noobaa -n openshift-storage
    NAME     S3-ENDPOINTS           STS-ENDPOINTS             IMAGE                          PHASE   AGE
    noobaa   ["https://<omitted>"]  ["https://<omitted>"]     registry.redhat.io/<omitted>   Ready   46h
    
    $ oc get backingstore -n openshift-storage
    NAME                           TYPE       PHASE             AGE
    noobaa-default-backing-store   <omitted>  Ready <----       35h
    

    NOTE: Occasionally, NooBaa may still be in a Connecting phase and/or may not come to a Ready state. If this is observed after the above has been performed, please follow the steps in section 12.1, Restoring the Multicloud Object Gateway, of the product documentation. Perform one final restart of the pods in the order shown, which will bring NooBaa back to a Ready phase.
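
    The Ready phase can also be polled non-interactively (a sketch; --for=jsonpath requires a reasonably recent oc client):

```shell
# Block until the noobaa CR reports phase Ready, or time out after 5 minutes.
oc -n openshift-storage wait noobaa/noobaa --for=jsonpath='{.status.phase}'=Ready --timeout=300s
```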

  8. If you see events like the following:

Warning   DBVolumeResourcesIsImmutable   noobaa-operator              spec.dbVolumeResources is immutable and cannot be updated for volume "db" in existing StatefulSet "noobaa-core" since it requires volume recreate and migrate which is unsupported by the operator

Apply the following additional changes to resolve it:

  • Edit the StorageCluster ($ oc edit storagecluster -n openshift-storage), then on the ocs-storagecluster StorageCluster add the following lines under spec.resources:

    noobaa-db-vol:
      requests:
        storage: 100Gi <<-- new size
    

    NOTE: Make sure spec.multiCloudGateway.reconcileStrategy is NOT set to unmanaged (by default this value is not configured, as it defaults to managed).

    NOTE: This change will update noobaa.spec.dbVolumeResources.requests.storage on the noobaa CR, but the noobaa-db-pg StatefulSet will still retain the old value, so we need to delete it so that the operator recreates it:
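
    The same spec.resources change can be applied non-interactively with oc patch instead of the interactive edit (a sketch using the example 100Gi size from above):

```shell
# Add or replace the noobaa-db-vol storage request on the StorageCluster CR.
oc -n openshift-storage patch storagecluster ocs-storagecluster --type merge \
  -p '{"spec":{"resources":{"noobaa-db-vol":{"requests":{"storage":"100Gi"}}}}}'
```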

  • Scale down noobaa pods:

    $ oc -n openshift-storage scale deployment noobaa-operator noobaa-endpoint --replicas=0
    $ oc -n openshift-storage scale sts noobaa-core noobaa-db-pg --replicas=0
    
  • Make a YAML backup of the StatefulSet:

    $ oc get sts noobaa-db-pg -n openshift-storage -oyaml > noobaa-db-pg.bkp.yaml
    
  • Now delete the noobaa-db-pg StatefulSet:

    $ oc -n openshift-storage delete sts noobaa-db-pg
    
  • Once the sts noobaa-db-pg is gone, scale everything back up:

    $ oc -n openshift-storage scale deployment noobaa-operator noobaa-endpoint --replicas=1
    $ oc -n openshift-storage scale sts noobaa-core noobaa-db-pg --replicas=1
    

    The noobaa-operator will recreate the noobaa-db-pg StatefulSet, this time with the correct storage value.

    At this point the storagecluster CR, the noobaa CR, the noobaa-db-pg StatefulSet, and the db-noobaa-db-pg-0 PVC should all report the same size. If the StatefulSet is not updated with the correct value, the DBVolumeResourcesIsImmutable event above will recur.
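
    The four objects can be compared in one pass with jsonpath queries (a sketch; each command should print the same size, e.g. 100Gi):

```shell
# Compare the configured size across the StorageCluster CR, noobaa CR,
# StatefulSet volume claim template, and the PVC itself.
oc -n openshift-storage get storagecluster ocs-storagecluster \
  -o jsonpath="{.spec.resources['noobaa-db-vol'].requests.storage}"; echo
oc -n openshift-storage get noobaa noobaa \
  -o jsonpath='{.spec.dbVolumeResources.requests.storage}'; echo
oc -n openshift-storage get sts noobaa-db-pg \
  -o jsonpath='{.spec.volumeClaimTemplates[0].spec.resources.requests.storage}'; echo
oc -n openshift-storage get pvc db-noobaa-db-pg-0 \
  -o jsonpath='{.spec.resources.requests.storage}'; echo
```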

Root Cause

When troubleshooting noobaa-db-pg, the db-noobaa-db-pg-0 PVC may become full, preventing the Postgres server from starting. Expanding the db-noobaa-db-pg-0 PVC allows the Postgres server to start again so that troubleshooting can be completed.

Diagnostic Steps

  • Review the pod logs for noobaa-db-pg-0:
$ oc logs noobaa-db-pg-0 -n openshift-storage
waiting for server to start....2022-08-25 19:48:38.185 UTC [22] FATAL:  could not write lock file "postmaster.pid": No space left on device
 stopped waiting
pg_ctl: could not start server
This solution is part of Red Hat’s fast-track publication program, providing a huge library of solutions that Red Hat engineers have created while supporting our customers. To give you the knowledge you need the instant it becomes available, these articles may be presented in a raw and unedited form.