Ceph/ODF: RBD volumes oversubscribe Ceph capacity, causing unexpected capacity growth.


Environment

Red Hat OpenShift Container Platform (OCP) 4.x
Red Hat OpenShift Container Storage (OCS) 4.x
Red Hat OpenShift Data Foundation (ODF) 4.x
Red Hat Ceph Storage (RHCS) 5.x
Red Hat Ceph Storage (RHCS) 6.x
Red Hat Ceph Storage (RHCS) 7.x

Issue

RBD volumes oversubscribe Ceph capacity, causing unexpected capacity growth.

Because RBD volumes only consume space when they are written to, it is possible to provision more RBD capacity than the Ceph cluster can actually host. An RBD volume is similar to a sparse file or thin-provisioned LUN, where capacity is only consumed when the sparse file or thin LUN is hydrated.
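The sparse-file analogy can be demonstrated locally. This is a minimal sketch (the file path and sizes are arbitrary), not an RBD operation, but the provisioned-versus-used behavior is the same:

```shell
# Create a 1 GiB sparse file: the apparent (provisioned) size is 1 GiB,
# but no blocks are consumed until data is written.
truncate -s 1G /tmp/sparse-demo

du --apparent-size -h /tmp/sparse-demo   # provisioned: 1.0G
du -h /tmp/sparse-demo                   # used: 0

# Writing 100 MiB "hydrates" only that much capacity.
dd if=/dev/zero of=/tmp/sparse-demo bs=1M count=100 conv=notrunc status=none
du -h /tmp/sparse-demo                   # used: ~100M

rm -f /tmp/sparse-demo
```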

The result is unexpected capacity growth.

Example #1:

sh-4.4$ ceph df detail
--- RAW STORAGE ---
CLASS     SIZE    AVAIL     USED  RAW USED  %RAW USED
ssd    1.2 TiB  443 GiB  757 GiB   757 GiB      63.10
TOTAL  1.2 TiB  443 GiB  757 GiB   757 GiB      63.10

--- POOLS ---
POOL                                                   ID  PGS   STORED   (DATA)   (OMAP)  OBJECTS     USED   (DATA)
ocs-storagecluster-cephblockpool                        1  128  226 GiB  226 GiB  1.3 KiB   67.80k  677 GiB  677 GiB  <-- [1]
ocs-storagecluster-cephobjectstore.rgw.log              2    8  7.6 MiB   24 KiB  7.6 MiB      340   25 MiB  2.0 MiB
ocs-storagecluster-cephobjectstore.rgw.buckets.index    3    8  135 KiB      0 B  135 KiB       22  404 KiB      0 B
ocs-storagecluster-cephobjectstore.rgw.control          4    8      0 B      0 B      0 B        8      0 B      0 B
.rgw.root                                               5    8  5.7 KiB  5.7 KiB      0 B       16  200 KiB  200 KiB
ocs-storagecluster-cephobjectstore.rgw.meta             6    8  3.8 KiB  3.8 KiB      0 B       16  172 KiB  172 KiB
ocs-storagecluster-cephobjectstore.rgw.buckets.non-ec   7    8      0 B      0 B      0 B        0      0 B      0 B
device_health_metrics                                   8    1  1.9 MiB      0 B  1.9 MiB       12  5.7 MiB      0 B
ocs-storagecluster-cephfilesystem-metadata              9   32  275 MiB  247 MiB   28 MiB    2.22k  825 MiB  742 MiB
ocs-storagecluster-cephobjectstore.rgw.buckets.data    10   32  824 MiB  824 MiB      0 B      377  2.4 GiB  2.4 GiB
ocs-storagecluster-cephfilesystem-data0                11   32   22 GiB   22 GiB      0 B   34.15k   65 GiB   65 GiB
ocs-storagecluster-cephobjectstore.rgw.otp             12   32      0 B      0 B      0 B        0      0 B      0 B


sh-4.4$ rbd -p ocs-storagecluster-cephblockpool du --exact
NAME                                          PROVISIONED  USED
csi-vol-014228ac-f8c1-11ed-aa54-0a580a81028c       50 GiB   34 GiB
csi-vol-15ac5070-cf26-11ed-bc32-0a580a810213       20 GiB   28 MiB
csi-vol-214ee4e9-cf26-11ed-bc32-0a580a810213       20 GiB  154 MiB
csi-vol-2d921f13-cf26-11ed-bc32-0a580a810213        5 GiB  123 MiB
csi-vol-2d933b8f-cf26-11ed-bc32-0a580a810213        8 GiB  395 MiB
csi-vol-2d986d3e-cf26-11ed-bc32-0a580a810213       20 GiB  895 MiB
csi-vol-2d9b046f-cf26-11ed-bc32-0a580a810213        2 GiB   75 MiB
csi-vol-318b03d2-5637-11ee-8186-0a580a82040a      100 GiB  5.0 GiB
csi-vol-37091fa5-d011-11ed-ac85-0a580a80020c        8 GiB  396 MiB
csi-vol-3cb757a3-d7b4-11ed-ac85-0a580a80020c        8 GiB  624 MiB
csi-vol-41640608-d7b4-11ed-ac85-0a580a80020c        1 GiB  5.5 MiB
csi-vol-60a8e959-c435-11ed-9a17-0a580a80020b       50 GiB  532 MiB
csi-vol-893f406f-469a-11ee-99bf-0a580a8202d9      100 GiB  435 MiB
csi-vol-893f449a-469a-11ee-99bf-0a580a8202d9      100 GiB  430 MiB
csi-vol-893f65ac-469a-11ee-99bf-0a580a8202d9      100 GiB  448 MiB
csi-vol-9cecde6d-469a-11ee-99bf-0a580a8202d9      100 GiB   64 GiB
csi-vol-9cece2e3-469a-11ee-99bf-0a580a8202d9      100 GiB   64 GiB
csi-vol-9cecec0a-469a-11ee-99bf-0a580a8202d9      100 GiB   64 GiB
csi-vol-e3fcde00-f424-11ed-aa54-0a580a81028c       10 GiB  679 MiB
csi-vol-f63a48e8-d2f7-11ed-ac85-0a580a80020c        8 GiB      0 B
csi-vol-ffe219c6-f8c0-11ed-aa54-0a580a81028c       50 GiB  420 MiB
<TOTAL>                                           960 GiB  237 GiB [1]
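To quantify the oversubscription, the `<TOTAL>` line of the `rbd du --exact` output can be compared against the cluster's raw capacity. This is a hedged sketch: the 1228 GiB raw size and 3x replication factor are assumptions taken from the example above and should be replaced with your own cluster's values.

```shell
# Parse the <TOTAL> line of `rbd du --exact` and compare provisioned
# capacity (times the replication factor) against the raw capacity.
# raw_gib and replicas are assumptions from the example above.
rbd -p ocs-storagecluster-cephblockpool du --exact |
  awk -v raw_gib=1228 -v replicas=3 '
    $1 == "<TOTAL>" {
      printf "provisioned: %s GiB, used: %s GiB (x%d replicas = %d GiB raw)\n",
             $2, $4, replicas, $4 * replicas
      printf "potential raw usage: %.0f%% of %d GiB raw capacity\n",
             $2 * replicas / raw_gib * 100, raw_gib
    }'
```

Against the example output, this reports 711 GiB of raw usage from 237 GiB of written data, and shows that fully hydrating all 960 GiB of provisioned volumes would require roughly 235% of the raw capacity.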

Resolution

  • The example above is extreme: the user has provisioned 960 GiB of RBD volumes, but the Ceph cluster's raw capacity is only 1.2 TiB.

    • Again, RBD volumes only consume space when they are written to.
    • Note that the used RBD capacity (237 GiB) multiplied by the 3x replication factor [1] is approximately equal to the 757 GiB raw Used capacity [1].
    • Any one of the 100 GiB RBD volumes could grow and fill up the Ceph cluster.
  • It is possible for an RBD volume to be orphaned in the Ceph cluster

    • Use KCS Article #6293581 to determine if orphan RBD volumes are present.
    • The same KCS resource will show how to remove an orphan RBD volume.
  • RBD volumes are not mounted with the discard option, so fstrim is required to reclaim space

  • If urgent, run fstrim -av on all Worker Nodes to reclaim space:

    $ for i in $(oc get node -l node-role.kubernetes.io/worker -o jsonpath='{ .items[*].metadata.name }'); do oc debug node/${i} -- chroot /host  fstrim -av; done
    
  • For systems without dedicated Worker Nodes, run fstrim -av on all nodes to reclaim space:

    $ for i in $(oc get node -o name); do oc debug $i -- chroot /host fstrim -av; done
    
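To confirm whether discard is in effect before scheduling fstrim, the mount options on each node can be inspected. This is a hedged sketch assuming kernel RBD devices appear as /dev/rbd*; adjust the pattern if your CSI driver maps devices differently, and run it inside oc debug node/<name> -- chroot /host:

```shell
# List mounted RBD devices and flag any that lack the discard mount option
# (those rely on fstrim to return freed blocks to Ceph).
findmnt -rn -o SOURCE,TARGET,OPTIONS |
  awk '$1 ~ /^\/dev\/rbd/ {
    print $2 ": " (($3 ~ /discard/) ? "discard enabled" : "no discard, fstrim required")
  }'
```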

Root Cause

Possible reasons:

  • Oversubscribing the Ceph capacity (creating more RBD volumes than the Ceph cluster can host).
  • Orphan RBD volumes.
  • The need to run fstrim to reclaim space freed by deleted files.

This solution is part of Red Hat’s fast-track publication program, providing a huge library of solutions that Red Hat engineers have created while supporting our customers. To give you the knowledge you need the instant it becomes available, these articles may be presented in a raw and unedited form.