Ceph/ODF: RBD volumes oversubscribe Ceph capacity, unexpected capacity growth.
Environment
Red Hat OpenShift Container Platform (OCP) 4.x
Red Hat OpenShift Container Storage (OCS) 4.x
Red Hat OpenShift Data Foundation (ODF) 4.x
Red Hat Ceph Storage (RHCS) 5.x
Red Hat Ceph Storage (RHCS) 6.x
Red Hat Ceph Storage (RHCS) 7.x
Issue
RBD volumes oversubscribe Ceph capacity, unexpected capacity growth.
Because RBD volumes only consume space when they are written to, it is possible to create more RBD volumes than the Ceph cluster can host. An RBD volume is similar to a sparse file or thin LUN, where capacity is only consumed when the sparse file or thin LUN is hydrated.
Unexpected capacity growth.
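A minimal illustration of this thin-provisioning behaviour, run from the rook-ceph toolbox pod (the image name thin-demo and the 10 GiB size are arbitrary values used only for this sketch):
sh-4.4$ rbd create --size 10G ocs-storagecluster-cephblockpool/thin-demo   # allocates no backing space yet
sh-4.4$ rbd du --exact ocs-storagecluster-cephblockpool/thin-demo          # reports 10 GiB PROVISIONED but 0 B USED
sh-4.4$ rbd rm ocs-storagecluster-cephblockpool/thin-demo                  # remove the demo image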
Example #1:
sh-4.4$ ceph df detail
--- RAW STORAGE ---
CLASS SIZE AVAIL USED RAW USED %RAW USED
ssd 1.2 TiB 443 GiB 757 GiB 757 GiB 63.10
TOTAL 1.2 TiB 443 GiB 757 GiB 757 GiB 63.10
--- POOLS ---
POOL ID PGS STORED (DATA) (OMAP) OBJECTS USED (DATA)
ocs-storagecluster-cephblockpool 1 128 226 GiB 226 GiB 1.3 KiB 67.80k 677 GiB 677 GiB <-- [1]
ocs-storagecluster-cephobjectstore.rgw.log 2 8 7.6 MiB 24 KiB 7.6 MiB 340 25 MiB 2.0 MiB
ocs-storagecluster-cephobjectstore.rgw.buckets.index 3 8 135 KiB 0 B 135 KiB 22 404 KiB 0 B
ocs-storagecluster-cephobjectstore.rgw.control 4 8 0 B 0 B 0 B 8 0 B 0 B
.rgw.root 5 8 5.7 KiB 5.7 KiB 0 B 16 200 KiB 200 KiB
ocs-storagecluster-cephobjectstore.rgw.meta 6 8 3.8 KiB 3.8 KiB 0 B 16 172 KiB 172 KiB
ocs-storagecluster-cephobjectstore.rgw.buckets.non-ec 7 8 0 B 0 B 0 B 0 0 B 0 B
device_health_metrics 8 1 1.9 MiB 0 B 1.9 MiB 12 5.7 MiB 0 B
ocs-storagecluster-cephfilesystem-metadata 9 32 275 MiB 247 MiB 28 MiB 2.22k 825 MiB 742 MiB
ocs-storagecluster-cephobjectstore.rgw.buckets.data 10 32 824 MiB 824 MiB 0 B 377 2.4 GiB 2.4 GiB
ocs-storagecluster-cephfilesystem-data0 11 32 22 GiB 22 GiB 0 B 34.15k 65 GiB 65 GiB
ocs-storagecluster-cephobjectstore.rgw.otp 12 32 0 B 0 B 0 B 0 0 B 0 B
sh-4.4$ rbd -p ocs-storagecluster-cephblockpool du --exact
NAME PROVISIONED USED
csi-vol-014228ac-f8c1-11ed-aa54-0a580a81028c 50 GiB 34 GiB
csi-vol-15ac5070-cf26-11ed-bc32-0a580a810213 20 GiB 28 MiB
csi-vol-214ee4e9-cf26-11ed-bc32-0a580a810213 20 GiB 154 MiB
csi-vol-2d921f13-cf26-11ed-bc32-0a580a810213 5 GiB 123 MiB
csi-vol-2d933b8f-cf26-11ed-bc32-0a580a810213 8 GiB 395 MiB
csi-vol-2d986d3e-cf26-11ed-bc32-0a580a810213 20 GiB 895 MiB
csi-vol-2d9b046f-cf26-11ed-bc32-0a580a810213 2 GiB 75 MiB
csi-vol-318b03d2-5637-11ee-8186-0a580a82040a 100 GiB 5.0 GiB
csi-vol-37091fa5-d011-11ed-ac85-0a580a80020c 8 GiB 396 MiB
csi-vol-3cb757a3-d7b4-11ed-ac85-0a580a80020c 8 GiB 624 MiB
csi-vol-41640608-d7b4-11ed-ac85-0a580a80020c 1 GiB 5.5 MiB
csi-vol-60a8e959-c435-11ed-9a17-0a580a80020b 50 GiB 532 MiB
csi-vol-893f406f-469a-11ee-99bf-0a580a8202d9 100 GiB 435 MiB
csi-vol-893f449a-469a-11ee-99bf-0a580a8202d9 100 GiB 430 MiB
csi-vol-893f65ac-469a-11ee-99bf-0a580a8202d9 100 GiB 448 MiB
csi-vol-9cecde6d-469a-11ee-99bf-0a580a8202d9 100 GiB 64 GiB
csi-vol-9cece2e3-469a-11ee-99bf-0a580a8202d9 100 GiB 64 GiB
csi-vol-9cecec0a-469a-11ee-99bf-0a580a8202d9 100 GiB 64 GiB
csi-vol-e3fcde00-f424-11ed-aa54-0a580a81028c 10 GiB 679 MiB
csi-vol-f63a48e8-d2f7-11ed-ac85-0a580a80020c 8 GiB 0 B
csi-vol-ffe219c6-f8c0-11ed-aa54-0a580a81028c 50 GiB 420 MiB
<TOTAL> 960 GiB 237 GiB [1]
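To see which PVC backs a given csi-vol image (and therefore which workload is the large or growing consumer), the image name is recorded in the PV's CSI volume attributes. A hedged one-liner, assuming the ceph-csi imageName volume attribute is populated as in current ODF releases:
$ oc get pv -o jsonpath='{range .items[*]}{.spec.csi.volumeAttributes.imageName}{"  "}{.spec.claimRef.namespace}{"/"}{.spec.claimRef.name}{"\n"}{end}' | grep csi-vol
An image listed by rbd du that has no matching PV is a candidate orphan; see the KCS article referenced in the Resolution below.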
Resolution
- The example above is extreme: a user has created the potential for 960 GiB of RBD volumes, but the Ceph cluster is only 1.2 TiB.
- Again, RBD volumes only consume space when they are written to.
- Note that the used RBD capacity (237 GiB) x 3 replicas [1] is roughly 711 GiB, approximately equal to the used raw capacity of 757 GiB.
- Any one of the 100 GiB RBD volumes could increase utilization and fill up the Ceph cluster.
- It is possible for an RBD volume to be orphaned in the Ceph cluster.
- Use KCS Article #6293581 to determine if orphan RBD volumes are present.
- The same KCS resource will show how to remove an orphan RBD volume.
- RBD volumes are not mounted with the discard option, so fstrim is required to reclaim space.
- Follow the documentation links below to add a cron job; it can be set to run weekly (a hedged sketch follows this list).
- See our Red Hat ODF Documentation, Managing and allocating storage resources, section 10
- 4.14 Documentation Link:
- 4.15 Documentation Link:
- 4.16 Documentation Link:
- 4.17 Documentation Link:
- 4.18 Documentation Link:
- 4.19 Documentation Link:
- If urgent, run fstrim -av on all worker nodes to reclaim space:
  $ for i in $(oc get node -l node-role.kubernetes.io/worker -o jsonpath='{ .items[*].metadata.name }'); do oc debug node/${i} -- chroot /host fstrim -av; done
- For systems without dedicated worker nodes, run fstrim -av on all nodes to reclaim space:
  $ for i in $(oc get node -o name); do oc debug $i -- chroot /host fstrim -av; done
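As a hedged sketch of the weekly reclaim job: in recent ODF releases the CSI Addons ReclaimSpace feature can schedule space reclamation (fstrim/sparsify) per PVC by annotating the claim. The annotation name below should be verified against the documentation links above for your version, and <pvc-name>/<namespace> are placeholders:
$ oc annotate pvc <pvc-name> -n <namespace> "reclaimspace.csiaddons.openshift.io/schedule=@weekly"
After fstrim or a reclaim job completes, re-run ceph df detail and rbd -p ocs-storagecluster-cephblockpool du --exact from the toolbox pod to confirm that the used capacity has dropped.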
Root Cause
Possible reasons:
- Oversubscribing the Ceph capacity (creating more RBD volumes than the Ceph cluster can host).
- Orphan RBD volumes.
- fstrim needs to be implemented so that space from deleted data is reclaimed.