How to perform an 'fstrim' operation on RBD PVs in an OpenShift Container Storage 4.x cluster?


Environment

  • Red Hat OpenShift Container Storage (OCS)
    • 4.8 and earlier
  • Red Hat OpenShift Data Foundation (ODF)
    • 4.9

Issue

  • How to perform an fstrim operation on RBD PVs in an OpenShift Container Storage 4.x cluster?

  • The Persistent Storage tab on the OpenShift UI shows incorrect available space.

Resolution

  • Note: Starting with ODF 4.10, the new "ReclaimSpace" feature enables automatic reclaiming of freed space from RBD PersistentVolumes. Please check the release notes and documentation for more details.
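    As a rough sketch of that feature, a one-off reclaim can be requested with a ReclaimSpaceJob custom resource from the CSI Addons project. The resource name and target PVC below reuse the example PVC from this article; verify the exact API version against the documentation for your ODF release:

    ```yaml
    # ReclaimSpaceJob (CSI Addons) - runs a one-time space reclaim against the
    # named PVC. The API version may differ between ODF releases.
    apiVersion: csiaddons.openshift.io/v1alpha1
    kind: ReclaimSpaceJob
    metadata:
      name: reclaim-elasticsearch-pv        # hypothetical name
      namespace: openshift-logging
    spec:
      target:
        persistentVolumeClaim: elasticsearch-elasticsearch-cdm-ag8jhaoy-1
    ```

    For recurring reclaims, the documentation also describes a ReclaimSpaceCronJob resource and a schedule annotation on the PVC; check the ODF 4.10 documentation for the supported forms.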

  • In a traditional file system, deleting a file marks the file's directory entry and inode as unused, but does not erase the data in the underlying data blocks.

  • Ceph behaves the same way: deleting a file or data on a PV does not delete the backing objects, which remain on the RBD device.

  • As a result, ceph df reports less available space than is actually free, and the same figures are reflected on the OCP UI, causing confusion.

  • The most accurate picture of the capacity actually in use comes from the "df -h" command, or from mounting with the discard option. The discard mount option behaves like a continuous fstrim: it uses the TRIM support of the underlying device to clean up the backend objects as files are deleted.

  • The discard option can degrade performance, because every deletion triggers a TRIM on the affected blocks. For this reason it is not enabled by default and is NOT recommended.
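    For reference only (given the caveat above), the discard option would be set through the standard Kubernetes mountOptions field of a StorageClass. The sketch below uses a hypothetical StorageClass name and omits the RBD-specific parameters, which would need to be copied from the existing ocs-storagecluster-ceph-rbd StorageClass:

    ```yaml
    # Hypothetical StorageClass enabling the discard mount option.
    # Copy the parameters stanza from the existing ocs-storagecluster-ceph-rbd
    # StorageClass; only mountOptions is new here.
    apiVersion: storage.k8s.io/v1
    kind: StorageClass
    metadata:
      name: ocs-storagecluster-ceph-rbd-discard   # hypothetical name
    provisioner: openshift-storage.rbd.csi.ceph.com
    mountOptions:
      - discard      # continuous TRIM on file deletion; NOT recommended
    ```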

  • If the capacity needs to be reclaimed, one can perform the fstrim operation on the mounted path of the PV.

  • The following steps reclaim the space from one of the PVs:

    1) Find the PVC from which data was deleted:
    $ oc get pvc -A | grep pvc-179feda8-8a94-419e-b309-6cf6b1a22d7d
    openshift-logging          elasticsearch-elasticsearch-cdm-ag8jhaoy-1     Bound    pvc-179feda8-8a94-419e-b309-6cf6b1a22d7d   187Gi      RWO            ocs-storagecluster-ceph-rbd   5h43m
    
    2) Log in to the node where the pod consuming the PVC is running:
    $ oc get po -o wide | grep ag8jhaoy-1
    elasticsearch-cdm-ag8jhaoy-1-6fcd7cf5fb-n7jxl   2/2     Running     0       4m   1.1.1.1    
    $ oc debug no/dell-r740xd-1.gsslab.pnq2.redhat.com
    $ chroot /host
    $ sudo -i
    
    3) Find the mount path using the PVC name:
    $ df -h | grep pvc-179feda8-8a94-419e-b309-6cf6b1a22d7d
    /dev/rbd2          184G  1.5G  182G   1% /var/lib/kubelet/pods/229fcec9-b166-4eed-9604-dd84603129a7/volumes/kubernetes.io~csi/pvc-179feda8-8a94-419e-b309-6cf6b1a22d7d/mount


    4) Run fstrim on the mount path:
    $ fstrim /var/lib/kubelet/pods/229fcec9-b166-4eed-9604-dd84603129a7/volumes/kubernetes.io~csi/pvc-179feda8-8a94-419e-b309-6cf6b1a22d7d/mount

Root Cause

  • Ceph does not delete the backing objects when a file is deleted; just as in a traditional file system, the objects remain on the RBD device.
    A subsequent write will either overwrite these objects or create new ones, as required.
    Because the objects are still present in the pool, a 'ceph df' shows the pool as occupied by them even though they are no longer in use.

This solution is part of Red Hat’s fast-track publication program, providing a huge library of solutions that Red Hat engineers have created while supporting our customers. To give you the knowledge you need the instant it becomes available, these articles may be presented in a raw and unedited form.