OCS/ODF: Ceph internal cluster is FULL xx full osd(s)
Environment
Red Hat OpenShift Container Platform (OCP) 4.x
Red Hat OpenShift Container Storage (OCS) 4.x
Red Hat OpenShift Data Foundation (ODF) 4.x
Red Hat Ceph Storage (RHCS) 5.x
Red Hat Ceph Storage (RHCS) 6.x
Red Hat Ceph Storage (RHCS) 7.x
Ceph (RADOS) Block Devices (RBD)
Ceph File System (CephFS)
Issue
Ceph internal cluster is "FULL", xx full osd(s)
How to delete stale / orphan cephfs clones and snapshots
Resolution
- A read-only (FULL) OCS/ODF cluster can be fixed either by increasing the storage capacity of the cluster or by deleting unwanted data from it.
Increasing the OCS/ODF Cluster Capacity
The storage capacity can be increased by following Scaling Storage for ODF.
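Whether capacity must be added depends on per-OSD utilization, because a single full OSD is enough to block writes. The following is a minimal sketch, assuming input lines of the form `<osd_id> <utilization_percent>`; `osd_capacity_report` is a hypothetical helper, its defaults are the ODF ratios used in this article (nearfull 0.75, backfillfull 0.80, full 0.85), and its comparisons only approximate Ceph's own full-state logic, which remains authoritative in `ceph health detail`.

```shell
#!/usr/bin/env bash
# Hypothetical helper: reads "<osd_id> <utilization_percent>" pairs on stdin and
# prints the highest threshold each OSD is above. Optional arguments override the
# nearfull, backfillfull and full ratios (defaults: ODF values 0.75 / 0.80 / 0.85).
# This is an approximation; Ceph itself decides the real FULL state.
osd_capacity_report() {
  local nearfull="${1:-0.75}" backfillfull="${2:-0.80}" full="${3:-0.85}"
  awk -v n="$nearfull" -v b="$backfillfull" -v f="$full" '
    NF >= 2 {
      r = $2 / 100              # percent -> ratio
      state = "ok"
      if (r > n) state = "nearfull"
      if (r > b) state = "backfillfull"
      if (r > f) state = "full"
      printf "osd.%s %s\n", $1, state
    }'
}
```

Assuming `jq` is available where the Ceph CLI runs, the input could be produced with `ceph osd df -f json | jq -r '.nodes[] | "\(.id) \(.utilization)"' | osd_capacity_report`; verify the JSON field names against the output of your Ceph version.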
Deleting Unwanted Data
- For CephFS: How to delete stale / orphan cephfs clones and snapshots
- For RBD: RBD volumes over subscribe Ceph capacity, unexpected capacity growth
- This KCS also links to another KCS to find and remove orphaned RBD volumes and snapshots.
- This KCS also links to the Red Hat ODF documentation for scheduling fstrim on a regular basis.
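For the regular fstrim scheduling mentioned above, recent ODF releases document a CSI-Addons annotation on RBD-backed PVCs that runs a reclaim space operation on a schedule. The sketch below assumes such a release and an `oc` session logged in to the cluster; `schedule_rbd_reclaim` and its `DRY_RUN` switch are hypothetical, and the annotation key should be confirmed against the ODF documentation for your version.

```shell
#!/usr/bin/env bash
# Hypothetical helper: annotate an RBD-backed PVC so that CSI-Addons runs
# reclaim space (fstrim) on a schedule. Assumes an ODF release with ReclaimSpace
# support; confirm the annotation key in the ODF documentation for your version.
# Set DRY_RUN=1 to print the oc command instead of running it.
schedule_rbd_reclaim() {
  local namespace="$1" pvc="$2" schedule="${3:-@weekly}"
  local -a cmd=(oc -n "$namespace" annotate --overwrite pvc "$pvc"
    "reclaimspace.csiaddons.openshift.io/schedule=${schedule}")
  if [ "${DRY_RUN:-0}" = "1" ]; then
    printf '%s\n' "${cmd[*]}"
  else
    "${cmd[@]}"
  fi
}
```

For example, `schedule_rbd_reclaim my-namespace my-rbd-pvc @daily` would request a daily reclaim for that PVC; the namespace and PVC names here are placeholders.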
- To delete any data, the full ratio of the ODF/Ceph cluster must first be raised to clear the FULL state.
- Even one full OSD causes all writes and deletes to fail.
- Follow KCS Article #4628891 or KCS Article #4870821 to access the Ceph CLI.
- Execute these commands to raise the full ratio from the default 85% to 88%, the backfillfull ratio from 80% to 83%, and the nearfull ratio from 75% to 78%.
$ ceph osd set-full-ratio 0.88
$ ceph osd set-backfillfull-ratio 0.83
$ ceph osd set-nearfull-ratio 0.78
- Once the cluster is no longer FULL, delete as much unwanted data as possible by:
- Deleting data that is no longer needed
- Running fstrim on RBD volumes and scheduling fstrim via a cron job
- Removing orphaned RBD and CephFS volumes
- Once the data is deleted and the OCS/ODF cluster is out of the FULL state, revert the thresholds to their defaults.
$ ceph osd set-full-ratio 0.85
$ ceph osd set-backfillfull-ratio 0.80
$ ceph osd set-nearfull-ratio 0.75
NOTE: Do not increase the FULL, BACKFILLFULL, or NEARFULL threshold above 88% without engaging Red Hat Support.
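The raise and revert steps above can be wrapped in one helper that refuses values the NOTE rules out. This is a minimal sketch, assuming the Ceph CLI is reachable as `ceph`; `set_full_ratios` and `ratios_valid` are hypothetical names. The ordering check (nearfull < backfillfull < full) reflects the ascending order Ceph expects; out-of-order ratios are reported by Ceph as the OSD_OUT_OF_ORDER_FULL health warning.

```shell
#!/usr/bin/env bash
# Hypothetical helpers: validate and apply the nearfull / backfillfull / full ratios.
# 0.88 is the ceiling from the NOTE above; going higher needs Red Hat Support.
MAX_FULL_RATIO=0.88

# Succeeds only when 0 < nearfull < backfillfull < full <= MAX_FULL_RATIO.
ratios_valid() {
  awk -v n="$1" -v b="$2" -v f="$3" -v max="$MAX_FULL_RATIO" \
    'BEGIN { exit !(n > 0 && n < b && b < f && f <= max) }'
}

# Usage: set_full_ratios <nearfull> <backfillfull> <full>
set_full_ratios() {
  local nearfull="$1" backfillfull="$2" full="$3"
  if ! ratios_valid "$nearfull" "$backfillfull" "$full"; then
    echo "refusing ratios: nearfull=$nearfull backfillfull=$backfillfull full=$full" >&2
    return 1
  fi
  # Same command order as the manual steps in this article.
  ceph osd set-full-ratio "$full"
  ceph osd set-backfillfull-ratio "$backfillfull"
  ceph osd set-nearfull-ratio "$nearfull"
}
```

With this sketch, `set_full_ratios 0.78 0.83 0.88` would raise the thresholds and `set_full_ratios 0.75 0.80 0.85` would revert them to the defaults.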
Root Cause
- The backend storage, Red Hat Ceph, is full and its capacity should be increased.
- There are orphaned CephFS clones and/or snapshots.
- Ceph RBD volumes need fstrim executed to reclaim space.
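To confirm the first root cause, and to check that the cluster is out of the FULL state before reverting the thresholds, the capacity health checks reported by `ceph health detail` can be inspected. This is a minimal sketch, assuming that output is piped in on stdin; `capacity_state` is a hypothetical helper that looks only for the OSD_FULL, OSD_BACKFILLFULL, and OSD_NEARFULL health check codes.

```shell
#!/usr/bin/env bash
# Hypothetical helper: reads `ceph health detail` output on stdin and prints the
# most severe OSD capacity health check present, or NONE if there is none.
capacity_state() {
  local text check
  text="$(cat)"
  for check in OSD_FULL OSD_BACKFILLFULL OSD_NEARFULL; do
    if grep -qw "$check" <<<"$text"; then
      echo "$check"
      return 0
    fi
  done
  echo NONE
}
```

A typical invocation would be `ceph health detail | capacity_state`; anything other than OSD_FULL means no OSD is currently reported as full.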
This solution is part of Red Hat’s fast-track publication program, providing a huge library of solutions that Red Hat engineers have created while supporting our customers. To give you the knowledge you need the instant it becomes available, these articles may be presented in a raw and unedited form.