Ceph/ODF: CephFS VolumeSnapshot creation (and restore) takes a very long time to complete.
Environment
Red Hat OpenShift Container Platform (OCP) 4.14+
Red Hat OpenShift Data Foundation (ODF) 4.14+
Red Hat Ceph Storage (RHCS) 6+
Issue
- CephFS VolumeSnapshot creation (and restore) takes a very long time to complete.
- CephFS clone (from PVC) or CephFS snapshot restore (from VolumeSnapshot) operations are left in Pending status.
Resolution
This solution focuses only on implementing changes which prevent this issue from occurring.
To identify and remove orphaned CephFS subvolumes, follow this solution: Removing Orphaned Cephfs Subvolumes and Snapshots in ODF.
- Ensure that backup software (OADP, Velero, Kasten, Commvault, etc.) has its timeout for Clone/Snapshot creation set to a high value.
- Current Subvolume clones are full copies of a snapshot and take an extremely long time to create (hours).
- Fast Clones will be Copy-on-Write based and should be supported in Ceph 9.2.
- This is not a promise - stay in touch with your local Red Hatters, who can query Development on your behalf.
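As a hedged illustration of raising such a timeout, recent Velero releases expose a per-Backup CSI snapshot timeout field (spec.csiSnapshotTimeout; field name per upstream Velero, so verify it against the version your backup product ships). The namespace values below are examples only:

```yaml
apiVersion: velero.io/v1
kind: Backup
metadata:
  name: app-backup            # hypothetical Backup name
  namespace: openshift-adp
spec:
  includedNamespaces:
    - my-app                  # hypothetical application namespace
  # Raise the CSI snapshot timeout well above the Velero default (10m),
  # since CephFS clones are full copies and can take hours:
  csiSnapshotTimeout: 4h
```

Other backup products expose an equivalent job or snapshot timeout in their own configuration; consult that product's documentation.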
- Run the command below [2] to see how many CephFS Clones are Pending.
- For a Clone operation which is stuck, see ODF Clone operations stuck in 'pending'.
- That KCS Solution covers Clone operations which are stuck indefinitely.
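For scripting around the output of [2], the JSON returned by `ceph fs clone status` can be parsed with jq. The sketch below runs against a canned sample rather than a live cluster; the `state` field and its `pending`/`in-progress`/`complete` values match recent Ceph releases, but verify on your version:

```shell
# Canned sample standing in for:
#   ceph fs clone status ocs-storagecluster-cephfilesystem <clone-name> csi --format json
status_json='{"status": {"state": "pending"}}'

# Extract the clone state: "pending" means the clone is queued behind the
# max_concurrent_clones limit, "in-progress" means it is actively copying.
state=$(echo "$status_json" | jq -r '.status.state')
echo "clone state: $state"
```

On a live cluster, substitute the real command shown in the comment for the canned sample.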
- Ensure that Concurrent Clones are set to a value which makes sense for the workload.
- The default value for Concurrent Clones is 4.
- If 6 Clones Creation tasks are running, 4 will execute and 2 will queue.
- This will only cause even more delays for certain backup jobs that are stuck in queue.
- Increase this parameter in increments of 4 and observe how backups perform; do not jump straight to a large number.
- See steps below to change this parameter. [1]
- Ensure the Ceph OSDs are tuned properly.
- See How to tune Ceph OSDs using mClock
- Also ensure the CephFS Data Pool has enough PGs.
- Engage with Red Hat Tech Support as needed with either of these tuning suggestions.
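To inspect the current PG count, `ceph osd pool get <pool> pg_num` can be queried in JSON format and parsed. This sketch runs against a canned sample; the pool name is the ODF default and the pg_num value is illustrative, so adjust both for your cluster:

```shell
# Canned sample standing in for:
#   ceph osd pool get ocs-storagecluster-cephfilesystem-data0 pg_num --format json
pg_json='{"pool": "ocs-storagecluster-cephfilesystem-data0", "pg_num": 32}'

# Extract the current PG count; compare it against the autoscaler's
# recommendation (`ceph osd pool autoscale-status`) before changing it.
pg_num=$(echo "$pg_json" | jq -r '.pg_num')
echo "pg_num: $pg_num"
```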
[1]
$ ceph config dump | egrep "^WHO|clone"
WHO MASK LEVEL OPTION VALUE
{no output means this parameter is at the default value of 4}
$ ceph config set mgr mgr/volumes/max_concurrent_clones 8 ## Again, go up by 4 so 8,12,16,20
Trust, but verify:
$ ceph config dump | egrep "^WHO|clone"
WHO MASK LEVEL OPTION VALUE
mgr advanced mgr/volumes/max_concurrent_clones 8
[2]
sh-4.4$ for i in `ceph fs subvolume ls ocs-storagecluster-cephfilesystem csi --format json | jq '.[] | .name' | cut -f 2 -d '"'`; do echo "Subvolume : $i"; ceph fs clone status ocs-storagecluster-cephfilesystem $i csi; done
Root Cause
There can be several contributing factors; the most common are the following:
- Current Subvolume clones are full copies of a snapshot and take an extremely long time to create (hours).
- The number of concurrent clones is too small for the backup workload.
- The OSD performance is low.
- The number of PGs for the CephFS Pools is too low.
Diagnostic Steps
- In the logs of the csi-cephfsplugin-provisioner pod we can find errors like:
% oc logs csi-cephfsplugin-provisioner-597ffb4f96-w7b59
E0109 13:36:29.045691 1 utils.go:200] ID: 6685603 Req-ID: 0001-0011-openshift-storage-0000000000000001-f2989966-8549-11ed-bca5-0a580a81040c GRPC error: rpc error: code = FailedPrecondition desc = snapshot 0001-0011-openshift-storage-0000000000000001-f2989966-8549-11ed-bca5-0a580a81040c has pending clones

or:

failed to provision volume with StorageClass "ocs-storagecluster-cephfs": rpc error: code = Aborted desc = clone from snapshot is pending
- We may also find OCP events like:
Normal   PVCReconciled         4m                     VolumeSnapshotBackup-Controller                                            performed created on PVC snapcontent-xxxx-xxxx-pvc
Warning  ProvisioningFailed    2m52s (x8 over 3m59s)  openshift-storage.cephfs.csi.ceph.com_csi-cephfsplugin-provisioner-xxxxx  failed to provision volume with StorageClass "ocs-storagecluster-cephfs": rpc error: code = Aborted desc = clone from snapshot is pending
Normal   Provisioning          108s (x9 over 4m)      openshift-storage.cephfs.csi.ceph.com_csi-cephfsplugin-provisioner-xxxxx  External provisioner is provisioning volume for claim "openshift-adp/snapcontent-xxxx-xxxx-pvc"
Warning  ProvisioningFailed    108s                   openshift-storage.cephfs.csi.ceph.com_csi-cephfsplugin-provisioner-xxxxx  failed to provision volume with StorageClass "ocs-storagecluster-cephfs": rpc error: code = Aborted desc = clone from snapshot is already in progress
Normal   ExternalProvisioning  14s (x17 over 4m)      persistentvolume-controller                                                waiting for a volume to be created, either by external provisioner "openshift-storage.cephfs.csi.ceph.com" or manually created by system administrator
Unfortunately, the events listed above look like errors at first glance. They are in fact perfectly normal and do not indicate that a clone has failed to provision, but rather that the clone is still in the process of being created. This is seen only for CephFS clones, because CephFS clones are full copies and take time.
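One way to confirm this state from the Ceph side is the snapshot's has_pending_clones flag reported by `ceph fs subvolume snapshot info` (present in recent Ceph releases; verify on your version). The sketch below parses a canned sample of that output:

```shell
# Canned sample standing in for:
#   ceph fs subvolume snapshot info ocs-storagecluster-cephfilesystem <subvolume> <snapshot> csi
snap_info='{"created_at": "2023-01-09 13:36:29", "has_pending_clones": "yes"}'

# "yes" here means clones are still being materialized from this snapshot,
# which is exactly the in-progress situation the events above describe.
pending=$(echo "$snap_info" | jq -r '.has_pending_clones')
echo "has_pending_clones: $pending"
```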
This solution is part of Red Hat’s fast-track publication program, providing a huge library of solutions that Red Hat engineers have created while supporting our customers. To give you the knowledge you need the instant it becomes available, these articles may be presented in a raw and unedited form.