ODF Clone operations stuck in 'pending'
Environment
- OCS/ODF 4.x
Issue
- ODF Clone operations (create, delete, etc.) remain stuck indefinitely in a 'pending' state
Resolution
- Get the mon IPs:
When using monitor port 6789:
```
# oc get cm/rook-ceph-csi-config -o=go-template='{{index .data "csi-cluster-config-json"}}' | jq -r '.[0].monitors | join(",")'
172.30.185.38:6789,172.30.101.171:6789,172.30.227.154:6789
```
When using secure monitor port 3300:
```
# oc get cm/rook-ceph-csi-config -o=go-template='{{index .data "csi-cluster-config-json"}}' | jq -r '.[0].monitors | join(",")'
172.30.185.38:3300,172.30.101.171:3300,172.30.227.154:3300
```
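If jq is not available where you run the command, the monitor list can be pulled out of the config JSON with standard text tools. The sketch below works on a hypothetical sample of the csi-cluster-config-json payload (the IPs are example values, not taken from your cluster):

```shell
# Hypothetical sample of the csi-cluster-config-json payload (example values)
cfg='[{"clusterID":"openshift-storage","monitors":["172.30.185.38:6789","172.30.101.171:6789"]}]'

# Extract the host:port pairs without jq and join them with commas
mons=$(printf '%s' "$cfg" | grep -oE '[0-9]+(\.[0-9]+){3}:[0-9]+' | paste -sd, -)
echo "$mons"   # 172.30.185.38:6789,172.30.101.171:6789

# Rewrite to the secure mon port for ms_mode=secure mounts
secure=$(printf '%s' "$mons" | sed 's/:6789/:3300/g')
echo "$secure"   # 172.30.185.38:3300,172.30.101.171:3300
```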
- Get the admin key:
```
# oc get secret/rook-ceph-mon -o=go-template='{{index .data "ceph-secret" | base64decode }}'
AQCIL/FjGrzbAhAAyNu5VnRhSB1PJ426jf7lEQ==
```
- Access a worker node terminal, run `chroot /host`, and create a temporary mount point:
```
# mkdir /mnt/ceph-cleanup
```
- Then use the values obtained in steps 1 and 2 to mount the /volumes file system:
When using monitor port 6789:
```
# mount -t ceph <mon-ips>:/ /mnt/ceph-cleanup -o name=admin,secret='<secret>'
```
When using secure monitor port 3300:
```
# mount -t ceph <mon-ips>:/ /mnt/ceph-cleanup -o name=admin,secret='<secret>',ms_mode=secure
```
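If you script the mount, the option string can be assembled from the values gathered in steps 1 and 2. A minimal sketch, using placeholder values (the IPs and `<secret>` below are not real; substitute your own outputs), that only prints the command it would run:

```shell
# Placeholder values -- substitute the real outputs of steps 1 and 2
MON_IPS="172.30.185.38:3300,172.30.101.171:3300,172.30.227.154:3300"
ADMIN_KEY="<secret>"   # never hardcode the real admin key in a saved script
MNT="/mnt/ceph-cleanup"

# Build the option string; ms_mode=secure is only needed for port 3300
OPTS="name=admin,secret=${ADMIN_KEY}"
case "$MON_IPS" in
  *:3300*) OPTS="${OPTS},ms_mode=secure" ;;
esac
echo "mount -t ceph ${MON_IPS}:/ ${MNT} -o ${OPTS}"
```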
- List the contents of /mnt/ceph-cleanup/volumes/_index/clone and remove the symlinks matching the missing entries (note that these volume identifiers will be different in your configuration):
```
# ls -l /mnt/ceph-cleanup/volumes/_index/clone
8809690a-7b48-4833-a21a-9cee46e47390
8809690a-7b48-4833-a21a-9cee46e47391
8809690a-7b48-4833-a21a-9cee46e47393
21936f6a-3ab5-47c8-a57f-aaf1ef02d77b
```
For all of the above volumes, proceed with the following steps:
- Check that the symlink does not point to an existing clone in the volumes directory. The output of `ls -l` shows the path the symlink points to (the path after `->`). Since the CephFS file system is mounted at /mnt/ceph-cleanup, check for the existence of that directory with /mnt/ceph-cleanup prefixed:
```
# ls -l /mnt/ceph-cleanup/volumes/_index/clone/8809690a-7b48-4833-a21a-9cee46e47390
lrwxrwxrwx 1 root root 31 Jul 16 15:16 /mnt/volumes/_index/clone/8809690a-7b48-4833-a21a-9cee46e47390 -> /volumes/_nogroup/clone_sub_0_0
# stat /mnt/ceph-cleanup/volumes/_nogroup/clone_sub_0_0
stat: cannot statx '/mnt/ceph-cleanup/volumes/_nogroup/clone_sub_0_0': No such file or directory
```
⚠ Only if you see the `No such file or directory` error as indicated above, proceed with the step to delete the symlink:
```
# rm -vf /mnt/ceph-cleanup/volumes/_index/clone/8809690a-7b48-4833-a21a-9cee46e47390
removed '/mnt/cephfs/volumes/_index/clone/8809690a-7b48-4833-a21a-9cee46e47390'
```
- If the above stat target does exist, the output will look similar to the following; in this case do not proceed with deleting the symlink:
```
# stat /mnt/ceph-cleanup/volumes/_nogroup/clone_sub_0_0
  File: /mnt/ceph-cleanup/volumes/_nogroup/clone_sub_0_0
  Size: 2            Blocks: 0          IO Block: 65536  directory
Device: 99h/153d     Inode: 1099511629313  Links: 3
Access: (0755/drwxr-xr-x)  Uid: (    0/    root)   Gid: (    0/    root)
Context: system_u:object_r:cephfs_t:s0
Access: 2023-09-14 12:07:13.643604709 -0400
Modify: 2023-09-14 12:07:18.264679949 -0400
Change: 2023-09-14 12:07:18.264679949 -0400
 Birth: 2023-09-14 12:07:13.643604709 -0400
```
When the clone directory still exists, use the `ceph fs clone status` and `ceph fs clone cancel` commands to cancel the clone operation. If this was successful, you can then proceed to remove the subvolume with the `ceph fs subvolume rm` command using the `--force` parameter.
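The "does the symlink target still exist" check above can be scripted when many index entries need triage. The sketch below only simulates the directory layout in a throwaway temp directory (names such as `clone_sub_live` are invented, and this is not a live CephFS mount); against a real cluster you should still verify each candidate manually before deleting anything:

```shell
# Simulated layout in a temp dir -- NOT a live CephFS mount; names are invented
root=$(mktemp -d)
mkdir -p "$root/volumes/_nogroup/clone_sub_live" "$root/volumes/_index/clone"
ln -s "$root/volumes/_nogroup/clone_sub_live" "$root/volumes/_index/clone/live-id"
ln -s "$root/volumes/_nogroup/clone_sub_gone" "$root/volumes/_index/clone/stale-id"

removed=""
for link in "$root"/volumes/_index/clone/*; do
  # [ -e ] follows the symlink: false means the clone directory is gone
  if [ ! -e "$link" ]; then
    rm -f "$link"
    removed="$removed$(basename "$link")"
  fi
done
echo "removed: $removed"   # only stale-id is deleted; live-id is kept
```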
Root Cause
The clone is failing because the base path of the clone no longer exists. As a consequence, the in-progress clone is deleted forcibly.
The clone operation tracks ongoing clones using a symlink to the destination clone path in the /volumes/_index/clone directory. Once the clone completes, the symlink is removed. In this case, the clone subvolume was removed forcibly while the clone was still in progress: the removal deleted the clone but left the symlink behind.
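The failure mode described above can be reproduced with plain symlinks. This sketch mimics the index/subvolume layout in a temp directory (illustration only; the paths and the `track-id` name are invented):

```shell
# Illustrate the stale-symlink state in a temp dir (not a real CephFS volume)
root=$(mktemp -d)
mkdir -p "$root/volumes/_index/clone" "$root/volumes/_nogroup/clone_sub_0_0"
# Clone starts: a tracker symlink is created in the index directory
ln -s "$root/volumes/_nogroup/clone_sub_0_0" "$root/volumes/_index/clone/track-id"
# Forced removal mid-clone: the subvolume disappears, the symlink does not
rm -rf "$root/volumes/_nogroup/clone_sub_0_0"
if [ -L "$root/volumes/_index/clone/track-id" ] && [ ! -e "$root/volumes/_index/clone/track-id" ]; then
  echo "dangling tracker symlink"
fi
```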
Diagnostic Steps
- Describe the volumesnapshot's associated volumesnapshotcontent Custom Resource:
```
$ oc describe volumesnapshotcontent/<volumesnapshot-content> | grep -i warning
Warning SnapshotDeleteError 2m45s (x58 over 3h11m) csi-snapshotter openshift-storage.cephfs.csi.ceph.com Failed to delete snapshot
Warning SnapshotDeleteError 80s (x614 over 37h) csi-snapshotter openshift-storage.cephfs.csi.ceph.com Failed to delete snapshot
Warning SnapshotDeleteError 2m40s (x998 over 2d13h) csi-snapshotter openshift-storage.cephfs.csi.ceph.com Failed to delete snapshot
Warning SnapshotDeleteError 2m39s (x1190 over 3d1h) csi-snapshotter openshift-storage.cephfs.csi.ceph.com Failed to delete snapshot
Warning SnapshotDeleteError 2m39s (x1576 over 4d2h) csi-snapshotter openshift-storage.cephfs.csi.ceph.com Failed to delete snapshot
```
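When the describe output is long, the retrying delete errors can be counted instead of read line by line. A small sketch over a hypothetical saved excerpt (the event text below is an example, not output from your cluster):

```shell
# Hypothetical excerpt of `oc describe volumesnapshotcontent` events
events='Warning  SnapshotDeleteError  2m45s (x58 over 3h11m)  csi-snapshotter  Failed to delete snapshot
Normal   CreatingSnapshot     5m                      csi-snapshotter  Creating snapshot
Warning  SnapshotDeleteError  80s (x614 over 37h)     csi-snapshotter  Failed to delete snapshot'

# Count only the delete-error events, mirroring the `grep -i warning` filter above
count=$(printf '%s\n' "$events" | grep -c 'SnapshotDeleteError')
echo "$count"   # 2
```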
- Get the csi-vol-<suffix-id> of the PVC that's being cloned:
```
# oc get pv -o 'custom-columns=NAME:.spec.claimRef.name,PVNAME:.metadata.name,STORAGECLASS:.spec.storageClassName,IMAGENAME:.spec.csi.volumeAttributes.subvolumeName' | grep <pvc-name>
```
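Instead of eyeballing the grep output, the subvolume name can be selected by column. A sketch over a hypothetical custom-columns table (the PVC names and volume IDs below are examples only):

```shell
# Hypothetical `oc get pv` custom-columns output; names and IDs are examples
pvs='NAME      PVNAME    STORAGECLASS               IMAGENAME
db-data   pvc-aaa   ocs-storagecluster-cephfs  csi-vol-613a3c90-b22c-11ea-8f58-0a580ae0101a
web-data  pvc-bbb   ocs-storagecluster-cephfs  csi-vol-9f2c07fa-aeca-11ed-b704-0a580ae0160c'

# Pick the subvolume name (4th column) for one exact PVC name
subvol=$(printf '%s\n' "$pvs" | awk '$1 == "db-data" { print $4 }')
echo "$subvol"   # csi-vol-613a3c90-b22c-11ea-8f58-0a580ae0101a
```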
- Get info on the csi-vol-<suffix-id> snapshots (to be run from the rook-ceph-tools pod):
```
sh-4.4$ ceph fs subvolume snapshot ls ocs-storagecluster-cephfilesystem csi-vol-613a3c90-b22c-11ea-8f58-0a580ae0101a csi
[
    {
        "name": "csi-snap-09f2c07f-aeca-11ed-b704-0a580ae0160c"
    },
    {
        "name": "csi-snap-2991ff43-af30-11ed-b704-0a580ae0160c"
    }
]
```
- Inspect the snapshot info in the rook-ceph-tools pod:
```
$ ceph fs subvolume snapshot info ocs-storagecluster-cephfilesystem csi-vol-613a3c90-b22c-11ea-8f58-0a580ae0101a csi-snap-a34f2252-b18b-11ed-b704-0a580ae0160c csi
{
    "created_at": "2023-02-21 02:01:21.696892",
    "data_pool": "ocs-storagecluster-cephfilesystem-data0",
    "has_pending_clones": "yes", <<<---**
    "size": 1636134356447
}
```
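The `has_pending_clones` flag is the field to watch in that output. If jq is not installed on the tools pod, it can be extracted with grep alone; a sketch over a hypothetical copy of the JSON (values are examples):

```shell
# Hypothetical snapshot-info JSON as returned in the step above (example values)
info='{
    "created_at": "2023-02-21 02:01:21.696892",
    "has_pending_clones": "yes",
    "size": 1636134356447
}'

# Pull out just the flag value with standard tools (no jq needed)
pending=$(printf '%s\n' "$info" | grep -o '"has_pending_clones": *"[^"]*"' | grep -oE 'yes|no')
echo "$pending"   # yes
```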
This solution is part of Red Hat’s fast-track publication program, providing a huge library of solutions that Red Hat engineers have created while supporting our customers. To give you the knowledge you need the instant it becomes available, these articles may be presented in a raw and unedited form.