Ceph - rbd image cannot be deleted with "rbd: error: image still has watchers"
Environment
- Red Hat Ceph Storage (RHCS) 4
- Red Hat Ceph Storage (RHCS) 5
Issue
Deleting an rbd image fails with the following error:
    # rbd rm rbdtest/testimage1
    2021-02-25 03:30:32.543 7f9be77fe700 -1 librbd::image::PreRemoveRequest: 0x56251d7efa10 check_image_watchers: image has watchers - not removing
    Removing image: 0% complete...failed.
    rbd: error: image still has watchers
    This means the image is still open or the client using it crashed. Try again after closing/unmapping it or waiting 30s for the crashed client to timeout.
Resolution
"rbd: error: image still has watchers" means there is still a process on a client node that is using this image.
- If it is possible to identify the process that is using the image, use the following steps:

  - Get the watcher details:

        # POOL=<pool>
        # IMAGE_NAME=<image_name>
        # rbd status ${POOL}/${IMAGE_NAME}
        Watchers:
                watcher=10.0.0.10:0/2445590874 client.4788 cookie=18446462598732840961
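The watcher address is needed again in the unmap and blacklist steps, so it can be captured into a shell variable. A minimal sketch; the sample `rbd status` output from above is hard-coded here for illustration (in real use, pipe the live `rbd status ${POOL}/${IMAGE_NAME}` output instead):

```shell
# Sample 'rbd status' output from above, hard-coded for illustration;
# in practice: status_output=$(rbd status ${POOL}/${IMAGE_NAME})
status_output='Watchers:
        watcher=10.0.0.10:0/2445590874 client.4788 cookie=18446462598732840961'

# Extract the ip:port/nonce part of the watcher= field
watcher_addr=$(printf '%s\n' "$status_output" | sed -n 's/.*watcher=\([^ ]*\).*/\1/p')
echo "$watcher_addr"    # 10.0.0.10:0/2445590874
```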
  - On the client '10.0.0.10', identify if the image is kernel-module mapped (mapped by the 'rbd map' command):

        # lsblk | grep -e rbd -e NAME
        NAME MAJ:MIN RM SIZE RO TYPE MOUNTPOINT
        rbd0 251:0   0   1G  0 disk
  - If needed, umount it and then unmap it:

        # RBD=rbd<ID>
        # rbd unmap /dev/${RBD}
        # lsblk | grep -e rbd -e NAME
        NAME MAJ:MIN RM SIZE RO TYPE MOUNTPOINT

    Alternatively, unmap by pool and image name:

        # rbd unmap ${POOL}/${IMAGE_NAME}
        # lsblk | grep -e rbd -e NAME
        NAME MAJ:MIN RM SIZE RO TYPE MOUNTPOINT
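To find which /dev/rbdX device belongs to the image before umounting and unmapping, the `rbd showmapped` output can be matched on the pool and image columns. A sketch; the sample output below is an assumption for illustration (in real use, run `rbd showmapped` on the client):

```shell
# Sample 'rbd showmapped' output, hard-coded as an assumption;
# in practice: showmapped=$(rbd showmapped)
POOL=rbdtest
IMAGE_NAME=testimage1
showmapped='id pool    namespace image      snap device
0  rbdtest           testimage1 -    /dev/rbd0'

# Match the pool and image columns; the device path is the last field
dev=$(printf '%s\n' "$showmapped" | awk -v p="$POOL" -v i="$IMAGE_NAME" '$2 == p && $(NF-2) == i {print $NF}')
echo "$dev"    # /dev/rbd0
# then, if mounted: umount "$dev"; and: rbd unmap "$dev"
```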
  - Now there should not be any watchers and the rbd image can be removed:

        # rbd status ${POOL}/${IMAGE_NAME}
        Watchers: none
        # rbd rm ${POOL}/${IMAGE_NAME}
        Removing image: 100% complete...done.
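As the error text notes, a crashed client's watch times out after about 30 seconds, so a removal that fails can simply be retried. A sketch of that retry loop; `try_remove` below is a hypothetical stub standing in for `rbd rm "${POOL}/${IMAGE_NAME}"` (here it fails twice and then succeeds, just to exercise the loop):

```shell
tries=0
# try_remove is a hypothetical stand-in for: rbd rm "${POOL}/${IMAGE_NAME}"
# For illustration it fails on the first two calls and succeeds on the third.
try_remove() {
    tries=$((tries + 1))
    [ "$tries" -ge 3 ]
}

removed=no
for attempt in 1 2 3 4; do
    if try_remove; then
        removed=yes
        break
    fi
    # In real use, wait for the crashed client's watch to expire:
    # sleep 30
done
echo "removed=${removed} after ${tries} tries"    # removed=yes after 3 tries
```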
  If the image is not kernel mapped, it may be open via librbd by a process such as qemu (if the process cannot be identified at all, see the blacklisting steps below). This can be identified by listing the qemu processes - example from an OpenStack environment:

        # ps -ef | grep qemu
        # sudo rbd status volumes/volume-2c799d1b-512f-45cc-bda2-2d054ad9975f
        Watchers:
                watcher=192.0.0.11:0/1065190154 client.110477 cookie=93911546382976

  From node '192.0.0.11' - in this case the compute-0 node:

        # ps -ef | grep qemu | grep volume-2c799d1b
        qemu 604262 74428 9 09:30 ? 00:02:44 /usr/libexec/qemu-kvm -name guest=instance-0000001f,debug-threads=on -S -object secret,id=masterKey0,format=raw,file=/var/lib/libvirt/qemu/domain-10-instance-0000001f/master-key.aes -machine pc-i440fx-rhel7.6.0,accel=kvm,usb=off,dump-guest-core=off -cpu Skylake-Server-IBRS,ss=on,hypervisor=on,tsc_adjust=on,clflushopt=on,pku=on,stibp=on,ssbd=on,ibpb=on -m 2048 -realtime mlock=off -smp 2,sockets=2,cores=1,threads=1 -uuid 6eefa7e9-684b-4bbe-8d6e-fe1b6a2b384d -smbios type=1,manufacturer=Red Hat,product=OpenStack Compute,version=17.0.13-30.el7ost,serial=aebfb72f-3ce2-4758-aac1-5cb0f98db701,uuid=6eefa7e9-684b-4bbe-8d6e-fe1b6a2b384d,family=Virtual Machine -no-user-config -nodefaults -chardev socket,id=charmonitor,fd=42,server,nowait -mon chardev=charmonitor,id=monitor,mode=control -rtc base=utc,driftfix=slew -global kvm-pit.lost_tick_policy=delay -no-hpet -no-shutdown -boot strict=on -device piix3-usb-uhci,id=usb,bus=pci.0,addr=0x1.0x2 -object secret,id=virtio-disk0-secret0,data=zrvDmuxIk5twh6GYBsF7wH5ECygP6IaTwcg/DhC4tqs=,keyid=masterKey0,iv=0qVp/k/L5t5BSbHgMDtmmw==,format=base64 -drive file=rbd:volumes/volume-2c799d1b-512f-45cc-bda2-2d054ad9975f:id=openstack:auth_supported=cephx\;none:mon_host=192.168.0.21\:6789\;192.168.0.65\:6789\;192.168.0.98\:6789,file.password-secret=virtio-disk0-secret0,format=raw,if=none,id=drive-virtio-disk0,serial=2c799d1b-512f-45cc-bda2-2d054ad9975f,cache=writeback,discard=unmap -device virtio-blk-pci,scsi=off,bus=pci.0,addr=0x4,drive=drive-virtio-disk0,id=virtio-disk0,bootindex=1,write-cache=on -netdev tap,fd=44,id=hostnet0,vhost=on,vhostfd=45 -device virtio-net-pci,rx_queue_size=512,host_mtu=1450,netdev=hostnet0,id=net0,mac=fa:16:3e:0a:e8:43,bus=pci.0,addr=0x3 -add-fd set=3,fd=48 -chardev pty,id=charserial0,logfile=/dev/fdset/3,logappend=on -device isa-serial,chardev=charserial0,id=serial0 -device usb-tablet,id=input0,bus=usb.0,port=1 -vnc 192.0.0.64:4 -k en-us -device cirrus-vga,id=video0,bus=pci.0,addr=0x2 -device virtio-balloon-pci,id=balloon0,bus=pci.0,addr=0x5 -sandbox on,obsolete=deny,elevateprivileges=deny,spawn=deny,resourcecontrol=deny -msg timestamp=on

  Note the '-drive file=rbd:volumes/...' argument, which shows this qemu process has the image open via librbd.
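In the example above the qemu process is found by grepping for a short prefix of the volume name. That prefix can be derived from the rbd image name itself (values copied from the example above):

```shell
# Image name of the OpenStack volume from the example above
IMAGE_NAME=volume-2c799d1b-512f-45cc-bda2-2d054ad9975f

# Keep only 'volume-' plus the first UUID segment, as used in the grep above
pattern=$(printf '%s' "$IMAGE_NAME" | cut -d- -f1-2)
echo "$pattern"    # volume-2c799d1b
# then: ps -ef | grep qemu | grep "$pattern"
```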
- If it is not possible to identify the process, the client node can be blacklisted, BUT this will blacklist client I/O against this node for all rbd images and other Ceph client operations on this node. So use the following steps with caution:
  - Blacklist the watcher from the 'rbd status' output:

        # ceph osd blacklist add 10.0.0.10:0/2445590874
  - Check for the watchers; if none, remove the image:

        # POOL=<pool>
        # IMAGE_NAME=<image_name>
        # rbd status ${POOL}/${IMAGE_NAME}
        Watchers: none
        # rbd rm ${POOL}/${IMAGE_NAME}
        Removing image: 100% complete...done.
  - Remove the blacklist record, so the client can join the Ceph cluster again:

        # ceph osd blacklist rm 10.0.0.10:0/2445590874
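Because 'ceph osd blacklist add' cuts off client I/O from whatever address it is given, it is worth sanity-checking the watcher string before running the steps above. A minimal sketch; the ip:port/nonce shape is assumed from the 'rbd status' outputs shown earlier:

```shell
# Watcher address copied from the 'rbd status' output above
WATCHER='10.0.0.10:0/2445590874'

# Crude shape check for ip:port/nonce before blacklisting; a mistyped
# address here would block I/O from the wrong client
case "$WATCHER" in
    *.*.*.*:*/*) result=ok ;;
    *)           result=bad ;;
esac
echo "watcher format: ${result}"    # watcher format: ok
```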
This solution is part of Red Hat’s fast-track publication program, providing a huge library of solutions that Red Hat engineers have created while supporting our customers. To give you the knowledge you need the instant it becomes available, these articles may be presented in a raw and unedited form.