ODF deployment fails to deploy OSDs due to unclean disks - How to properly erase a disk which was an ODF / Ceph OSD.


Environment

Red Hat Ceph Storage (RHCS) 5+
Red Hat OpenShift Container Storage (OCS) 4
Red Hat OpenShift Data Foundation (ODF) 4

Issue

  • ODF deployment fails to deploy OSDs due to unclean disks.
  • How to properly erase a disk which was an ODF / Ceph OSD.
  • The OSD disk is not cleaned properly when re-deploying the OSD.

Resolution

  • There are two methods to clean/wipe disks to be used for ODF OSDs.
    • Method 1 is intended for OCP / ODF clusters running 4.16 or higher.
    • Method 2 is intended for OCP / ODF clusters running 4.15 or lower.
    • Method 2 is also intended as a fallback if Method 1 fails.
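Which method applies depends on the cluster version (available, for example, from `oc get clusterversion`). The helper below is only an illustrative sketch for automating that choice; the `pick_method` function name and the idea of feeding it a version string are assumptions, not part of the documented procedure.

```shell
#!/usr/bin/env bash
# pick_method: given an OCP/ODF "major.minor[.patch]" version string, print
# which disk-cleaning method from this article applies.
pick_method() {
  local version="$1"
  local major="${version%%.*}"
  local minor="${version#*.}"; minor="${minor%%.*}"
  if [ "$major" -gt 4 ] || { [ "$major" -eq 4 ] && [ "$minor" -ge 16 ]; }; then
    echo "Method 1"   # 4.16 or higher
  else
    echo "Method 2"   # 4.15 or lower
  fi
}

# Example usage (the version would normally come from something like:
#   oc get clusterversion version -o jsonpath='{.status.desired.version}')
pick_method "4.16"   # Method 1
pick_method "4.14"   # Method 2
```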

Method 1

You can clean a device for use in ODF by running the ceph-bluestore-tool from the Rook Ceph Operator container image.

  1. Identify the Rook Ceph Operator Image.

Identify the image of the running rook-ceph-operator deployment:

Syntax: # oc get deploy -n openshift-storage rook-ceph-operator -o jsonpath='{.spec.template.spec.containers[].image}'

Example: # oc get deploy -n openshift-storage rook-ceph-operator -o jsonpath='{.spec.template.spec.containers[].image}'
registry.redhat.io/odf4/rook-ceph-rhel9-operator@sha256:60623084100a785f7eaf4bd3a4f87de4982daa37e30849bdb0b8304d3d1b7cd5
  2. Clean the device.

On the node, use podman to run the ceph-bluestore-tool utilizing the operator's image to wipe the device:

Syntax: # /usr/bin/podman run --authfile /var/lib/kubelet/config.json --rm -ti --privileged --device <device-path> --entrypoint ceph-bluestore-tool <rook-ceph-operator-image> zap-device --dev <device-path> --yes-i-really-really-mean-it

Examples:
# /usr/bin/podman run --authfile /var/lib/kubelet/config.json --rm -ti --privileged --device /dev/sdc --entrypoint ceph-bluestore-tool registry.redhat.io/odf4/rook-ceph-rhel9-operator@sha256:60623084100a785f7eaf4bd3a4f87de4982daa37e30849bdb0b8304d3d1b7cd5 zap-device --dev /dev/sdc --yes-i-really-really-mean-it

# /usr/bin/podman run --authfile /var/lib/kubelet/config.json --rm -ti --privileged --device /dev/nvme1n1 --entrypoint ceph-bluestore-tool registry.redhat.io/odf4/rook-ceph-rhel9-operator@sha256:60623084100a785f7eaf4bd3a4f87de4982daa37e30849bdb0b8304d3d1b7cd5 zap-device --dev /dev/nvme1n1 --yes-i-really-really-mean-it
  • Replace <device-path> with the path to the target disk (e.g., /dev/nvme1n1).
  • Replace <rook-ceph-operator-image> with the image you found in step 1.
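For repeated use, the invocation can be composed by a small helper run on the node. This is an illustrative sketch only: `zap_cmd` is a hypothetical function name, and printing the command first lets you review it before executing it against a disk.

```shell
#!/usr/bin/env bash
set -euo pipefail

# zap_cmd: compose the Method 1 podman command for a given operator image
# and device path. It only prints the command; run it yourself after review.
zap_cmd() {
  local image="$1" device="$2"
  printf '%s\n' "/usr/bin/podman run --authfile /var/lib/kubelet/config.json \
--rm -ti --privileged --device ${device} \
--entrypoint ceph-bluestore-tool ${image} \
zap-device --dev ${device} --yes-i-really-really-mean-it"
}

# Example (image value taken from step 1; do NOT run against a disk in use):
IMAGE="registry.redhat.io/odf4/rook-ceph-rhel9-operator@sha256:60623084100a785f7eaf4bd3a4f87de4982daa37e30849bdb0b8304d3d1b7cd5"
zap_cmd "$IMAGE" /dev/sdc
# After reviewing the printed command, execute it with:
#   eval "$(zap_cmd "$IMAGE" /dev/sdc)"
```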

Method 2

This procedure is specific to downstream ODF (Rook), which uses CoreOS as the platform operating system.

# Set DISK to the path of the target device, for example:
DISK="/dev/sdX"
# or
DISK="/dev/nvme1n1"

# Wipe all filesystem, RAID, and partition-table signatures from the disk,
# returning it to a fresh, usable state.
wipefs -fa "$DISK"

# Zero 200 KiB at the 0, 1, 10, 100, and 1000 GiB offsets to remove any
# Ceph metadata which may be present.
for gb in 0 1 10 100 1000; do dd if=/dev/zero of="$DISK" bs=1K count=200 oflag=direct,dsync seek=$((gb * 1024**2)); done

# Discard all blocks on the device (TRIM); this might not be supported on all devices.
blkdiscard "$DISK"
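The dd loop zeroes 200 KiB at the start of the disk and at the 1, 10, 100, and 1000 GiB marks (seek counts 1 KiB blocks, so gb * 1024^2 blocks equals gb GiB), covering the locations where BlueStore metadata may sit. The same pattern can be rehearsed safely against a sparse file before touching a real disk; the file path and planted signature below are arbitrary examples, and oflag=direct is dropped because it does not apply to regular files.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Rehearse the Method 2 wipe pattern on a sparse file instead of a real disk.
DISK="$(mktemp /tmp/fake-disk.XXXXXX)"
truncate -s 2G "$DISK"            # sparse 2 GiB "disk"

# Plant a dummy signature at offset 0, where BlueStore keeps its label.
printf 'bluestore block device' | dd of="$DISK" conv=notrunc status=none

# Same loop as above, limited to the offsets that fit in 2 GiB (0 and 1 GiB).
for gb in 0 1; do
  dd if=/dev/zero of="$DISK" bs=1K count=200 conv=notrunc \
     seek=$((gb * 1024**2)) status=none
done

# Verify the first 200 KiB are now all zero.
cmp -n $((200 * 1024)) "$DISK" /dev/zero && echo "start of disk zeroed"
rm -f "$DISK"
```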

  • In OpenShift environments, solution 7114870 can be used as a workaround: a MachineConfig configures a script that runs during the first node boot to clean the Ceph BlueStore metadata from all disks.

  • A feature allowing the ODF operator to clean up Ceph BlueStore metadata from OSD disks before deploying the cluster has been requested and is being tracked as ODFRFE-19.
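A first-boot cleanup script of the kind such a MachineConfig might deploy is sketched below. This is a hypothetical illustration, not the contents of solution 7114870: the `clean_disk` function name and the dry-run wrapper are assumptions, and DRY_RUN defaults to 1 so the script only prints the commands it would run.

```shell
#!/usr/bin/env bash
# Hypothetical sketch of a first-boot Ceph metadata cleanup for one disk.
# With DRY_RUN=1 (the default here) commands are printed, not executed.
set -euo pipefail
DRY_RUN="${DRY_RUN:-1}"

run() { if [ "$DRY_RUN" = 1 ]; then echo "+ $*"; else "$@"; fi; }

clean_disk() {
  local disk="$1"
  run wipefs -fa "$disk"
  for gb in 0 1 10 100 1000; do
    run dd if=/dev/zero of="$disk" bs=1K count=200 oflag=direct,dsync \
        seek=$((gb * 1024**2))
  done
  run blkdiscard "$disk"
}

clean_disk /dev/sdX   # placeholder device; only prints the commands
```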

Root Cause

Diagnostic Steps

  • In an RHODF environment, after zapping the disk with "sgdisk --zap-all /dev/<disk> && wipefs -af /dev/<disk>", the OSD does not start and reports the following error on startup:
sha256-0cc6d999e5e52bfe425b80493657ffd973e8d8729faf520d2217e6ccef6c08ca/namespaces/openshift-storage/pods/rook-ceph-osd-3-6cf5c9757f-ffq2f/expand-bluefs/expand-bluefs/logs/current.log
2025-04-01T07:19:54.918623613Z inferring bluefs devices from bluestore path
2025-04-01T07:19:54.919354059Z 2025-04-01T07:19:54.918+0000 7f65f1cffa40 -1 bluestore(/var/lib/ceph/osd/ceph-3/block) _read_bdev_label /var/lib/ceph/osd/ceph-3/block data at 0, unable to decode label
2025-04-01T07:19:55.190101707Z 2025-04-01T07:19:55.189+0000 7f65f1cffa40 -1 bluestore(/var/lib/ceph/osd/ceph-3/block) _read_bdev_label /var/lib/ceph/osd/ceph-3/block data at 0, unable to decode label
2025-04-01T07:19:55.190313374Z 2025-04-01T07:19:55.189+0000 7f65f1cffa40 -1 bluestore(/var/lib/ceph/osd/ceph-3) _check_main_bdev_label not all labels read properly
2025-04-01T07:19:55.458628292Z /builddir/build/BUILD/ceph-19.2.0/src/os/bluestore/BlueStore.cc: In function 'int BlueStore::expand_devices(std::ostream&)' thread 7f65f1cffa40 time 2025-04-01T07:19:55.458515+0000
2025-04-01T07:19:55.458628292Z /builddir/build/BUILD/ceph-19.2.0/src/os/bluestore/BlueStore.cc: 8837: FAILED ceph_assert(r == 0)
2025-04-01T07:19:55.459052905Z  ceph version 19.2.0-98.el9cp (d5c3cf625491b0bd76b4585e77aa0d907446f314) squid (stable)
2025-04-01T07:19:55.459052905Z  1: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x11e) [0x7f65f2ec2cb0]
2025-04-01T07:19:55.459052905Z  2: /usr/lib64/ceph/libceph-common.so.2(+0x182e6f) [0x7f65f2ec2e6f]
2025-04-01T07:19:55.459052905Z  3: (BlueStore::expand_devices(std::ostream&)+0xa43) [0x55a10722c023]
2025-04-01T07:19:55.459052905Z  4: main()
2025-04-01T07:19:55.459052905Z  5: /lib64/libc.so.6(+0x295d0) [0x7f65f28375d0]
2025-04-01T07:19:55.459052905Z  6: __libc_start_main()
2025-04-01T07:19:55.459052905Z  7: _start()
2025-04-01T07:19:55.459052905Z *** Caught signal (Aborted) **
2025-04-01T07:19:55.459052905Z  in thread 7f65f1cffa40 thread_name:ceph-bluestore-
2025-04-01T07:19:55.459069615Z 2025-04-01T07:19:55.458+0000 7f65f1cffa40 -1 /builddir/build/BUILD/ceph-19.2.0/src/os/bluestore/BlueStore.cc: In function 'int BlueStore::expand_devices(std::ostream&)' thread 7f65f1cffa40 time
