ODF: Changes in crush rules due to device class changes result in PGs being in an unknown, misplaced, and/or remapped state


Environment

Red Hat OpenShift Container Platform (OCP) 4.x
Red Hat OpenShift Container Storage (OCS) 4.x
Red Hat OpenShift Data Foundation (ODF) 4.x

Issue

  • Pools are utilizing a crush rule that references a device class not currently used in the cluster, resulting in PGs being in an unknown, misplaced, and/or remapped state
  • More than 100 percent of objects are misplaced in Ceph after upgrading the OpenShift Data Foundation operator to 4.16.x
  • Rook created crush rules that utilize a device class that is not used in the cluster; these crush rules are now applied to pools, causing issues
  • The defaultCephDeviceClass in the StorageCluster CR is incorrect
  • Transitioning from unsupported HDDs to SSDs in ODF resulted in data unavailability

Example:
Ceph status shows a large number of objects misplaced:

$ oc exec -it $(oc get pod -n openshift-storage -l app=rook-ceph-operator -o name) -n openshift-storage -- ceph status -c /var/lib/rook/openshift-storage/openshift-storage.config

  cluster:
    id:     [REDACTED]
    health: HEALTH_OK
 
  services:
    mon: 3 daemons, quorum a,b,d (age 7d)
    mgr: b(active, since 7d), standbys: a
    mds: 1/1 daemons up, 1 hot standby
    osd: 6 osds: 6 up (since 7d), 6 in (since 4M); 169 remapped pgs
    rgw: 1 daemon active (1 hosts, 1 zones)
 
  data:
    volumes: 0/1 healthy, 1 recovering
    pools:   12 pools, 281 pgs
    objects: 184.82k objects, 265 GiB
    usage:   1.2 TiB used, 17 TiB / 18 TiB avail
    pgs:     33.808% pgs unknown
             886324/554472 objects misplaced (159.850%)
             138 active+clean+remapped
             95  unknown
             48  active+undersized+remapped

All OSDs are utilizing the device class ssd

$ oc exec -it $(oc get pod -n openshift-storage -l app=rook-ceph-operator -o name) -n openshift-storage -- ceph osd tree -c /var/lib/rook/openshift-storage/openshift-storage.config

ID  CLASS  WEIGHT   TYPE NAME                                         STATUS  REWEIGHT  PRI-AFF
-1         5.85956  root default                                                               
-5         1.95319      host [REDACTED]                           
 1    ssd  0.97659          osd.1                                         up   1.00000  1.00000
 5    ssd  0.97659          osd.5                                         up   1.00000  1.00000
-7         1.95319      host [REDACTED]                           
 2    ssd  0.97659          osd.2                                         up   1.00000  1.00000
 4    ssd  0.97659          osd.4                                         up   1.00000  1.00000
-3         1.95319      host [REDACTED]                           
 0    ssd  0.97659          osd.0                                         up   1.00000  1.00000
 3    ssd  0.97659          osd.3                                         up   1.00000  1.00000

The pools are utilizing a crush rule that references hdd while there are only ssds in the cluster

$ oc exec -it $(oc get pod -n openshift-storage -l app=rook-ceph-operator -o name) -n openshift-storage -- ceph osd pool ls detail -c /var/lib/rook/openshift-storage/openshift-storage.config

pool 1 'ocs-storagecluster-cephblockpool' replicated size 3 min_size 2 crush_rule 25 object_hash rjenkins pg_num 32 pgp_num 32 autoscale_mode on last_change 9437 lfor 0/0/30 flags hashpspool,selfmanaged_snaps stripe_width 0 target_size_ratio 0.49 application rbd

The crush rules being utilized reference hdd while there are only ssds in the cluster

$ oc exec -it $(oc get pod -n openshift-storage -l app=rook-ceph-operator -o name) -n openshift-storage -- ceph osd crush rule dump -c /var/lib/rook/openshift-storage/openshift-storage.config

    {
        "rule_id": 25,
        "rule_name": "ocs-storagecluster-cephblockpool_host_hdd",
        "type": 1,
        "steps": [
            {
                "op": "take",
                "item": -2,
                "item_name": "default~hdd"
            },
            {
                "op": "chooseleaf_firstn",
                "num": 0,
                "type": "host"
            },
            {
                "op": "emit"
            }
        ]
    },
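
The mismatch above can be spotted quickly by comparing the device classes that crush rules reference against the classes actually present. A minimal sketch (the `referenced_classes` helper is our own, not part of ODF or Rook; it parses the `item_name` fields of the rule dump, where a class-specific rule ends in `~<class>`):

```shell
# Sketch helper (not from the article): extract the device classes
# referenced by crush rules from the output of `ceph osd crush rule dump`.
referenced_classes() {
  # '"item_name": "default~hdd"'  ->  'hdd'
  # rules without a device class ("default") produce no output
  grep -o '"item_name": *"[^"]*~[^"]*"' | sed 's/.*~//; s/"$//' | sort -u
}

# Usage against a live cluster:
#   ceph osd crush rule dump | referenced_classes
# Any class printed here that does not appear in
# `ceph osd crush class ls` indicates this issue.
```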

Resolution

For the Ceph commands, use KCS Configuring the Rook-Ceph Toolbox in OpenShift Data Foundation 4.x or KCS Accessing the Red Hat Ceph Storage CLI in OpenShift Data Foundation 4.x to access the Ceph CLI.

The following steps involve deletion of crush rules and crush classes. It is recommended to open a support case with Red Hat prior to applying this solution to ensure minimal disruptions to your environment.

  1. Access Ceph Tools and make note of any HDD OSDs:
$ ceph osd df | grep hdd
  2. Switch to the openshift-storage project and scale down the rook-ceph-operator and ocs-operator:
$ oc project openshift-storage; oc scale deployment rook-ceph-operator ocs-operator --replicas=0
  3. Set the flags nobackfill, norecover, and norebalance:
$ ceph osd set nobackfill
$ ceph osd set norecover
$ ceph osd set norebalance

NOTE: From the step below until the flags are unset in step 13, data will be unavailable for reads and writes.

  4. Change all pools to utilize a generic crush rule that does not reference a device class (hdd or ssd).
    In this case we are utilizing the crush rule replicated_rule as it doesn't have a device class specified.
    NOTE: Do not use the for loop if you have custom pools with custom crush rules.
$ ceph osd crush rule ls
$ for i in $(ceph osd pool ls); do ceph osd pool set $i crush_rule replicated_rule; done
  5. Modify any OSDs reporting as HDD to SSD:
$ for i in $(ceph osd ls); do echo osd.$i; ceph osd crush rm-device-class osd.$i; ceph osd crush set-device-class ssd osd.$i; done
  6. Attempt to delete the hdd crush class.
    NOTE: This fails because crush rules still reference hdd; make note of those rules.
$ ceph osd crush class ls
$ ceph osd crush class rm hdd
  7. Delete the old crush rules that reference hdd one by one (utilize the rules noted in the previous step):
$ ceph osd crush rule rm <crush_rules_hdd>
  8. Remove the old crush class:
$ ceph osd crush class rm hdd
  9. Remove any reference to HDD in the StorageCluster and CephCluster CRs:
$ oc patch storageclusters.ocs.openshift.io ocs-storagecluster -n openshift-storage --type=json --subresource=status --patch '[{"op": "remove", "path": "/status/defaultCephDeviceClass"}]'

$ oc patch cephclusters.ceph.rook.io ocs-storagecluster-cephcluster -n openshift-storage --type=json --subresource=status --patch '[{"op": "remove", "path": "/status/storage/deviceClasses"}]'

Check that the references were removed:

$ oc get storageclusters.ocs.openshift.io ocs-storagecluster -n openshift-storage -o yaml | grep -i deviceClass
$ oc get cephclusters.ceph.rook.io ocs-storagecluster-cephcluster -n openshift-storage -o yaml | grep -i deviceClass

  10. Scale the rook-ceph-operator and ocs-operator back up:
$ oc scale deployment rook-ceph-operator ocs-operator --replicas=1
  11. Wait five to ten minutes to allow the operators time to reconcile and create the new crush rules:
$ sleep 300
  12. Verify that new rules referencing the device class ssd have been created in Ceph and that the pools are utilizing them:
$ ceph osd pool ls detail
$ ceph osd crush rule dump
  13. Unset the flags nobackfill, norecover, and norebalance:
$ ceph osd unset nobackfill
$ ceph osd unset norecover
$ ceph osd unset norebalance
  14. Verify the health of Ceph:
$ ceph status
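
The flag-setting, pool crush rule reset, and device class reclassification commands above can be collected into one reviewable function. This is a sketch only (the `apply_fix` name and `CEPH` override are our own): it assumes the default replicated_rule exists and that there are no custom pools with custom crush rules. Review it against the numbered steps before running; `CEPH=echo` gives a dry run.

```shell
# Sketch only -- review before running on a live cluster.
# Assumes replicated_rule exists and no custom pools/crush rules.
apply_fix() {
  ceph="${CEPH:-ceph}"   # set CEPH=echo for a dry run
  # Pause data movement
  for f in nobackfill norecover norebalance; do "$ceph" osd set "$f"; done
  # Point every pool at the class-agnostic replicated_rule
  for p in $("$ceph" osd pool ls); do
    "$ceph" osd pool set "$p" crush_rule replicated_rule
  done
  # Reclassify every OSD as ssd
  for o in $("$ceph" osd ls); do
    "$ceph" osd crush rm-device-class "osd.$o"
    "$ceph" osd crush set-device-class ssd "osd.$o"
  done
}
```

The remaining steps (rule deletion, CR patching, operator scale-up, flag unset) are intentionally left manual, since they depend on the rules noted in step 6.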

Root Cause

When upgrading the OpenShift Data Foundation operator to 4.16.x, new crush rules are created for the Ceph pools. These new crush rules are device-class specific. The pools may then utilize a crush rule with a device class that is not present in the cluster, resulting in PGs being in an unknown, misplaced, and/or remapped state.

The default behavior of Rook when creating these crush rules is to choose the first item in the list output by $ ceph osd crush class ls. This causes issues for customers who have transitioned from unsupported HDDs to SSDs, as hdd may still appear first in that list.
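
The selection can be reproduced offline. A minimal sketch (the `first_device_class` helper is our own): it parses the JSON list that `ceph osd crush class ls` prints and returns the first entry, i.e. the class Rook's default behavior would pick.

```shell
# Sketch helper (not from ODF or Rook): print the first entry of the
# JSON list produced by `ceph osd crush class ls`.
first_device_class() {
  # strip brackets/quotes/spaces, one entry per line, drop blanks, take first
  tr -d '[]" ' | tr ',' '\n' | sed '/^$/d' | head -n 1
}

# Simulated output of `ceph osd crush class ls` on an affected cluster:
echo '[ "hdd", "ssd" ]' | first_device_class   # prints: hdd
```

On a healthy all-SSD cluster the same pipeline would print ssd, because hdd would no longer be in the class list.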

This issue is tracked in the Jira issues listed below.

Artifacts

Product/Version | Related BZ/Jira | Errata          | Fixed Version
----------------|-----------------|-----------------|--------------
ODF/4.19        | DFBUGS-948      | N/A             | 4.19.0
ODF/4.18        | DFBUGS-1666     | N/A             | 4.18.1
ODF/4.17        | DFBUGS-1667     | RHSA-2025:17145 | 4.17.14
ODF/4.16        | DFBUGS-1668     | RHBA-2025:17157 | 4.16.16

Diagnostic Steps

  • The Ceph command $ ceph osd crush class ls returns two items, with ssd not first in the list
$ oc exec -it $(oc get pod -n openshift-storage -l app=rook-ceph-operator -o name) -n openshift-storage -- ceph osd crush class ls -c /var/lib/rook/openshift-storage/openshift-storage.config

[
    "hdd",
    "ssd"
]
  • The StorageCluster CR has hdd as the defaultCephDeviceClass
$ oc get storagecluster -o json | jq '.items[].status.defaultCephDeviceClass'

"hdd"
  • The CephCluster CR has the device classes hdd and ssd in the deviceClasses section. Notably, hdd is listed first
$ oc get cephcluster -o json | jq '.items[].status.storage.deviceClasses'

[
  {
    "name": "hdd"
  },
  {
    "name": "ssd"
  }
]
  • In case no new _ssd rules are created by the rook-ceph-operator, check the logs of the rook-ceph-operator pod:
$ oc logs rook-ceph-operator-xxxxx

2026-02-20T15:44:34.033966152Z 2026-02-20 15:44:34.033887 I | cephclient: creating a new crush rule for changed deviceClass ("default"-->"hdd") on crush rule "replicated_rule"
2026-02-20T15:44:34.033966152Z 2026-02-20 15:44:34.033920 I | cephclient: updating pool "ocs-storagecluster-cephblockpool" failure domain from "host" to "host" with new crush rule "ocs-storagecluster-cephblockpool_host_hdd"
2026-02-20T15:44:34.033966152Z 2026-02-20 15:44:34.033930 I | cephclient: crush rule "replicated_rule" will no longer be used by pool "ocs-storagecluster-cephblockpool"
2026-02-20T15:44:34.495930182Z 2026-02-20 15:44:34.495871 E | ceph-block-pool-controller: failed to reconcile CephBlockPool "openshift-storage/ocs-storagecluster-cephblockpool". failed to create pool "ocs-storagecluster-cephblockpool".: failed to configure pool "ocs-storagecluster-cephblockpool".: failed to configure pool "ocs-storagecluster-cephblockpool": failed to update crush rule for pool "ocs-storagecluster-cephblockpool": failed to create replicated crush rule "ocs-storagecluster-cephblockpool_host_hdd": failed to create crush rule ocs-storagecluster-cephblockpool_host_hdd. . Error EINVAL: device class hdd does not exist: exit status 22

In this example no new _ssd rules are created and the reconcile fails with the error above. In that case, repeat the steps to scale down the operators and complete step 9 to remove any reference to HDD in the StorageCluster CR.
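
The diagnostic checks above reduce to one question: is every class referenced by a crush rule actually present in the cluster? A minimal sketch (the `missing_classes` helper is our own): it reads class names from stdin, one per line, and prints any that are absent from the space-separated list of present classes given as the first argument.

```shell
# Sketch helper (not from the article): print classes referenced by
# crush rules (stdin, one per line) that are missing from the cluster's
# class list ($1, space-separated).
missing_classes() {
  present=" $1 "
  while read -r c; do
    case "$present" in
      *" $c "*) ;;      # class exists in the cluster -- fine
      *) echo "$c" ;;   # referenced but not present -> this issue
    esac
  done
}

# Simulated: rules reference hdd and ssd, but only ssd exists:
printf 'hdd\nssd\n' | missing_classes "ssd"   # prints: hdd
```

On a live cluster, the stdin would come from the `item_name` fields of `ceph osd crush rule dump` and the argument from `ceph osd crush class ls`; any output confirms this article applies.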


This solution is part of Red Hat’s fast-track publication program, providing a huge library of solutions that Red Hat engineers have created while supporting our customers. To give you the knowledge you need the instant it becomes available, these articles may be presented in a raw and unedited form.