Can I use HDD with ODF?
Environment
- Red Hat OpenShift Data Foundation 4.x
Issue
- Need to use ODF with HDD
- SSD is recognized as HDD
- Performance issues on ODF related to the OSD device class
- ODF incorrectly recognizes disks as HDD
- Is it required to have the rotational flag set to 0 (non-rotational)?
Resolution
-
The article Supported configurations for Red Hat OpenShift Data Foundation 4.X contains two rows for HDD, one for Internal Mode and another for External Mode.
HDD is not supported when using Internal Mode.
HDD is supported for Development or QA scenarios, following the RHCS documentation, when ODF is deployed using External Mode.
-
If you are using VMware vSphere, please follow the documentation details at the link: Infrastructure Requirements
-
If you have ODF Internal Mode running on SSDs but Ceph wrongly classified the disks as HDD, follow these steps:
- Change the rotational flag using Solution 6547891
- Override the ceph device class using Solution 7098573
Afterwards, check the device classes again to confirm they are all SSD.
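To confirm the result from the rook-ceph-tools pod, the CRUSH device classes can be listed directly (both are standard Ceph commands; the second should print no OSD ids once everything is on SSD):

```shell
# List the device classes currently present in the CRUSH map
ceph osd crush class ls
# List any OSDs still assigned to the hdd class (empty output means none remain)
ceph osd crush class ls-osd hdd
```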
Root Cause
-
In versions prior to 4.16, ODF can mistakenly recognize disks as HDD, but the environment is still supported as long as it is running on SSDs.
-
The documentation is not clear about the device classes and the need to choose between HDD and SSD for a new cluster or when adding new devices.
-
ODF has always had the support requirement of using SSD devices (non-rotational, meaning the rotational flag in the operating system file /sys/block/sdX/queue/rotational is set to 0). However, there was no constraint on the ODF side against rotational devices (HDD): ODF would let you install the operator on rotational disks anyway.
-
Since ODF 4.17, you can no longer create the ODF StorageSystem on rotational disks, so the OSD disks' rotational flag /sys/block/sdX/queue/rotational (at the OS kernel layer) must be 0 before installing ODF.
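Before installing, the flag can be checked on each node with a short script. This is a sketch that reads the standard sysfs path; the SYSBLOCK variable is only there so the check can be pointed at a different root for testing, and on a real node the default /sys/block applies:

```shell
# Report the kernel rotational flag for a given disk.
# 0 = non-rotational (SSD/NVMe, required for ODF 4.17+), 1 = rotational (HDD).
SYSBLOCK="${SYSBLOCK:-/sys/block}"

check_rotational() {
    local dev="$1" flag
    flag=$(cat "$SYSBLOCK/$dev/queue/rotational")
    if [ "$flag" = "0" ]; then
        echo "$dev: non-rotational"
    else
        echo "$dev: rotational"
    fi
}
```

For example, `check_rotational sda`; the same information is also shown by `lsblk -d -o NAME,ROTA`.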
-
Since ODF 4.14, due to the fix Mark disks as SSD for all internal deployments, some Ceph internal default parameters (for example osd_recovery_sleep_hdd, see below) are overridden when tuneFastDeviceClass is set to true. For reference, here are the current Ceph default values:
sh-5.1$ ceph-conf -D|grep hdd
bluestore_cache_size_hdd = 1073741824
bluestore_compression_max_blob_size_hdd = 65536
bluestore_compression_min_blob_size_hdd = 8192
bluestore_deferred_batch_ops_hdd = 64
bluestore_max_blob_size_hdd = 65536
bluestore_min_alloc_size_hdd = 4096
bluestore_prefer_deferred_size_hdd = 65536
bluestore_throttle_cost_per_io_hdd = 670000
osd_delete_sleep_hdd = 5.000000
osd_mclock_iops_capacity_low_threshold_hdd = 50.000000
osd_mclock_iops_capacity_threshold_hdd = 500.000000
osd_mclock_max_capacity_iops_hdd = 315.000000
osd_mclock_max_sequential_bandwidth_hdd = 157286400
osd_op_num_shards_hdd = 1
osd_op_num_threads_per_shard_hdd = 5
osd_recovery_max_active_hdd = 3
osd_recovery_sleep_hdd = 0.100000
osd_snap_trim_sleep_hdd = 5.000000
sh-5.1$ ceph-conf -D|grep ssd
bluestore_cache_size_ssd = 3221225472
bluestore_compression_max_blob_size_ssd = 65536
bluestore_compression_min_blob_size_ssd = 65536
bluestore_deferred_batch_ops_ssd = 16
bluestore_max_blob_size_ssd = 65536
bluestore_min_alloc_size_ssd = 4096
bluestore_prefer_deferred_size_ssd = 0
bluestore_throttle_cost_per_io_ssd = 4000
osd_delete_sleep_ssd = 1.000000
osd_mclock_iops_capacity_low_threshold_ssd = 1000.000000
osd_mclock_iops_capacity_threshold_ssd = 80000.000000
osd_mclock_max_capacity_iops_ssd = 21500.000000
osd_mclock_max_sequential_bandwidth_ssd = 1258291200
osd_op_num_shards_ssd = 8
osd_op_num_threads_per_shard_ssd = 2
osd_recovery_max_active_ssd = 10
osd_recovery_sleep_ssd = 0.000000
osd_snap_trim_sleep_ssd = 0.000000
-
However, there are still other parameters that depend on the rotational flag, such as the new mClock I/O scheduler, and OSDs can report warnings like this (usually after an OSD is started or restarted):
2025-09-29T14:33:22.849024248Z debug 2025-09-29T14:33:22.848+0000 7f2b79224a40 0 log_channel(cluster) log [WRN] : OSD bench result of 24757.436948 IOPS is not within the threshold limit range of 50.000000 IOPS and 500.000000 IOPS for osd.9. IOPS capacity is unchanged at 315.000000 IOPS. The recommendation is to establish the osd's IOPS capacity using other benchmark tools (e.g. Fio) and then override osd_mclock_max_capacity_iops_[hdd|ssd].
If not properly configured, this can cause issues like Ceph/ODF: Slow backfill and slow scrub/deep-scrub under mClock I/O scheduler
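As the message suggests, the remedy is to benchmark the OSD with an external tool and then override the mClock capacity per OSD. A minimal sketch from the toolbox pod, using osd.9 from the log above; the value 20000 is purely an illustrative placeholder for the fio-measured figure:

```shell
# Override the mClock IOPS capacity for osd.9 (20000 is an illustrative
# placeholder; use the value actually measured with fio on that device)
ceph config set osd.9 osd_mclock_max_capacity_iops_ssd 20000
# Verify the override is in place
ceph config get osd.9 osd_mclock_max_capacity_iops_ssd
```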
-
Engineering confirmed the requirement of the rotational flag, despite tuneFastDeviceClass being set to true: [GSS][ODF 4.16.14] osd disk rotational flag /sys/block/sdX/queue/rotational requirement
Diagnostic Steps
-
If your environment is running production business applications, you need to make sure the device class is SSD to get the best performance and remain supported.
-
Check the device class used by each OSD:
-
Enable the rook-ceph-tools pod using Solution 4628891
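Once the toolbox is enabled, Ceph commands can be run from it. Assuming the default openshift-storage namespace and the rook-ceph-tools deployment name created by that solution:

```shell
# Open an interactive shell in the toolbox pod
oc -n openshift-storage rsh deploy/rook-ceph-tools
# ...or run a single command non-interactively
oc -n openshift-storage exec deploy/rook-ceph-tools -- ceph osd df tree
```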
-
Below is a cluster where the old devices are in the HDD device class and the newly added devices are in the SSD device class:
sh-5.1$ ceph osd df tree
ID CLASS WEIGHT REWEIGHT SIZE RAW USE DATA OMAP META AVAIL %USE VAR PGS STATUS TYPE NAME
-1 3.00000 - 3 TiB 1.5 TiB 1.5 TiB 163 MiB 7.9 GiB 1.5 TiB 50.30 1.00 - root default
-12 1.00000 - 1 TiB 524 GiB 521 GiB 54 MiB 2.5 GiB 500 GiB 51.13 1.02 - rack rack0
-11 0.50000 - 512 GiB 98 GiB 98 GiB 2 KiB 217 MiB 414 GiB 19.24 0.38 - host ocs-deviceset-thin-0-data-a1b2c3
0 hdd 0.50000 1.00000 512 GiB 98 GiB 98 GiB 2 KiB 217 MiB 414 GiB 19.24 0.38 12 up osd.0
-28 0.50000 - 512 GiB 425 GiB 423 GiB 54 MiB 2.3 GiB 87 GiB 83.01 1.65 - host ocs-deviceset-thin-1-data-d4e5f6
5 ssd 0.50000 1.00000 512 GiB 425 GiB 423 GiB 54 MiB 2.3 GiB 87 GiB 83.01 1.65 175 up osd.5
-4 1.00000 - 1 TiB 515 GiB 513 GiB 54 MiB 2.7 GiB 509 GiB 50.34 1.00 - rack rack1
-22 0.50000 - 512 GiB 434 GiB 432 GiB 54 MiB 2.5 GiB 78 GiB 84.85 1.69 - host ocs-deviceset-thin-0-data-g7h8j9
3 ssd 0.50000 1.00000 512 GiB 434 GiB 432 GiB 54 MiB 2.5 GiB 78 GiB 84.85 1.69 176 up osd.3
-3 0.50000 - 512 GiB 81 GiB 81 GiB 1 KiB 181 MiB 431 GiB 15.83 0.31 - host ocs-deviceset-thin-1-data-k0l1m2
1 hdd 0.50000 1.00000 512 GiB 81 GiB 81 GiB 1 KiB 181 MiB 431 GiB 15.83 0.31 9 up osd.1
-8 1.00000 - 1 TiB 506 GiB 503 GiB 54 MiB 2.8 GiB 518 GiB 49.42 0.98 - rack rack2
-7 0.50000 - 512 GiB 71 GiB 71 GiB 3 KiB 272 MiB 441 GiB 13.85 0.28 - host ocs-deviceset-thin-2-data-n3o4p5
2 hdd 0.50000 1.00000 512 GiB 71 GiB 71 GiB 3 KiB 272 MiB 441 GiB 13.85 0.28 7 up osd.2
-25 0.50000 - 512 GiB 435 GiB 433 GiB 54 MiB 2.5 GiB 77 GiB 85.00 1.69 - host ocs-deviceset-thin-2-data-q6r7s8
4 ssd 0.50000 1.00000 512 GiB 435 GiB 433 GiB 54 MiB 2.5 GiB 77 GiB 85.00 1.69 176 up osd.4
TOTAL 3 TiB 1.5 TiB 1.5 TiB 163 MiB 7.9 GiB 1.5 TiB 50.30
MIN/MAX VAR: 0.28/1.69 STDDEV: 34.04
NOTE: In the scenario above, data will not be redistributed from the HDD device class to the SSD device class, since the PGs will be kept on HDD only.
-
Check the current value of tuneFastDeviceClass on your cephcluster:
# oc get cephcluster ocs-storagecluster-cephcluster -o yaml|grep tuneFastDeviceClass
  tuneFastDeviceClass: true
OR in an ODF must-gather:
$ grep -inr tuneFastDeviceClass *
ceph/namespaces/openshift-storage/ceph.rook.io/cephclusters/odf-storagecluster-cephcluster.yaml:420: tuneFastDeviceClass: true
-
Check what is the current value of the rotational flag detected by ceph, from the ceph toolbox pod:
# ceph report|grep rotational
"bluestore_bdev_rotational": "1",
"journal_rotational": "1",
"rotational": "1"
"bluestore_bdev_rotational": "1",
"journal_rotational": "1",
"rotational": "1"
"bluestore_bdev_rotational": "1",
"journal_rotational": "1",
"rotational": "1"
-
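To quickly count how many entries in a captured report still have the flag set, the output can be grepped. The sketch below runs against sample lines copied from the ceph report output above; on a live cluster, pipe `ceph report` straight in, or query a single OSD with `ceph osd metadata <id>`:

```shell
# Count lines where the plain "rotational" key is "1"
count_rotational() {
    grep -c '"rotational": "1"'
}

# Sample lines copied from the ceph report output above (three OSDs)
sample='"bluestore_bdev_rotational": "1",
"journal_rotational": "1",
"rotational": "1"
"bluestore_bdev_rotational": "1",
"journal_rotational": "1",
"rotational": "1"
"bluestore_bdev_rotational": "1",
"journal_rotational": "1",
"rotational": "1"'

printf '%s\n' "$sample" | count_rotational   # prints 3
```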
Check whether the Ceph internal values (shown in the Root Cause section) are overridden in the OSDs' config; here is an example with osd.0 and some parameters, from the ceph toolbox pod:
# ceph config show osd.0|grep _hdd
bluestore_prefer_deferred_size_hdd 0 file
osd_delete_sleep_hdd 0.000000 override
osd_recovery_max_active_hdd 3 default
osd_recovery_sleep_hdd 0.000000 override
osd_snap_trim_sleep_hdd 0.000000 override
OR from an ODF must-gather, under the directory ceph/must_gather_commands:
$ grep osd_recovery_sleep_hdd *
config_osd.0:osd_recovery_sleep_hdd 0.000000 override
config_osd.1:osd_recovery_sleep_hdd 0.000000 override
config_osd.2:osd_recovery_sleep_hdd 0.000000 override
This solution is part of Red Hat’s fast-track publication program, providing a huge library of solutions that Red Hat engineers have created while supporting our customers. To give you the knowledge you need the instant it becomes available, these articles may be presented in a raw and unedited form.