Can I use HDD with ODF?

Solution Verified - Updated

Environment

  • Red Hat OpenShift Data Foundation 4.x

Issue

  • Need to use ODF with HDD
  • SSD is recognized as HDD
  • Performance issues on ODF related to the OSD device class
  • ODF wrongly recognizes the disks as HDD
  • Is the rotational flag required to be set to 0 (non-rotational)?

Resolution

  • The article Supported configurations for Red Hat OpenShift Data Foundation 4.X contains two rows for HDD: one for Internal Mode and one for External Mode.

    HDD is not supported when using Internal Mode.

    HDD is supported for Development or QA scenarios, following the RHCS documentation, when ODF is deployed using External Mode.

  • If you are using VMware vSphere, please follow the documentation details in the link: Infrastructure Requirements

  • When ODF Internal Mode is running on SSDs but Ceph has wrongly classified the disks as HDD, follow these steps:

  1. Change the rotational flag using Solution 6547891
  2. Override the Ceph device class using Solution 7098573

Afterwards, check the device classes again to confirm they are all SSD.
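
The device-class override in step 2 boils down to two Ceph commands per OSD, run from the rook-ceph-tools pod. As a minimal sketch (the OSD IDs are examples, and the helper only prints the commands so they can be reviewed before running):

```shell
# Dry-run sketch (assumption: the printed commands are then run inside the
# rook-ceph-tools pod). Moves a misclassified OSD from the hdd class to ssd.
reclass_osd_cmds() {
  osd="osd.$1"
  echo "ceph osd crush rm-device-class ${osd}"
  echo "ceph osd crush set-device-class ssd ${osd}"
}

# Example: OSDs 0, 1 and 2 were wrongly classified as hdd
for id in 0 1 2; do
  reclass_osd_cmds "$id"
done
```

Removing the old class first is required because `ceph osd crush set-device-class` refuses to overwrite an existing class.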

Root Cause

  • ODF versions prior to 4.16 can mistakenly recognize disks as HDD; the environment is still supported as long as it is running on SSDs.

  • The documentation is not clear about the device classes and the need to choose between HDD and SSD for a new cluster or when adding new devices.

  • ODF has always had the support requirement of using SSD devices (non-rotational, meaning the rotational flag in the operating system file /sys/block/sdX/queue/rotational is 0). However, there was no constraint on the ODF side against rotational devices (HDD): ODF would let you install the operator on rotational disks anyway.

  • Since ODF 4.17, you can no longer create the ODF StorageSystem on rotational disks, so the OSD disks' rotational flag /sys/block/sdX/queue/rotational (at the OS kernel layer) must be 0 before installing ODF.
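
A quick way to verify the kernel-level flag on each node before installing ODF is to read that sysfs file directly. A minimal sketch (SYSFS_ROOT is parameterized only so the logic can be exercised outside a real node; on a node leave it at its default of /sys):

```shell
# Sketch: check the kernel rotational flag for a candidate OSD disk.
SYSFS_ROOT="${SYSFS_ROOT:-/sys}"

rotational_flag() {
  cat "${SYSFS_ROOT}/block/$1/queue/rotational"
}

check_disk() {
  flag="$(rotational_flag "$1")"
  if [ "$flag" = "0" ]; then
    echo "$1: non-rotational (OK for ODF)"
  else
    echo "$1: rotational=$flag (must be 0 before installing ODF)"
  fi
}
```

For example, `check_disk sdb` on each node that will host OSDs.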

  • Since ODF 4.14, due to the fix Mark disks as SSD for all internal deployments, some Ceph internal default parameters (for example osd_recovery_sleep_hdd, see below) are overridden by tuneFastDeviceClass when it is set to true. For reference, here are the current Ceph default values:

    sh-5.1$ ceph-conf -D|grep hdd
    bluestore_cache_size_hdd = 1073741824
    bluestore_compression_max_blob_size_hdd = 65536
    bluestore_compression_min_blob_size_hdd = 8192
    bluestore_deferred_batch_ops_hdd = 64
    bluestore_max_blob_size_hdd = 65536
    bluestore_min_alloc_size_hdd = 4096
    bluestore_prefer_deferred_size_hdd = 65536
    bluestore_throttle_cost_per_io_hdd = 670000
    osd_delete_sleep_hdd = 5.000000
    osd_mclock_iops_capacity_low_threshold_hdd = 50.000000
    osd_mclock_iops_capacity_threshold_hdd = 500.000000
    osd_mclock_max_capacity_iops_hdd = 315.000000
    osd_mclock_max_sequential_bandwidth_hdd = 157286400
    osd_op_num_shards_hdd = 1
    osd_op_num_threads_per_shard_hdd = 5
    osd_recovery_max_active_hdd = 3
    osd_recovery_sleep_hdd = 0.100000
    osd_snap_trim_sleep_hdd = 5.000000
    
    sh-5.1$ ceph-conf -D|grep ssd
    bluestore_cache_size_ssd = 3221225472
    bluestore_compression_max_blob_size_ssd = 65536
    bluestore_compression_min_blob_size_ssd = 65536
    bluestore_deferred_batch_ops_ssd = 16
    bluestore_max_blob_size_ssd = 65536
    bluestore_min_alloc_size_ssd = 4096
    bluestore_prefer_deferred_size_ssd = 0
    bluestore_throttle_cost_per_io_ssd = 4000
    osd_delete_sleep_ssd = 1.000000
    osd_mclock_iops_capacity_low_threshold_ssd = 1000.000000
    osd_mclock_iops_capacity_threshold_ssd = 80000.000000
    osd_mclock_max_capacity_iops_ssd = 21500.000000
    osd_mclock_max_sequential_bandwidth_ssd = 1258291200
    osd_op_num_shards_ssd = 8
    osd_op_num_threads_per_shard_ssd = 2
    osd_recovery_max_active_ssd = 10
    osd_recovery_sleep_ssd = 0.000000
    osd_snap_trim_sleep_ssd = 0.000000 
    
  • However, other parameters still depend on the rotational flag, such as the new mClock I/O scheduler, and OSDs can report warnings like this (usually after an OSD is started or restarted):

    2025-09-29T14:33:22.849024248Z debug 2025-09-29T14:33:22.848+0000 7f2b79224a40  0 log_channel(cluster) log [WRN] : OSD bench result of 24757.436948 IOPS is not within the threshold limit range of 50.000000 IOPS and 500.000000 IOPS for osd.9. IOPS capacity is unchanged at 315.000000 IOPS. The recommendation is to establish the osd's IOPS capacity using other benchmark tools (e.g. Fio) and then override osd_mclock_max_capacity_iops_[hdd|ssd].
    

    If not properly configured, this can cause issues like those described in Ceph/ODF: Slow backfill and slow scrub/deep-scrub under mClock I/O scheduler
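
The warning above recommends measuring real IOPS with a benchmark tool such as fio and then overriding osd_mclock_max_capacity_iops_[hdd|ssd]. As a sketch, the override is a single `ceph config set` per OSD; the helper below only builds the command (the OSD ID, class and IOPS value are illustrative, and the printed command would be run from the rook-ceph-tools pod):

```shell
# Sketch: build the mClock capacity override the OSD warning recommends.
mclock_override_cmd() {
  # $1 = osd id, $2 = device class (hdd|ssd), $3 = IOPS measured with fio
  echo "ceph config set osd.$1 osd_mclock_max_capacity_iops_$2 $3"
}

mclock_override_cmd 9 ssd 24757
```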

  • Engineering confirmed the requirement of the rotational flag, despite tuneFastDeviceClass being set to true: [GSS][ODF 4.16.14] osd disk rotational flag /sys/block/sdX/queue/rotational requirement

Diagnostic Steps

  • If your environment is running productive business applications, you need to make sure that the device class is SSD, to get the best performance and remain supported.

  • Check the device class used by each OSD:

  1. Enable the rook-ceph-tools pod using Solution 4628891

  2. Below we can see a cluster with old devices in the HDD device class and new devices added in the SSD device class:

sh-5.1$ ceph osd df tree
ID   CLASS  WEIGHT   REWEIGHT  SIZE     RAW USE  DATA     OMAP     META     AVAIL    %USE   VAR   PGS  STATUS  TYPE NAME
 -1         3.00000         -    3 TiB  1.5 TiB  1.5 TiB  163 MiB  7.9 GiB  1.5 TiB  50.30  1.00    -          root default
-12         1.00000         -    1 TiB  524 GiB  521 GiB   54 MiB  2.5 GiB  500 GiB  51.13  1.02    -              rack rack0
-11         0.50000         -  512 GiB   98 GiB   98 GiB    2 KiB  217 MiB  414 GiB  19.24  0.38    -                  host ocs-deviceset-thin-0-data-a1b2c3
  0    hdd  0.50000   1.00000  512 GiB   98 GiB   98 GiB    2 KiB  217 MiB  414 GiB  19.24  0.38   12      up              osd.0
-28         0.50000         -  512 GiB  425 GiB  423 GiB   54 MiB  2.3 GiB   87 GiB  83.01  1.65    -                  host ocs-deviceset-thin-1-data-d4e5f6
  5    ssd  0.50000   1.00000  512 GiB  425 GiB  423 GiB   54 MiB  2.3 GiB   87 GiB  83.01  1.65  175      up              osd.5
 -4         1.00000         -    1 TiB  515 GiB  513 GiB   54 MiB  2.7 GiB  509 GiB  50.34  1.00    -              rack rack1
-22         0.50000         -  512 GiB  434 GiB  432 GiB   54 MiB  2.5 GiB   78 GiB  84.85  1.69    -                  host ocs-deviceset-thin-0-data-g7h8j9
  3    ssd  0.50000   1.00000  512 GiB  434 GiB  432 GiB   54 MiB  2.5 GiB   78 GiB  84.85  1.69  176      up              osd.3
 -3         0.50000         -  512 GiB   81 GiB   81 GiB    1 KiB  181 MiB  431 GiB  15.83  0.31    -                  host ocs-deviceset-thin-1-data-k0l1m2
  1    hdd  0.50000   1.00000  512 GiB   81 GiB   81 GiB    1 KiB  181 MiB  431 GiB  15.83  0.31    9      up              osd.1
 -8         1.00000         -    1 TiB  506 GiB  503 GiB   54 MiB  2.8 GiB  518 GiB  49.42  0.98    -              rack rack2
 -7         0.50000         -  512 GiB   71 GiB   71 GiB    3 KiB  272 MiB  441 GiB  13.85  0.28    -                  host ocs-deviceset-thin-2-data-n3o4p5
  2    hdd  0.50000   1.00000  512 GiB   71 GiB   71 GiB    3 KiB  272 MiB  441 GiB  13.85  0.28    7      up              osd.2
-25         0.50000         -  512 GiB  435 GiB  433 GiB   54 MiB  2.5 GiB   77 GiB  85.00  1.69    -                  host ocs-deviceset-thin-2-data-q6r7s8
  4    ssd  0.50000   1.00000  512 GiB  435 GiB  433 GiB   54 MiB  2.5 GiB   77 GiB  85.00  1.69  176      up              osd.4
                        TOTAL    3 TiB  1.5 TiB  1.5 TiB  163 MiB  7.9 GiB  1.5 TiB  50.30
MIN/MAX VAR: 0.28/1.69  STDDEV: 34.04

NOTE: In the scenario above, data will not be redistributed from the HDD device class to the SSD device class, since the PGs will be kept on HDD only.
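
To quickly list the class per OSD from output like the above, a small filter can help. This is a sketch that relies on CLASS being the second column of `ceph osd df tree` and the OSD name being the last:

```shell
# Sketch: print "osd.N class" pairs from 'ceph osd df tree' output on stdin.
osd_classes() {
  awk '$2 == "hdd" || $2 == "ssd" { print $NF, $2 }'
}
```

Usage from the toolbox pod: `ceph osd df tree | osd_classes`.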

  • Check the current value of tuneFastDeviceClass on your CephCluster:

      # oc get cephcluster ocs-storagecluster-cephcluster -o yaml|grep tuneFastDeviceClass
          tuneFastDeviceClass: true
    
      OR in an ODF must-gather:
      
      $ grep -inr tuneFastDeviceClass *
      ceph/namespaces/openshift-storage/ceph.rook.io/cephclusters/odf-storagecluster-cephcluster.yaml:420:      tuneFastDeviceClass: true
      
    
  • Check the current value of the rotational flag detected by Ceph, from the Ceph toolbox pod:

    # ceph report|grep rotational
              "bluestore_bdev_rotational": "1",
              "journal_rotational": "1",
              "rotational": "1"
              "bluestore_bdev_rotational": "1",
              "journal_rotational": "1",
              "rotational": "1"
              "bluestore_bdev_rotational": "1",
              "journal_rotational": "1",
              "rotational": "1"
    
    
  • Check whether the Ceph internal values (shown in the Root Cause section) are overridden in the OSD config. Here is an example with osd.0 and some parameters, from the Ceph toolbox pod:

    # ceph config show osd.0|grep _hdd
    bluestore_prefer_deferred_size_hdd            0               file
    osd_delete_sleep_hdd                          0.000000        override
    osd_recovery_max_active_hdd                   3               default
    osd_recovery_sleep_hdd                        0.000000        override
    osd_snap_trim_sleep_hdd                       0.000000        override   
    
    OR from an ODF must-gather, under directory ceph/must_gather_commands :
    
    $ grep osd_recovery_sleep_hdd *
    config_osd.0:osd_recovery_sleep_hdd     0.000000        override
    config_osd.1:osd_recovery_sleep_hdd     0.000000        override
    config_osd.2:osd_recovery_sleep_hdd     0.000000        override
    
    

This solution is part of Red Hat’s fast-track publication program, providing a huge library of solutions that Red Hat engineers have created while supporting our customers. To give you the knowledge you need the instant it becomes available, these articles may be presented in a raw and unedited form.