What are the kernel parameters related to maximum size of physical I/O requests?

Solution Verified - Updated

Environment

  • Red Hat Enterprise Linux (RHEL) 5, 6, 7, 8

Issue

  • What are the kernel parameters related to maximum size of physical I/O requests?
  • How do I adjust the maximum I/O transfer size on RHEL?
  • How do I set the maximum block device I/O transfer size on RHEL?
  • How to set custom values for /sys/block/<device>/queue/max_sectors_kb parameters?
  • Default max_sectors_kb setting
  • Is there a proper way to change the default max_sectors_kb that persists across reboots, applies to current disks (devices), and applies to disks (devices) added before the next reboot?

Resolution

Linux has two parameters in the /sys file system that relate to the maximum size of a physical I/O request: max_hw_sectors_kb and max_sectors_kb.

max_hw_sectors_kb (read-only)

This is the maximum number of kilobytes that the underlying device supports in a single data transfer. The value is read-only and is set by the driver to reflect the driver/hardware limit. The block layer also enforces this limit: it uses the minimum of max_hw_sectors_kb and the kernel default block limit (512kb in RHEL4/5), so that every I/O request stays within the size the hardware/driver can support.

max_sectors_kb (read/write)

This is the maximum number of kilobytes that the block layer will allow for a filesystem request. This value can be changed, but it must be smaller than or equal to the maximum size allowed by the hardware. The default kernel value on RHEL4/5 is 512kb.

To adjust or set the maximum transfer size, set the device's max_sectors_kb to the desired value. That value must not exceed the device's max_hw_sectors_kb (the kernel default is 512kb on RHEL4/5). For example, suppose a storage vendor recommends limiting the I/O transfer size to their storage to 128kb or less per request. You would then run the following command for each such storage device:

# echo 128 > /sys/block/sdz/queue/max_sectors_kb
Caution!   The new max_sectors_kb value must not exceed the device's max_hw_sectors_kb and max_dev_sectors values; otherwise errors can result. See the Limit Checks section below to determine the highest safe value for max_sectors_kb.
Caution!   Reduce the max_sectors_kb value only while use of the disk is quiesced. Otherwise an I/O may pass through an upper device mapper layer, such as LVM and/or Multipath, using the older, larger value, only to be rejected at the device layer because it exceeds the new, smaller maximum I/O size. See "How to set custom 'max_sectors_kb' option for devices under multipathd control?", specifically step 2, for more information on this aspect.
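
The first caution above can be turned into a small guard. The following is a minimal sketch, not a supported tool: the function name set_max_sectors_kb and the SYSFS_ROOT variable are hypothetical (SYSFS_ROOT exists only so the function can be pointed at a test directory instead of /sys). It checks only max_hw_sectors_kb, not max_dev_sectors.

```shell
#!/bin/sh
# Hypothetical helper: write max_sectors_kb only if the requested value
# does not exceed the device's max_hw_sectors_kb.
# SYSFS_ROOT is an illustrative override; it defaults to /sys.
set_max_sectors_kb() {
    dev=$1
    want=$2
    queue="${SYSFS_ROOT:-/sys}/block/$dev/queue"

    # Bail out if the device has no readable hardware limit.
    if [ ! -r "$queue/max_hw_sectors_kb" ]; then
        echo "$dev: cannot read $queue/max_hw_sectors_kb" >&2
        return 1
    fi

    hw_limit=$(cat "$queue/max_hw_sectors_kb")
    if [ "$want" -gt "$hw_limit" ]; then
        echo "$dev: $want exceeds max_hw_sectors_kb ($hw_limit), not changing" >&2
        return 1
    fi

    echo "$want" > "$queue/max_sectors_kb"
}
```

Run as root, for example: set_max_sectors_kb sdz 128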

Persistent max_sectors_kb configuration

Changes to /sys entries do not persist across reboots. However, there are several options for setting max_sectors_kb persistently.

Option 1: multipath configuration

This option is available with RHEL 6.8+, RHEL 7.3+ and specific multipath versions, and is the recommended option for devices under multipath control.

Refer to How to set custom 'max_sectors_kb' option for devices under multipathd control? for details.

Option 2: udev rule

To make max_sectors_kb settings persistent for dynamically added storage devices, incorporate them in udev rules, which are executed when the device is added. This is the recommended option because it applies limits at device discovery time, which happens early in boot, and also covers disks added dynamically at a later time.

Before continuing, refer to What is udev and how do you write custom udev rules? for details.

For example, to set a 128kb limit for HP 3PAR storage, create a file inside /etc/udev/rules.d/ directory with the content:

  • RHEL 7,8,9:

    ACTION=="add|change", SUBSYSTEM=="block", ENV{ID_VENDOR}=="3PARdata", ENV{ID_MODEL}=="VV", ATTR{queue/max_sectors_kb}="128"
    
  • RHEL 5/6:

    ACTION=="add|change", SUBSYSTEM=="block", ATTRS{vendor}=="3PARdata", ATTRS{model}=="VV", RUN+="/bin/sh -c '/bin/echo 128 > /sys%p/queue/max_sectors_kb'"
    

If the expected maximum request size still cannot be achieved after max_sectors_kb is changed for the device, check whether the device is formatted with a third-party filesystem. If so, contact the vendor of that filesystem for support.

Option 3: /etc/rc.d/rc.local script

One way to set max_sectors_kb persistently is to place the tuning command(s) inside /etc/rc.d/rc.local.

This will only work for storage devices that are present during system boot. For example, to set a 128kb limit for /dev/sda, /dev/sdb, and /dev/sdc:

echo 128 > /sys/block/sda/queue/max_sectors_kb
echo 128 > /sys/block/sdb/queue/max_sectors_kb
echo 128 > /sys/block/sdc/queue/max_sectors_kb
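
When several devices share the same limit, the repeated commands above could be written as a loop. This is a sketch only: the function name apply_limit and the SYSFS_ROOT override are hypothetical (SYSFS_ROOT exists only for testing and defaults to /sys). Devices that are missing or not writable are skipped with a message rather than aborting the script.

```shell
#!/bin/sh
# Hypothetical helper for /etc/rc.d/rc.local: apply one limit (in KB)
# to every device named on the command line.
apply_limit() {
    limit_kb=$1
    shift
    for dev in "$@"; do
        knob="${SYSFS_ROOT:-/sys}/block/$dev/queue/max_sectors_kb"
        if [ -w "$knob" ]; then
            echo "$limit_kb" > "$knob"
        else
            # Device absent at boot, or insufficient privileges.
            echo "skipping $dev: $knob is not writable" >&2
        fi
    done
}
```

The three commands above would then become: apply_limit 128 sda sdb sdc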

Diagnostic Steps

Example of normal SATA internal drive:

$ cat /sys/block/sda/queue/max_sectors_kb
512
$ cat /sys/block/sda/queue/max_hw_sectors_kb
32767

Example of a USB drive, which has a lower hardware limit than the block layer default:

$ cat /sys/block/sdb/queue/max_sectors_kb
120
$ cat /sys/block/sdb/queue/max_hw_sectors_kb
120

In both cases, the max_sectors_kb has been set to the minimum of the block layer default and the hardware/driver value.
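
To compare the two values across all block devices at once, a loop over /sys/block can be used. The sketch below is illustrative: show_limits is a hypothetical function name, and its optional argument (an alternative sysfs root) is an assumption added for testing.

```shell
#!/bin/sh
# Print max_sectors_kb and max_hw_sectors_kb for every block device.
# $1 is an optional sysfs root (hypothetical parameter), default /sys.
show_limits() {
    root="${1:-/sys}"
    for queue in "$root"/block/*/queue; do
        # Skip entries without a readable queue directory.
        [ -r "$queue/max_sectors_kb" ] || continue
        dev=$(basename "$(dirname "$queue")")
        printf '%s max_sectors_kb=%s max_hw_sectors_kb=%s\n' \
            "$dev" \
            "$(cat "$queue/max_sectors_kb")" \
            "$(cat "$queue/max_hw_sectors_kb")"
    done
}
```

Calling show_limits with no argument reads the live values from /sys.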

We can see examples of two different transfer sizes in action using simple dd and iostat commands.

# echo 512 > /sys/block/sda/queue/max_sectors_kb
# dd if=/dev/sda5 of=/dev/null iflag=direct bs=4M count=250 &

Output from iostat -tkx 1 shows the average request size (avgrq-sz) never exceeds 1024 sectors or 512kb as expected:

Device: rrqm/s   wrqm/s   r/s   w/s    rkB/s    wkB/s avgrq-sz avgqu-sz   await  svctm  %util
sda5      0.00     0.00  0.00  0.00     0.00     0.00     0.00     0.00    0.00   0.00   0.00
sda5      0.00     0.00  0.00  0.00     0.00     0.00     0.00     0.00    0.00   0.00   0.00
sda5      0.00     0.00  0.00  0.00     0.00     0.00     0.00     0.00    0.00   0.00   0.00
sda5      0.00     0.00 109.00  0.00 55808.00     0.00  «1024.00»  3.39   29.36   6.46  70.40
sda5      0.00     0.00 181.19  0.00 92768.32     0.00  «1024.00»  4.50   25.42   5.42  98.22
sda5      0.00     0.00 195.00  0.00 99840.00     0.00  «1024.00»  4.56   23.57   5.13 100.00

Note: the dd command uses iflag=direct to force 4MB I/O into the kernel block/scheduler layer, where each request is split down to the maximum transfer size allowed by the target device. Because each single large I/O request is broken into smaller ones, the average queue size (avgqu-sz) is greater than 1.
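
The numbers in the iostat output can be checked with shell arithmetic. This is a sketch under the assumption that max_sectors_kb is the only limit in play; the function names are hypothetical. avgrq-sz is reported in 512-byte sectors, so it is converted to kilobytes first, and the second function estimates how many block-layer requests one 4MB direct read is split into.

```shell
#!/bin/sh
# Convert an avgrq-sz value (512-byte sectors) to kilobytes.
sectors_to_kb() {
    echo $(( $1 * 512 / 1024 ))
}

# Number of requests needed for one I/O of io_kb kilobytes when each
# request is capped at max_kb kilobytes (rounded up).
requests_per_io() {
    io_kb=$1
    max_kb=$2
    echo $(( (io_kb + max_kb - 1) / max_kb ))
}

sectors_to_kb 1024        # prints 512
sectors_to_kb 256         # prints 128
requests_per_io 4096 512  # prints 8
requests_per_io 4096 128  # prints 32
```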

# echo 128 > /sys/block/sda/queue/max_sectors_kb
# dd if=/dev/sda5 of=/dev/null iflag=direct bs=4M count=250 &

Output from iostat -tkx 1 now shows the average request size (avgrq-sz) never exceeds 256 sectors or 128kb as expected:

Device: rrqm/s   wrqm/s   r/s   w/s    rkB/s    wkB/s avgrq-sz avgqu-sz   await  svctm  %util
sda5      0.00     0.00  0.00  0.00     0.00     0.00     0.00     0.00    0.00   0.00   0.00
sda5      0.00     0.00  0.00  0.00     0.00     0.00     0.00     0.00    0.00   0.00   0.00
sda5      0.00     0.00 169.00  0.00 21632.00     0.00  «256.00»   5.74   31.81   1.96  33.20
sda5      0.00     0.00 539.60  0.00 69069.31     0.00  «256.00»  16.49   30.58   1.83  98.61
sda5      0.00     0.00 587.00  0.00 75136.00     0.00  «256.00»  15.67   26.39   1.70  99.60
sda5      0.00     0.00 495.00  0.00 63360.00     0.00  «256.00»  16.25   33.66   2.03 100.40

Limit Checks

The max_sectors_kb value must not exceed the smaller of the device's max_hw_sectors_kb and max_dev_sectors values. Any value above either of these limits will likely cause warnings or errors because it exceeds the limits specified by the storage hardware. See the following for examples of improperly setting the max_sectors_kb value:

In addition, see the following, where bugs could cause the value set in max_sectors_kb to be reverted or changed.

To check a device's specified maximum I/O size limits:

  • To check the device's max_hw_sectors_kb and ensure that max_sectors_kb is lower:

      [root@host ~]# cat /sys/block/sdc/queue/max_hw_sectors_kb 
      32767
    
  • To check max_dev_sectors via sg_vpd command and ensure that max_sectors_kb is lower:

Note: If VPD page 0xB0 (SCSI Block Limits) is not supported by the device, skip this check, which relies on the Maximum transfer length value.

[root@host ~]# sg_vpd -p 0xb0 /dev/sdc  
Block limits VPD page (SBC):
  Write same no zero (WSNZ): 1
  Maximum compare and write length: 1 blocks
  Optimal transfer length granularity: 8 blocks
  Maximum transfer length: 2048 blocks     <======
  Optimal transfer length: 128 blocks
  Maximum prefetch length: 0 blocks
  Maximum unmap LBA count: 0
  Maximum unmap block descriptor count: 0
  Optimal unmap granularity: 0
  Unmap granularity alignment valid: 0
  Unmap granularity alignment: 0
  Maximum write same length: 0x4000 blocks

In the above case, with max_hw_sectors_kb=32767 and max_dev_sectors=2048, the highest safe value for max_sectors_kb would be 1024KiB, as shown below:

        2048 blocks (logical sectors)  x 512 bytes per disk block = 1048576 bytes / 1024 bytes per KiB = 1024KiB
        1024 max_sectors_kb x 1024 bytes per KiB = 1048576 bytes
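
The same calculation can be expressed as a small shell function. This is a sketch with a hypothetical function name; it takes the max_hw_sectors_kb value, the Maximum transfer length in blocks from sg_vpd, and the logical block size in bytes (512 in the example above; the device's actual value can be read from /sys/block/<device>/queue/logical_block_size), and prints the smaller of the two limits in KiB.

```shell
#!/bin/sh
# Highest safe max_sectors_kb: the smaller of max_hw_sectors_kb and the
# device's maximum transfer length converted from blocks to KiB.
safe_max_sectors_kb() {
    hw_kb=$1        # value of max_hw_sectors_kb
    max_blocks=$2   # "Maximum transfer length" from sg_vpd -p 0xb0
    block_bytes=$3  # logical block size in bytes
    dev_kb=$(( max_blocks * block_bytes / 1024 ))
    if [ "$dev_kb" -lt "$hw_kb" ]; then
        echo "$dev_kb"
    else
        echo "$hw_kb"
    fi
}

safe_max_sectors_kb 32767 2048 512   # prints 1024
```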

This solution is part of Red Hat’s fast-track publication program, providing a huge library of solutions that Red Hat engineers have created while supporting our customers. To give you the knowledge you need the instant it becomes available, these articles may be presented in a raw and unedited form.