How to set custom 'max_sectors_kb' option for devices under multipathd control?
Environment
- Red Hat Enterprise Linux (RHEL) 6.8, 7.3
- kernel-2.6.32-642.el6 or later
- kernel-3.10.0-327.el7 or later
- device-mapper-multipath
Issue
- How to set a custom `max_sectors_kb` option for multipath devices and their sub-paths without causing any disruption to ongoing IO operations?
- While changing the `max_sectors_kb` option for multipath devices and their sub-paths with the following command, IO operations started to fail with the errors below:

```
$ echo '256' > /sys/block/<device>/queue/max_sectors_kb
$ tail -f /var/log/messages
kernel: blk_cloned_rq_check_limits: over max size limit.
kernel: blk_cloned_rq_check_limits: over max size limit.
kernel: blk_cloned_rq_check_limits: over max size limit.
kernel: device-mapper: multipath: Failing path 70:61.
kernel: device-mapper: multipath: Failing path 71:81.
multipathd: 70:61: mark as failed
multipathd: crs_01: remaining active paths: 1
multipathd: 71:81: mark as failed
[...]
```

- The same issue was not observed with kernel versions older than kernel-2.6.32-642.el6 and kernel-3.10.0-327.el7.
Resolution
- It is tricky to change the `max_sectors_kb` limit for SCSI devices and multipath device maps while IO operations are running on the disk devices, so the engineering team has updated the device-mapper-multipath package to include a new configuration option, `max_sectors_kb`, to set the maximum IO size. This new option is available in the following versions of the device-mapper-multipath RPM:
  - RHEL 6.9: device-mapper-multipath-0.4.9-100.el6 and later
  - RHEL 7.2.z: device-mapper-multipath-0.4.9-85.el7_2.8 and later
  - RHEL 7.3.z: device-mapper-multipath-0.4.9-99.el7_3.3 and later
- With this update, device-mapper-multipath provides a new `max_sectors_kb` parameter in the `defaults`, `devices`, and `multipaths` sections of the `/etc/multipath.conf` file. This parameter allows you to set the `max_sectors_kb` device queue parameter to the specified value on all underlying paths of a multipath device before the multipath device is first activated.
When a multipath device is created, the device inherits the
max_sectors_kbvalue from the path devices. Manually raising this value for the multipath device or lowering this value for the path devices can cause multipath to create I/O operations larger than the path devices allow.Using the
max_sectors_kbparameter in multipath.conf is an easy way to set these values before a multipath device is created on top of the path devices, and prevents invalid-sized I/O operations from being passed down. Use the below steps to increasemax_sectors_kbvalue using the above option:
Caution! The new `max_sectors_kb` value must be less than the device's `max_hw_sectors_kb` and `max_dev_sectors` values, otherwise you will get "multipathd: failed setting max_sectors_kb on sdc : Invalid argument" errors. See the Diagnostics section of this document for determining those limits.
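To avoid the "Invalid argument" error above, you can compare the value you intend to set against the hardware ceiling first. The helper below is an illustrative sketch and not part of the multipath tools; the device name `sdc` and the numbers are examples, and on a live system the limit would come from `/sys/block/<device>/queue/max_hw_sectors_kb`:

```shell
#!/bin/sh
# Hypothetical pre-check (not part of device-mapper-multipath): verify that a
# requested max_sectors_kb stays below the device's max_hw_sectors_kb ceiling.
check_max_sectors_kb() {
    requested="$1"   # desired max_sectors_kb, in KiB
    hw_limit="$2"    # the device's max_hw_sectors_kb, in KiB
    if [ "$requested" -ge "$hw_limit" ]; then
        echo "too large: $requested KB is not below max_hw_sectors_kb ($hw_limit KB)"
        return 1
    fi
    echo "ok: $requested KB is below max_hw_sectors_kb ($hw_limit KB)"
}

# On a live system, e.g. for path device sdc:
#   check_max_sectors_kb 4096 "$(cat /sys/block/sdc/queue/max_hw_sectors_kb)"
check_max_sectors_kb 256 32767
```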
- Change `max_sectors_kb` to the required value by adding a `devices` section to the `/etc/multipath.conf` file as shown below:

```
devices {
        device {
                vendor  "NETAPP"
                product "LUN.*"
                max_sectors_kb 4096   ### Setting this option in the 'device' section
                                      ### for NETAPP LUNs applies it only to NETAPP devices
        }
}
```

The `max_sectors_kb` option can also be applied to a specific multipath device by adding an entry for it in the `multipaths` section.
- Stop any applications which are using the multipath devices and unmount any filesystems which use them. If any LVM volumes are using these multipath devices, then not only unmount any filesystems created on a logical volume which has a PV on the multipath device, but also deactivate all logical volumes using the multipath device (`vgchange -an <volume-group-name>`). Unmounting, and deactivating logical volumes if LVM uses the mpath device, is required before you are able to flush (remove) the multipath device map and rescan/recreate the multipath device with the new `max_sectors_kb` value. This avoids the request queue limit violations and IO errors that could occur if the `max_sectors_kb` parameter changes while IO is in flight, that is, conflicting IO size limits between the device-mapper layer and the physical device layer (sdN, for example). The new `max_sectors_kb` value specified in multipath.conf is only applied during multipath device map creation.
- Then flush the multipath device map:

```
$ multipath -f <multipath-device-name>
```
- Reload the `multipathd` service so that it picks up the new configuration from the `/etc/multipath.conf` file:

```
$ service multipathd reload
```
- Re-scan the multipath device maps:

```
$ multipath -v2
$ multipath -ll
```
- Verify the new value of the `max_sectors_kb` option for the multipath device and its sub-paths:

```
$ cat /sys/block/<device>/queue/max_sectors_kb
```
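To verify the multipath map and all of its path devices in one pass, the dm device's `slaves/` directory in sysfs can be walked. The sketch below takes the sysfs root as a parameter so it can be demonstrated against a mock tree built in a temp directory; `dm-0`, `sdc`, and `sdd` are example names, and on a live system you would pass `/sys` and the real dm device name:

```shell
#!/bin/sh
# Print max_sectors_kb for a dm device and each of its slave (path) devices.
# The sysfs root is a parameter so the sketch can run against a mock tree.
show_max_sectors() {
    sysroot="$1"; dm="$2"
    echo "$dm: $(cat "$sysroot/block/$dm/queue/max_sectors_kb")"
    for slave in "$sysroot/block/$dm/slaves"/*; do
        s=$(basename "$slave")
        echo "  $s: $(cat "$sysroot/block/$s/queue/max_sectors_kb")"
    done
}

# Build a tiny mock /sys tree (on a live system: show_max_sectors /sys dm-0):
mock=$(mktemp -d)
mkdir -p "$mock/block/dm-0/queue" "$mock/block/dm-0/slaves" \
         "$mock/block/sdc/queue" "$mock/block/sdd/queue"
echo 4096 > "$mock/block/dm-0/queue/max_sectors_kb"
echo 4096 > "$mock/block/sdc/queue/max_sectors_kb"
echo 4096 > "$mock/block/sdd/queue/max_sectors_kb"
ln -s "$mock/block/sdc" "$mock/block/dm-0/slaves/sdc"
ln -s "$mock/block/sdd" "$mock/block/dm-0/slaves/sdd"

show_max_sectors "$mock" dm-0
```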
- Rebuild the initramfs boot image.

Note: If the system is configured to boot from SAN with multipath devices, then please also rebuild the initial ram disk image with the following commands so that the initramfs image will have the updated `/etc/multipath.conf` file with the `max_sectors_kb` option set.

First take a backup of the current initramfs image:

```
$ cp -av /boot/initramfs-$(uname -r).img /boot/initramfs-$(uname -r).img.bak
```

Rebuild the initramfs image with the multipath configuration files included:

```
$ dracut -f -v -a multipath --include /etc/multipath /etc/multipath
```
Root Cause
- The error messages `blk_cloned_rq_check_limits: over max size limit` are logged from the following code block in `blk_cloned_rq_check_limits`:

```
block/blk-core.c
1870 static int blk_cloned_rq_check_limits(struct request_queue *q,
1871                                       struct request *rq)
1872 {
1873         if (blk_rq_sectors(rq) > blk_queue_get_max_sectors(q, rq->cmd_flags)) {
1874                 printk(KERN_ERR "%s: over max size limit.\n", __func__);
1875                 return -EIO;    <----------
1876         }
[...]
```
- The check in the above snippet calls `blk_rq_sectors` to get the number of sectors in the cloned request and compares it against the `max_discard_sectors` or `max_sectors` limit of the `request_queue`. If the number of sectors in the cloned request exceeds that limit, the kernel prints the error and returns an IO error (`-EIO`):

```
include/linux/blkdev.h
857 static inline unsigned int blk_rq_sectors(const struct request *rq)
858 {
859         return blk_rq_bytes(rq) >> 9;
860 }

872 static inline unsigned int blk_queue_get_max_sectors(struct request_queue *q,
873                                                      unsigned int cmd_flags)
874 {
875         if (unlikely(cmd_flags & REQ_DISCARD))
876                 return min(q->limits.max_discard_sectors, UINT_MAX >> 9);  <<----------
877
878         return q->limits.max_sectors;  <<----------
879 }
```
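As a worked example of that check (the numbers are illustrative, not from the original report): `blk_rq_sectors` converts the request size in bytes to 512-byte sectors by shifting right by 9. If a 512 KiB cloned request reaches a queue whose `max_sectors_kb` has been lowered to 256 KiB, the comparison trips:

```shell
#!/bin/sh
# Illustrative arithmetic behind the blk_cloned_rq_check_limits comparison.
req_bytes=$((512 * 1024))                    # 512 KiB cloned request
max_sectors_kb=256                           # lowered queue limit, in KiB

req_sectors=$((req_bytes >> 9))              # blk_rq_sectors: bytes -> 512-byte sectors
max_sectors=$((max_sectors_kb * 1024 >> 9))  # queue limit in sectors

echo "request: $req_sectors sectors, limit: $max_sectors sectors"
if [ "$req_sectors" -gt "$max_sectors" ]; then
    echo "over max size limit."              # the request would fail with -EIO
fi
```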
- The check marked in the `blk_cloned_rq_check_limits` function above was added in the RHEL 6.8 and 7.3 kernels through the following commit:

```
Upstream commit bf4e6b4e757488dee1b6a581f49c7ac34cd217f8
block: Always check queue limits for cloned requests
```
- Since the cloned request violated the request_queue limits, the kernel reported `blk_cloned_rq_check_limits: over max size limit` errors, which subsequently resulted in path failures in multipath. The above patch was not present in kernel versions older than kernel-2.6.32-642.el6 and kernel-3.10.0-327.el7, which is why the same request queue limit violations did not result in IO failures on older kernels.
This solution is part of Red Hat’s fast-track publication program, providing a huge library of solutions that Red Hat engineers have created while supporting our customers. To give you the knowledge you need the instant it becomes available, these articles may be presented in a raw and unedited form.