How to set custom 'max_sectors_kb' option for devices under multipathd control?

Solution Verified

Environment

  • Red Hat Enterprise Linux (RHEL) 6.8, 7.3
    • kernel-2.6.32-642.el6 or later
    • kernel-3.10.0-327.el7 or later
    • device-mapper-multipath

Issue

  • How to set a custom max_sectors_kb option for multipath devices and their underlying paths without disrupting ongoing IO operations?

  • While changing the max_sectors_kb option for multipath devices and their underlying paths with the following command, IO operations started to fail with the errors shown below:

      $ echo '256' > /sys/block/<device>/queue/max_sectors_kb 
    
      $ tail -f /var/log/messages
      kernel: blk_cloned_rq_check_limits: over max size limit.
      kernel: blk_cloned_rq_check_limits: over max size limit.
      kernel: blk_cloned_rq_check_limits: over max size limit.
      kernel: device-mapper: multipath: Failing path 70:61.
      kernel: device-mapper: multipath: Failing path 71:81.
      multipathd: 70:61: mark as failed
      multipathd: crs_01: remaining active paths: 1
      multipathd: 71:81: mark as failed
      [...]
    
    • The same issue was not observed with kernel versions older than 2.6.32-642.el6 and 3.10.0-327.el7.

Resolution

  • Setting the max_sectors_kb limit for SCSI devices and multipath device maps while IO operations are running on the disk devices is error-prone, so the engineering team has updated the device-mapper-multipath package to include a new configuration option, max_sectors_kb, for setting the maximum IO size. This option is available in the updated device-mapper-multipath RPMs.

  • With this update, device-mapper-multipath provides a new max_sectors_kb parameter in the defaults, devices, and multipath sections of the /etc/multipath.conf file. This parameter allows you to set the max_sectors_kb device queue parameter to the specified value on all underlying paths of a multipath device before the multipath device is first activated.

  • When a multipath device is created, the device inherits the max_sectors_kb value from the path devices. Manually raising this value for the multipath device or lowering this value for the path devices can cause multipath to create I/O operations larger than the path devices allow.

    Using the max_sectors_kb parameter in multipath.conf is an easy way to set these values before a multipath device is created on top of the path devices, and it prevents invalid-sized I/O operations from being passed down. Use the following steps to increase the max_sectors_kb value with this option:

Caution!   The new max_sectors_kb value must be less than the device's max_hw_sectors_kb and max_dev_sectors values; otherwise multipathd logs errors such as "multipathd: failed setting max_sectors_kb on sdc : Invalid argument". See the Diagnostics section of this document for determining those value limits.
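The hardware limit mentioned in the caution can be checked in sysfs before editing multipath.conf. The helper below is only a sketch (the function name and the parameterized sysfs root are illustrative, not part of multipath-tools); it flags any block device whose max_hw_sectors_kb is smaller than the value you intend to set:

```shell
# Sketch only: compare a proposed max_sectors_kb value against each
# device's max_hw_sectors_kb. The sysfs root is a parameter (defaulting
# to /sys) so the logic can also be exercised against a mock tree.
check_max_sectors() {
    sysfs="${1:-/sys}"
    want="$2"
    rc=0
    for f in "$sysfs"/block/*/queue/max_hw_sectors_kb; do
        [ -e "$f" ] || continue
        hw=$(cat "$f")
        if [ "$want" -gt "$hw" ]; then
            # Setting max_sectors_kb above this limit would fail with EINVAL
            echo "$f: requested $want exceeds hardware limit $hw"
            rc=1
        fi
    done
    return $rc
}
```

On a live system, run `check_max_sectors /sys 4096` (as root) before committing the value to /etc/multipath.conf.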
  1. Set max_sectors_kb to the required value by adding a devices section to the /etc/multipath.conf file, as shown below:

             devices {
                 device {
                     vendor "NETAPP"
                     product "LUN.*"
                     max_sectors_kb 4096 ### Setting this option in the 'device' section
                                         ### for NETAPP LUNs applies it only to NETAPP devices
                 }
             }
         
             # The `max_sectors_kb` option can also be applied to a specific multipath device 
             # by adding an entry for it in the `multipaths` section.
    
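As the comment notes, the option can instead be scoped to a single map in the multipaths section. A hypothetical example follows (the WWID below is a placeholder; the alias reuses the crs_01 map name from the log excerpt above):

```
multipaths {
    multipath {
        wwid  3600a098038303458722b496d39392d38   ### placeholder WWID; substitute
                                                  ### the output of `multipath -ll`
        alias crs_01
        max_sectors_kb 4096
    }
}
```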

  2. Stop any applications that use the multipath devices, and unmount any filesystems on them. If LVM volumes use these multipath devices, then in addition to unmounting any filesystems created on a logical volume that has a PV on the multipath device, deactivate all logical volumes using the multipath device (vgchange -an <volume-group-name>). Unmounting, and deactivating logical volumes where LVM uses the mpath device, is required before you can flush (remove) the multipath device map and rescan/recreate it with the new max_sectors_kb value. This avoids the request queue limit violations and IO errors that could occur if max_sectors_kb changes while IO is in flight, i.e. conflicting IO size limits between the device-mapper layer and the physical device layer (sdN, for example). Note that the max_sectors_kb value specified in multipath.conf is applied only when the multipath device map is created.
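
The quiesce sequence in this step might look like the following sketch. The mount point, volume group, and map names are hypothetical, and the commands are only echoed as a dry-run plan; run them for real (as root) once the names match your system:

```shell
# Dry-run sketch of the quiesce sequence in step 2, with hypothetical
# names: filesystem at /data, volume group vg_data, multipath map mpatha.
# Nothing is executed; each command is echoed so the plan can be reviewed.
quiesce_plan=$(
    for cmd in \
        "umount /data" \
        "vgchange -an vg_data" \
        "multipath -f mpatha"
    do
        echo "would run: $cmd"
    done
)
echo "$quiesce_plan"
```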

  3. Then flush the multipath device map:

             $ multipath -f <multipath-device-name>
    
    
  4. Reload the multipathd service so that it picks up the new configuration from the /etc/multipath.conf file:

             $ service multipathd reload
    

  5. Re-scan the multipath device maps:

             $ multipath -v2
             $ multipath -ll
    

  6. Verify the new value of the max_sectors_kb option for the multipath device and its underlying paths:

             $ cat /sys/block/<device>/queue/max_sectors_kb 
    
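To check the map and all of its paths in one pass, the slaves directory that device-mapper exposes in sysfs can be walked. The helper below is a sketch (the function name and the parameterized sysfs root are illustrative); on a live system, call it as `show_limits /sys dm-2`, where dm-2 is the map's dm node:

```shell
# Sketch: print max_sectors_kb for a dm device and each of its slave
# (path) devices, using the slaves/ directory under /sys/block/dm-N.
# The sysfs root is a parameter so the logic can be tried on a mock tree.
show_limits() {
    sysfs="$1"
    dm="$2"
    echo "$dm: $(cat "$sysfs/block/$dm/queue/max_sectors_kb")"
    for s in "$sysfs/block/$dm/slaves"/*; do
        [ -e "$s" ] || continue
        s=${s##*/}
        echo "  $s: $(cat "$sysfs/block/$s/queue/max_sectors_kb")"
    done
}
```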
    
  7. Rebuild the initramfs boot image. Note: If the system is configured to boot from SAN with multipath devices, then also rebuild the initial RAM disk image with the following commands, so that the initramfs image contains the updated /etc/multipath.conf file with the max_sectors_kb option set.

     First, take a backup of the current initramfs image:
    
             $ cp  -av  /boot/initramfs-$(uname -r).img  /boot/initramfs-$(uname -r).img.bak
    
    Rebuild the initramfs image with multipath configuration files included:
    
             $ dracut  -f  -v  -a  multipath  --include  /etc/multipath  /etc/multipath
    

Root Cause

  • The error messages blk_cloned_rq_check_limits: over max size limit are logged from the following code block in blk_cloned_rq_check_limits:

      block/blk-core.c
      1870 static int blk_cloned_rq_check_limits(struct request_queue *q,
      1871                                       struct request *rq)
      1872 {
      1873         if (blk_rq_sectors(rq) > blk_queue_get_max_sectors(q, rq->cmd_flags)) {
      1874                 printk(KERN_ERR "%s: over max size limit.\n", __func__);
      1875                 return -EIO;		<----------
      1876         }
      [...]
    
  • The check in the snippet above calls blk_rq_sectors to get the number of sectors in the cloned request and compares it with the max_discard_sectors or max_sectors limit of the request_queue. If the cloned request contains more sectors than the limit allows, the kernel prints the error and returns an IO error (-EIO):

      include/linux/blkdev.h
       857 static inline unsigned int blk_rq_sectors(const struct request *rq)
       858 {
       859         return blk_rq_bytes(rq) >> 9;
       860 }
      
      
       872 static inline unsigned int blk_queue_get_max_sectors(struct request_queue *q,
       873                                                      unsigned int cmd_flags)
       874 {
       875         if (unlikely(cmd_flags & REQ_DISCARD))
       876                 return min(q->limits.max_discard_sectors, UINT_MAX >> 9);	<<----------
       877 
       878         return q->limits.max_sectors;	<<----------
       879 }
    
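The unit conversions in these helpers can be seen with a small worked example. The request and limit sizes below are assumed values, chosen so that the check trips:

```shell
# Assumed scenario: a 1 MiB cloned request hits a path device whose
# max_sectors_kb was lowered to 256 while the map still allowed larger IO.
req_bytes=$((1024 * 1024))          # 1 MiB cloned request
req_sectors=$((req_bytes >> 9))     # blk_rq_sectors: bytes -> 512-byte sectors
max_sectors=$((256 * 1024 / 512))   # 256 KiB queue limit expressed in sectors
if [ "$req_sectors" -gt "$max_sectors" ]; then
    echo "over max size limit."     # the -EIO path in blk_cloned_rq_check_limits
fi
```

Here 2048 sectors exceed the 512-sector limit, so the request fails with -EIO and multipath marks the path as failed.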
  • The check marked in the blk_cloned_rq_check_limits function above was added to the RHEL 6.8 and 7.3 kernels through the following upstream commit:

          Upstream commit bf4e6b4e757488dee1b6a581f49c7ac34cd217f8
      	block: Always check queue limits for cloned requests
    
  • Since the cloned request violated the request_queue limits, the kernel reported blk_cloned_rq_check_limits: over max size limit errors. This subsequently resulted in path failures in multipath.

    The above patch is not present in kernel versions older than 2.6.32-642.el6 and 3.10.0-327.el7, which is why the same request queue limit violations did not result in IO failures on older kernels.


This solution is part of Red Hat’s fast-track publication program, providing a huge library of solutions that Red Hat engineers have created while supporting our customers. To give you the knowledge you need the instant it becomes available, these articles may be presented in a raw and unedited form.