kernel: bio too big device md0 (1024 > 256) on RHEL 7


Environment

  • Red Hat Enterprise Linux 7

Issue

  • The issue appeared after a drive was replaced in a software RAID array.

  • NVMe drives cannot be hot-swapped without data corruption on databases.

  • dmesg is flooded with the following error messages:

      [Fri Mar 9 12:34:51 2018] bio too big device md0 (1024 > 256) 
      [Fri Mar 9 12:34:51 2018] bio too big device md0 (824 > 256) 
      [Fri Mar 9 12:34:51 2018] bio too big device md0 (1024 > 256) 
      [Fri Mar 9 12:34:51 2018] bio too big device md0 (1024 > 256) 
      [Fri Mar 9 12:34:51 2018] bio too big device md0 (1024 > 256)
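
The two numbers in each message are the size of the rejected bio and the queue limit it exceeded. As a minimal sketch, assuming both are counts of 512-byte sectors (the unit the block layer uses for these limits), the following shell function converts a log line to KiB. The name bio_too_big_kib is a hypothetical helper, not part of any package.

```shell
# Convert the two sector counts in a "bio too big" line to KiB.
# Assumption: the counts are 512-byte sectors, so KiB = sectors / 2.
bio_too_big_kib() {
    printf '%s\n' "$1" |
        sed -n 's/.*bio too big device \([^ ]*\) (\([0-9]*\) > \([0-9]*\)).*/\1 \2 \3/p' |
        while read -r dev bio_sectors limit_sectors; do
            echo "$dev: bio $((bio_sectors / 2)) KiB > limit $((limit_sectors / 2)) KiB"
        done
}

bio_too_big_kib '[Fri Mar 9 12:34:51 2018] bio too big device md0 (1024 > 256)'
```

For the sample line this prints "md0: bio 512 KiB > limit 128 KiB". A limit of 256 sectors corresponds to the value of 128 that the workaround writes to max_sectors_kb.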
    

Resolution

  • This issue has been addressed in RHSA-2018:2748. Upgrade to kernel-3.10.0-862.14.4.el7 or newer. The changelog of the fixed package lists the relevant changes:

      $ rpm -qp --changelog ./kernel-3.10.0-862.14.4.el7.x86_64.rpm | grep -E '1600056|1568070'
      - [md] avoid NULL dereference to queue pointer (Ming Lei) [1600056 1581845]
      - [md] support to split big bio (Ming Lei) [1568070 1557434]
      - [block] introduce bio_split2() and bio_pair2_release() (Ming Lei) [1568070 1557434]
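
To see whether a running system already has the fixed kernel, the release string can be compared against 3.10.0-862.14.4.el7. The following is a rough sketch, not an official check: kernel_has_fix is a hypothetical helper name, and it assumes RHEL 7 style release strings and a GNU sort that supports -V (version sort). The result is only meaningful on RHEL 7 kernels.

```shell
# Fixed kernel named in the Resolution section.
FIXED_KERNEL="3.10.0-862.14.4.el7"

# Succeeds when the given kernel release is at or above FIXED_KERNEL.
# Assumptions: RHEL 7 style release strings, GNU sort with -V.
kernel_has_fix() {
    running=${1%.el7*}            # drop ".el7" and any arch suffix
    fixed=${FIXED_KERNEL%.el7*}
    # Version-sort both strings; the fix is present if FIXED_KERNEL sorts first.
    lowest=$(printf '%s\n%s\n' "$fixed" "$running" | sort -V | head -n 1)
    [ "$lowest" = "$fixed" ]
}

if kernel_has_fix "$(uname -r)"; then
    echo "running kernel is at or above $FIXED_KERNEL"
else
    echo "running kernel predates $FIXED_KERNEL"
fi
```

The function takes the release string as an argument, so it can also be used on values collected from other hosts, for example kernel_has_fix 3.10.0-693.el7.x86_64.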
    

Work-Around

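  • As a workaround, create a udev rule that caps max_sectors_kb on the affected devices. For background, refer to "What is udev and how do you write custom udev rules in systemd environments (RHEL7 and later)?". In the example below, replace the volume group names, logical volume names, and MD_UUID value with those of the affected devices.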

      ACTION!="add|change", GOTO="max_sectors_kb_end" 
      KERNEL!="dm-*|md*", GOTO="max_sectors_kb_end" 
      ENV{DM_VG_NAME}=="vg03", ENV{DM_LV_NAME}=="fast-tmp", ATTR{queue/max_sectors_kb}="128"
      ENV{DM_VG_NAME}=="vg03", ENV{DM_LV_NAME}=="mysql",    ATTR{queue/max_sectors_kb}="128"
      ENV{MD_UUID}=="xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx",   ATTR{queue/max_sectors_kb}="128"
      LABEL="max_sectors_kb_end"
    
NOTE   In systemd environments, set max_sectors_kb directly through ATTR{} rather than with the SysV-style method of RUN+="/bin/sh -c '/bin/echo 128 > /sys%p/queue/max_sectors_kb'". The RUN method can fork a large number of processes during initial device discovery, which can result in udev timeouts being reported. Setting the block device parameter directly is supported in systemd environments and is much more efficient.
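
Once the rule is in place, it helps to confirm that the new limit was actually applied. The sketch below is one way to do that, under a few assumptions: the rule is saved under /etc/udev/rules.d/ (the file name in the comment is hypothetical), udev is reloaded and re-triggered as shown in the comments, and show_max_sectors_kb is an illustrative helper name rather than an existing tool.

```shell
# After saving the rule, e.g. as /etc/udev/rules.d/99-max-sectors-kb.rules
# (hypothetical file name), reload and re-run the rules as root:
#   udevadm control --reload-rules
#   udevadm trigger --action=change --subsystem-match=block

# Print max_sectors_kb for every md and dm device found under a sysfs root.
# The root defaults to /sys; another directory is only useful for dry runs.
show_max_sectors_kb() {
    sysfs_root=${1:-/sys}
    for f in "$sysfs_root"/block/md*/queue/max_sectors_kb \
             "$sysfs_root"/block/dm-*/queue/max_sectors_kb; do
        [ -r "$f" ] || continue          # skip globs that matched nothing
        dev=${f#"$sysfs_root"/block/}    # strip the leading path
        dev=${dev%%/*}                   # keep only the device name
        echo "$dev $(cat "$f")"
    done
    return 0
}

show_max_sectors_kb
```

Devices matched by the rule should report 128. Devices that still show a larger value were either not matched or have not received a change event yet.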

Root Cause

Adding or removing underlying disks changed the max_hw_sectors limit of the Multiple Device (MD) queue, but the change was not propagated to the upper layers. As a consequence, the upper layers kept submitting bios larger than the new limit, and the "bio too big" message appeared together with input/output (I/O) failures. The update in RHSA-2018:2748 introduces the bio_split2() and bio_pair2_release() functions, which let MD split oversized bios. As a result, the "bio too big" message and the I/O failures no longer appear.
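
The mismatch described above can be observed by comparing the hardware limit of the array with the limits of its members in sysfs. The following sketch assumes the members appear under the array's slaves/ directory and are whole disks; md_limits is a hypothetical helper name, and the optional second argument exists only so the function can be tried against a copy of the sysfs tree.

```shell
# List max_hw_sectors_kb for an md array and each member under slaves/.
# Assumption: members are whole disks. A partition member has no queue/
# directory of its own and is reported as "n/a".
md_limits() {
    md=$1
    sysfs_root=${2:-/sys}
    base="$sysfs_root/block/$md"
    [ -d "$base" ] || { echo "no such device: $md" >&2; return 1; }
    echo "$md $(cat "$base/queue/max_hw_sectors_kb")"
    for member in "$base"/slaves/*; do
        [ -e "$member" ] || continue     # array with no members listed
        limit_file="$member/queue/max_hw_sectors_kb"
        if [ -r "$limit_file" ]; then
            echo "  ${member##*/} $(cat "$limit_file")"
        else
            echo "  ${member##*/} n/a"
        fi
    done
}
```

Running md_limits md0 prints the array first and then one indented line per member. A member whose value is lower than the size of the bios still arriving at the array (for example 128 against 512 KiB bios) matches the situation in the log messages.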

Diagnostic Steps

  • dmesg is flooded with the following error messages:

      [Fri Mar 9 12:34:51 2018] bio too big device md0 (1024 > 256) 
      [Fri Mar 9 12:34:51 2018] bio too big device md0 (824 > 256) 
      [Fri Mar 9 12:34:51 2018] bio too big device md0 (1024 > 256) 
      [Fri Mar 9 12:34:51 2018] bio too big device md0 (1024 > 256) 
      [Fri Mar 9 12:34:51 2018] bio too big device md0 (1024 > 256) 
      [Fri Mar 9 12:34:51 2018] bio too big device md0 (824 > 256) 
      [Fri Mar 9 12:34:51 2018] bio too big device md0 (1024 > 256) 
      [Fri Mar 9 12:34:51 2018] bio too big device md0 (1024 > 256) 
      [Fri Mar 9 12:34:51 2018] bio too big device md0 (1024 > 256)
    
