Buffer I/O error during writes to a Direct LUN on HPE 3PAR storage due to an incorrectly large max_sectors_kb on the virtual machine
Environment
- Red Hat Enterprise Virtualization
- Red Hat Enterprise Linux 7.4 virtual machine
- kernel-3.10.0-693.11.6.el7.x86_64
- kernel-3.10.0-514.el7
- kernel-3.10.0-427.el7
- kernel-3.10.0-414.el7
- HPE 3PAR SAN
Issue
After updating from kernel 3.10.0-327.22.2.el7.x86_64 (RHEL 7.2) to 3.10.0-693.11.6.el7.x86_64 (RHEL 7.4), database backups fail intermittently. The write operation completes, but '/var/log/messages' fills with the following errors repeatedly when writes go to the directly attached LUN:
[ 254.002322] sd 2:0:0:3: [sdc] FAILED Result: hostbyte=DID_OK driverbyte=DRIVER_SENSE
[ 254.002336] sd 2:0:0:3: [sdc] Sense Key : Hardware Error [current]
[ 254.002341] sd 2:0:0:3: [sdc] Add. Sense: Internal target failure
[ 254.002346] sd 2:0:0:3: [sdc] CDB: Write(10) 2a 00 00 68 51 20 00 28 00 00
[ 254.002349] blk_update_request: critical target error, dev sdc, sector 6836512
. . . . .
[ 254.120487] Buffer I/O error on dev dm-1, logical block 939051, lost async page write
[ 254.120510] Buffer I/O error on dev dm-1, logical block 939052, lost async page write
Resolution
To resolve the issue, manually set max_sectors_kb to 512 on the underlying sd devices:
# echo 512 > /sys/block/sdX/queue/max_sectors_kb
A udev rule can be created to make the setting persistent across boots. For example, create a rules file under /etc/udev/rules.d/ containing:
ACTION=="add|change", SUBSYSTEM=="block", ENV{ID_VENDOR}=="XYZ", ENV{ID_MODEL}=="SAN_Model_xyz*", RUN+="/bin/sh -c '/bin/echo 512 > /sys%p/queue/max_sectors_kb'"
The values of ENV{ID_VENDOR} and ENV{ID_MODEL} can be retrieved by running the following command:
# udevadm info --query=all --path=/class/block/sdX
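If the full `udevadm info` output is noisy, the two properties can be filtered out directly. The snippet below demonstrates the grep filter against an illustrative sample (the XYZ / SAN_Model_xyz strings are the same placeholders used in the rule above, not values captured from a real 3PAR LUN); on a real system, pipe the `udevadm info` output through the same filter.

```shell
# Filter udev properties down to the two fields the rule needs.
# The sample text is illustrative only; real values come from
# `udevadm info --query=all --path=/class/block/sdX`.
sample='DEVNAME=/dev/sdc
ID_VENDOR=XYZ
ID_MODEL=SAN_Model_xyz
ID_SERIAL=0000'
printf '%s\n' "$sample" | grep -E '^ID_VENDOR=|^ID_MODEL='
```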
Root Cause
HP's 3PAR storage returns incorrect transfer length limit values for its disks.
Enhancements were added to the SCSI layer in recent releases to fetch the I/O transfer length limits exposed by the device. This information is retrieved via the SCSI INQUIRY command sent to the device and is then exposed through the sysfs directory. Patch 5d36ce1 in the following list introduced this change.
As a result, 3PARdata SAN devices advertise a much larger max_sectors_kb value in the 693.11.6 kernel:
ced406f [scsi/block] sd: Fix device-imposed transfer length limits
d98ea39 [scsi] scsi_sysfs: Fix queue_ramp_up_period return code
5d36ce1 [scsi] scsi: Export SCSI Inquiry data to sysfs <<---
c865292 [scsi] sd: Fix maximum I/O size for BLOCK_PC requests
The 3PARdata SAN device reports a much higher optimal transfer length, so the max_sectors_kb value for it is set to 16384. However, when large I/O operations are actually issued to the device, the storage array returns a SCSI error with sense data Hardware Error / Internal target failure on the large READ(10) and WRITE(10) CDB operations.
Diagnostic Steps
On the VM, the optimal transfer length is much smaller than the maximum transfer length:
[root@affected_vm ~]# sg_vpd -p bl /dev/sdX
Block limits VPD page (SBC):
Write same no zero (WSNZ): 0
Maximum compare and write length: 1 blocks
Optimal transfer length granularity: 32 blocks
Maximum transfer length: 4194303 blocks <====== Maximum transfer length != optimal transfer length (not equal)
Optimal transfer length: 32768 blocks <=========
Maximum prefetch length: 0 blocks
Maximum unmap LBA count: 65536
Maximum unmap block descriptor count: 10
Optimal unmap granularity: 32
Unmap granularity alignment valid: 1
Unmap granularity alignment: 0
Maximum write same length: 0x8000 blocks
max_sectors_kb is also high (16384):
# cat /sys/block/sdX/queue/max_sectors_kb
16384
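The 16384 figure matches the VPD data above: with 512-byte logical blocks, an optimal transfer length of 32768 blocks works out to 16 MiB, i.e. 16384 KB, which the newer kernel adopts as max_sectors_kb. A quick sanity check of that arithmetic (assuming the 512-byte logical block size of this LUN):

```shell
# Convert the VPD "Optimal transfer length" (in logical blocks)
# into kilobytes, the unit used by max_sectors_kb.
# Assumes 512-byte logical blocks, as on this LUN.
optimal_blocks=32768
block_size=512
echo $(( optimal_blocks * block_size / 1024 ))   # prints 16384
```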
The hypervisor, however, is unaffected: each underlying sd device of the multipathed LUN reports identical maximum and optimal transfer lengths, so no issue is seen on the host/server.
[root@affected_Hypervisor ~]# sg_vpd -p bl /dev/sdX
Block limits VPD page (SBC):
Write same no zero (WSNZ): 0
Maximum compare and write length: 1 blocks
Optimal transfer length granularity: 32 blocks
Maximum transfer length: 32768 blocks <--------- Maximum transfer length == optimal transfer length
Optimal transfer length: 32768 blocks <---------- ^^
Maximum prefetch length: 0 blocks
Maximum unmap LBA count: 65536
Maximum unmap block descriptor count: 10
Optimal unmap granularity: 32
Unmap granularity alignment valid: 1
Unmap granularity alignment: 0
In the older kernel, the default max_sectors_kb value is 512 for the same device:
Device: sdX
max_hw_sectors_kb: 32767
max_sectors_kb: 512
max_segments: 126
max_segment_size: 65536
minimum_io_size: 16384
optimal_io_size: 16777216
physical_block_size: 512
Thus, in this situation, the behavior of the old kernel can be restored by reducing the max_sectors_kb value for the device back to 512.
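Putting the resolution together, the fix can be sketched as a small helper that walks /sys/block and applies the 512 KB limit only to 3PARdata devices. This is a hedged sketch, not part of the original solution: the function name is hypothetical, it matches on the sysfs vendor file rather than on udev properties, and it must run as root on a live system.

```shell
# Hypothetical helper: set max_sectors_kb on every 3PARdata sd device.
# Arguments: sysfs block directory (default /sys/block) and the value
# to write (default 512). Run as root on a real system.
set_3par_max_sectors() {
    sysblock="${1:-/sys/block}"
    value="${2:-512}"
    for dev in "$sysblock"/sd*; do
        # Skip unmatched globs and devices without a vendor file.
        [ -r "$dev/device/vendor" ] || continue
        case "$(cat "$dev/device/vendor")" in
            3PARdata*)
                echo "$value" > "$dev/queue/max_sectors_kb"
                ;;
        esac
    done
}

# Usage (as root):
#   set_3par_max_sectors
```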
This solution is part of Red Hat’s fast-track publication program, providing a huge library of solutions that Red Hat engineers have created while supporting our customers. To give you the knowledge you need the instant it becomes available, these articles may be presented in a raw and unedited form.