Buffer I/O error during writes to a Direct LUN on HPE 3PAR storage due to an incorrectly large max_sectors_kb on the virtual machine
Environment
- Red Hat Enterprise Virtualization
- Red Hat Enterprise Linux 7.4 virtual machine
- kernel-3.10.0-693.11.6.el7.x86_64
- kernel-3.10.0-514.el7
- kernel-3.10.0-427.el7
- kernel-3.10.0-414.el7
- HPE 3PAR SAN
Issue
After updating from kernel 3.10.0-327.22.2.el7.x86_64 (RHEL 7.2) to 3.10.0-693.11.6.el7.x86_64 (RHEL 7.4), database backups fail intermittently. The write operation completes, but '/var/log/messages' fills with the following errors repeatedly when writes go to the directly attached LUN:
[ 254.002322] sd 2:0:0:3: [sdc] FAILED Result: hostbyte=DID_OK driverbyte=DRIVER_SENSE
[ 254.002336] sd 2:0:0:3: [sdc] Sense Key : Hardware Error [current]
[ 254.002341] sd 2:0:0:3: [sdc] Add. Sense: Internal target failure
[ 254.002346] sd 2:0:0:3: [sdc] CDB: Write(10) 2a 00 00 68 51 20 00 28 00 00
[ 254.002349] blk_update_request: critical target error, dev sdc, sector 6836512
. . . . .
[ 254.120487] Buffer I/O error on dev dm-1, logical block 939051, lost async page write
[ 254.120510] Buffer I/O error on dev dm-1, logical block 939052, lost async page write
Resolution
To resolve the issue, manually set max_sectors_kb to 512 on the underlying sd devices:
# echo 512 > /sys/block/sdX/queue/max_sectors_kb
A udev rule can be created to make the setting persistent across boots. For example, create a rules file under /etc/udev/rules.d/ containing:
ACTION=="add|change", SUBSYSTEM=="block", ENV{ID_VENDOR}=="XYZ", ENV{ID_MODEL}=="SAN_Model_xyz*", RUN+="/bin/sh -c '/bin/echo 512 > /sys%p/queue/max_sectors_kb'"
The values of ENV{ID_VENDOR} and ENV{ID_MODEL} can be retrieved by running the following command:
# udevadm info --query=all --path=/class/block/sdX
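If the full `udevadm info` output is noisy, the two properties can be filtered out directly. The snippet below demonstrates the grep filter against an illustrative sample (the XYZ / SAN_Model_xyz strings are the same placeholders used in the rule above, not values captured from a real 3PAR LUN); on a real system, pipe the `udevadm info` output through the same filter.

```shell
# Filter udev properties down to the two fields the rule needs.
# The sample text is illustrative only; real values come from
# `udevadm info --query=all --path=/class/block/sdX`.
sample='DEVNAME=/dev/sdc
ID_VENDOR=XYZ
ID_MODEL=SAN_Model_xyz
ID_SERIAL=0000'
printf '%s\n' "$sample" | grep -E '^ID_VENDOR=|^ID_MODEL='
```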
Root Cause
HP's 3PAR storage returns incorrect transfer length limit values for its disks.
Enhancements were added to the SCSI layer in recent releases to fetch the I/O transfer length limits exposed by the device. This information is retrieved via the SCSI INQUIRY command sent to the device and is then exposed through the sysfs directory. Patch 5d36ce1 in the following list introduced this change.
As a result, 3PARdata SAN devices advertise a much larger max_sectors_kb value in the 693.11.6 kernel:
ced406f [scsi/block] sd: Fix device-imposed transfer length limits
d98ea39 [scsi] scsi_sysfs: Fix queue_ramp_up_period return code
5d36ce1 [scsi] scsi: Export SCSI Inquiry data to sysfs <<---
c865292 [scsi] sd: Fix maximum I/O size for BLOCK_PC requests
The 3PARdata SAN device reports a much higher optimal transfer length, so the max_sectors_kb value for it is set to 16384. However, when large I/O operations are actually issued to the device, the storage array returns a SCSI error with sense data Hardware Error / Internal target failure on the large READ(10) and WRITE(10) CDB operations.
Diagnostic Steps
On the VM, the optimal transfer length is much smaller than the maximum transfer length:
[root@affected_vm ~]# sg_vpd -p bl /dev/sdX
Block limits VPD page (SBC):
Write same no zero (WSNZ): 0
Maximum compare and write length: 1 blocks
Optimal transfer length granularity: 32 blocks
Maximum transfer length: 4194303 blocks <====== Maximum transfer length != optimal transfer length (not equal)
Optimal transfer length: 32768 blocks <=========
Maximum prefetch length: 0 blocks
Maximum unmap LBA count: 65536
Maximum unmap block descriptor count: 10
Optimal unmap granularity: 32
Unmap granularity alignment valid: 1
Unmap granularity alignment: 0
Maximum write same length: 0x8000 blocks
max_sectors_kb is also high (16384):
# cat /sys/block/sdX/queue/max_sectors_kb
16384
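The 16384 figure matches the VPD data above: with 512-byte logical blocks, an optimal transfer length of 32768 blocks works out to 16 MiB, i.e. 16384 KB, which the newer kernel adopts as max_sectors_kb. A quick sanity check of that arithmetic (assuming the 512-byte logical block size of this LUN):

```shell
# Convert the VPD "Optimal transfer length" (in logical blocks)
# into kilobytes, the unit used by max_sectors_kb.
# Assumes 512-byte logical blocks, as on this LUN.
optimal_blocks=32768
block_size=512
echo $(( optimal_blocks * block_size / 1024 ))   # prints 16384
```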
The hypervisor, however, is unaffected: each underlying sd device of the multipathed LUN reports identical maximum and optimal transfer lengths, so no issue is seen on the host/server.
[root@affected_Hypervisor ~]# sg_vpd -p bl /dev/sdX
Block limits VPD page (SBC):
Write same no zero (WSNZ): 0
Maximum compare and write length: 1 blocks
Optimal transfer length granularity: 32 blocks
Maximum transfer length: 32768 blocks <--------- Maximum transfer length == optimal transfer length
Optimal transfer length: 32768 blocks <---------- ^^
Maximum prefetch length: 0 blocks
Maximum unmap LBA count: 65536
Maximum unmap block descriptor count: 10
Optimal unmap granularity: 32
Unmap granularity alignment valid: 1
Unmap granularity alignment: 0
In the older kernel, the default max_sectors_kb value is 512 for the same device:
Device: sdX
max_hw_sectors_kb: 32767
max_sectors_kb: 512
max_segments: 126
max_segment_size: 65536
minimum_io_size: 16384
optimal_io_size: 16777216
physical_block_size: 512
Thus, in this situation, the behavior of the old kernel can be restored by reducing the max_sectors_kb value for the device back to 512.
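Putting the resolution together, the fix can be sketched as a small helper that walks /sys/block and applies the 512 KB limit only to 3PARdata devices. This is a hedged sketch, not part of the original solution: the function name is hypothetical, it matches on the sysfs vendor file rather than on udev properties, and it must run as root on a live system.

```shell
# Hypothetical helper: set max_sectors_kb on every 3PARdata sd device.
# Arguments: sysfs block directory (default /sys/block) and the value
# to write (default 512). Run as root on a real system.
set_3par_max_sectors() {
    sysblock="${1:-/sys/block}"
    value="${2:-512}"
    for dev in "$sysblock"/sd*; do
        # Skip unmatched globs and devices without a vendor file.
        [ -r "$dev/device/vendor" ] || continue
        case "$(cat "$dev/device/vendor")" in
            3PARdata*)
                echo "$value" > "$dev/queue/max_sectors_kb"
                ;;
        esac
    done
}

# Usage (as root):
#   set_3par_max_sectors
```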
This solution is part of Red Hat’s fast-track publication program, providing a huge library of solutions that Red Hat engineers have created while supporting our customers. To give you the knowledge you need the instant it becomes available, these articles may be presented in a raw and unedited form.