CDB: Write same fails with "Sense Key : Illegal Request" when executing fstrim operations

Solution Verified - Updated 14 Jun 2024

Environment

Storage arrays that provide LUNs with thin provisioning support.

Exceptions:

Does not apply to unmap issues

This is specific to write-same commands only. Failures using the unmap command are a separate issue. See the Root Cause for details as to why.

Issue

When executing fstrim, WRITE SAME commands fail with "Illegal Request":

  [  225.493819] sd 9:0:0:101: [sdj] tag#0 FAILED Result: hostbyte=DID_ERROR driverbyte=DRIVER_SENSE
  [  225.493830] sd 9:0:0:101: [sdj] tag#0 Sense Key : Illegal Request [current] 
  [  225.493836] sd 9:0:0:101: [sdj] tag#0 Add. Sense: Invalid field in cdb
  [  225.493841] sd 9:0:0:101: [sdj] tag#0 CDB: Write same(16) 93 08 00 00 00 00 00 04 08 00 00 3f ff ff 00 00
  [  225.493846] blk_update_request: critical target error, dev sdj, sector 264192
  [  225.494047] blk_update_request: critical target error, dev dm-6, sector 264192
  [  225.721083] blk_update_request: critical target error, dev dm-6, sector 264195

The above errors lead to devices going offline within multipath.

Resolution

Check with the storage vendor to determine whether a firmware update is available that avoids this issue.
- The firmware returns a maximum write same size of '0', which means no limit is reported. But when the kernel then uses a large length, storage fails the request because the length is too long.
Workaround for the storage issue: add the following kernel command line option to the kernel boot line:
```
    scsi_mod.dev_flags=<vendor>:<model>:0x80000000  [1]

    for example, with 3PAR storage:
        scsi_mod.dev_flags=3PARdata:VV:0x80000000
```
- This boot time option changes the kernel's behavior to ignore the maximum length returned for write same and instead use the maximum length returned for unmap commands. This allows the kernel to work with storage arrays that do not return a correct maximum supported WRITE SAME length. When a storage device matches the specified vendor and model strings, the kernel limits the length of the WRITE SAME commands that the array might otherwise reject.

NOTES:
[1] Replace <vendor>:<model> with the required storage vendor name and model information. For example, if /proc/scsi/scsi has the following information for the device to be modified:

Host: scsi2 Channel: 00 Id: 00 Lun: 00
  Vendor: LIO-ORG  Model: thin2 Rev: 01.0
  Type:   Direct-Access                    ANSI  SCSI revision: 05

The kernel command line option would be:

scsi_mod.dev_flags=LIO-ORG:thin2:0x80000000
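
The substitution described in note [1] can be sketched in a few lines of Python. This is only an illustration: the helper name dev_flags_options is hypothetical, and the parsing assumes the "Vendor: ... Model: ... Rev: ..." line layout shown above. Vendor or model strings that contain unusual spacing may need to be checked by hand.

```python
import re


def dev_flags_options(proc_scsi_text, flags=0x80000000):
    """Build one scsi_mod.dev_flags string per distinct vendor/model pair.

    proc_scsi_text is the text of /proc/scsi/scsi; flags defaults to the
    value used in this article.
    """
    options = []
    for line in proc_scsi_text.splitlines():
        # Assumed layout: "  Vendor: <vendor>  Model: <model> Rev: <rev>"
        m = re.match(r"\s*Vendor:\s*(.*?)\s+Model:\s*(.*?)\s+Rev:", line)
        if m:
            opt = "scsi_mod.dev_flags=%s:%s:0x%x" % (m.group(1), m.group(2), flags)
            if opt not in options:
                options.append(opt)
    return options


sample = """Host: scsi2 Channel: 00 Id: 00 Lun: 00
  Vendor: LIO-ORG  Model: thin2 Rev: 01.0
  Type:   Direct-Access                    ANSI  SCSI revision: 05
"""

for option in dev_flags_options(sample):
    print(option)  # scsi_mod.dev_flags=LIO-ORG:thin2:0x80000000
```

On a live system the input would be read with open("/proc/scsi/scsi").read() instead of the sample string.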

Root Cause

We have found that some storage arrays at some firmware revisions return errors on WRITE SAME commands issued to perform discard operations as a result of the returned maximum write same size of 0. A new blacklist flag allows the kernel to avoid issuing commands that the array will incorrectly reject.

Certain array models do not correctly report the maximum supported size of WRITE SAME commands in the correct field of the required VPD (Vital Product Data) page. More specifically, they report a maximum write same size of 0, which per the SCSI standard means that no maximum write same size is reported. Since a correct limit is returned for UNMAP commands, this issue only affects WRITE SAME.

When storage returns a WRITE SAME maximum length of zero (no reported limit), the kernel uses a size of 4194303 (0x003F.FFFF) blocks, as seen in the following WRITE SAME(16) command, and storage rejects the I/O as being too long.

[  225.493841] sd 9:0:0:101: [sdj] tag#0 CDB: Write same(16) 93 08 00 00 00 00 00 04 08 00 00 3f ff ff 00 00
[  225.493846] blk_update_request: critical target error, dev sdj, sector 264192

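In the CDB, bytes 2-9 (00 00 00 00 00 04 08 00, i.e. 0x0004.0800) are the starting sector (LBA) 264192, and bytes 10-13 (00 3f ff ff) are the length of 0x003F.FFFF blocks.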
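
The fields of the logged CDB can be pulled apart with a short Python sketch. The function name decode_write_same16 is hypothetical. The byte offsets follow the WRITE SAME(16) layout: opcode 0x93 in byte 0, the UNMAP bit (0x08) in byte 1, an 8-byte big-endian LBA in bytes 2-9 and a 4-byte big-endian block count in bytes 10-13.

```python
def decode_write_same16(cdb_hex):
    """Decode the LBA, block count and UNMAP bit of a WRITE SAME(16) CDB.

    cdb_hex is the space-separated hex dump as printed in the kernel log.
    """
    cdb = bytes.fromhex(cdb_hex)
    if len(cdb) != 16 or cdb[0] != 0x93:
        raise ValueError("not a WRITE SAME(16) CDB")
    return {
        "unmap": bool(cdb[1] & 0x08),                  # UNMAP bit in byte 1
        "lba": int.from_bytes(cdb[2:10], "big"),       # bytes 2-9
        "blocks": int.from_bytes(cdb[10:14], "big"),   # bytes 10-13
    }


fields = decode_write_same16("93 08 00 00 00 00 00 04 08 00 00 3f ff ff 00 00")
print(fields)  # {'unmap': True, 'lba': 264192, 'blocks': 4194303}
```

The decoded LBA matches the sector reported by blk_update_request, and the block count is the 0x3F.FFFF default discussed here.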

The maximum allowed write same size is specified by storage within the returned data for the SCSI Block Limits VPD page, for example:

# sg_vpd -e | grep -i "block limit"
  bl         0xb0      Block limits (SBC)
  ble        0xb7      Block limits extension (SBC)
# sg_vpd -p bl /dev/sda
:
  Maximum unmap LBA count: 256
  Maximum write same length: 0 
:

Per the SCSI Specification:

A MAXIMUM WRITE SAME LENGTH field set to a non-zero value indicates the maximum number of contiguous logical blocks that the device server allows to be unmapped or written in a single WRITE SAME command. A MAXIMUM WRITE SAME LENGTH field set to zero indicates that the device server does not report a limit on the number of logical blocks that it allows to be unmapped or written in a single WRITE SAME command.

There is a similar, but separate, limit specification within the same VPD page for UNMAP commands. The correct limit for UNMAP commands is returned in these cases -- which means there is no problem with UNMAP commands, only WRITE SAME commands.

So in this case storage does not report a limit for write same commands: it returns a value of 0, indicating that no limit is reported. The kernel uses a default size of 0x3F.FFFF blocks in this case. However, this storage does have a limit. In a vendor-specific implementation, it assumes that the smaller non-zero value of the maximum unmap LBA count and the maximum write same length will be used for both unmap and write same SCSI commands; in the above example, 256 blocks. So when the kernel issues a write same of 0x3F.FFFF blocks, or of any length in excess of the 256 block unmap limit in the above example, storage returns an illegal request because the write same size exceeds its internal limits. Correcting the storage firmware to conform to the SCSI standard and return the correct block length limit for write same is the correct solution in this case.
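
The mismatch can be made concrete with a small Python model of the array-side rule as described in this article. It is not kernel or firmware code; the function names and the KERNEL_DEFAULT_WS_BLOCKS constant are labels chosen for this sketch.

```python
KERNEL_DEFAULT_WS_BLOCKS = 0x3FFFFF  # length the kernel used when no limit was reported


def array_internal_limit(max_unmap_lba_count, max_write_same_length):
    """Smaller non-zero value of the two reported limits (0 if both are 0).

    Models the vendor-specific assumption described in the article.
    """
    nonzero = [v for v in (max_unmap_lba_count, max_write_same_length) if v]
    return min(nonzero) if nonzero else 0


def write_same_rejected(blocks, max_unmap_lba_count, max_write_same_length):
    """True if a WRITE SAME of this length exceeds the array's internal limit."""
    limit = array_internal_limit(max_unmap_lba_count, max_write_same_length)
    return limit != 0 and blocks > limit


# Values from the sg_vpd example: unmap LBA count 256, write same length 0.
print(array_internal_limit(256, 0))                           # 256
print(write_same_rejected(KERNEL_DEFAULT_WS_BLOCKS, 256, 0))  # True
print(write_same_rejected(256, 256, 0))                       # False
```

With the workaround flag set, the kernel keeps WRITE SAME lengths within the unmap limit, which corresponds to the last case.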

However, until storage corrects its firmware, the available workaround for the issue is to add the aforementioned boot time option, which sets a blacklist flag that allows the kernel to accommodate devices with this behavior.

All of the control flags for the kernel boot time option are defined within the ./scsi/scsi_devinfo.h header file:


#ifndef _SCSI_SCSI_DEVINFO_H
#define _SCSI_SCSI_DEVINFO_H
/*
 * Flags for SCSI devices that need special treatment
 */
#define BLIST_NOLUN                       0x001 /* Only scan LUN 0                          */
#define BLIST_FORCELUN                    0x002 /* Known to have LUNs, force scanning,
                                                   deprecated: Use max_luns=N               */
#define BLIST_BORKEN                      0x004 /* Flag for broken handshaking              */
#define BLIST_KEY                         0x008 /* unlock by special command                */
#define BLIST_SINGLELUN                   0x010 /* Do not use LUNs in parallel              */
#define BLIST_NOTQ                        0x020 /* Buggy Tagged Command Queuing             */
#define BLIST_SPARSELUN                   0x040 /* Non consecutive LUN numbering            */
#define BLIST_MAX5LUN                     0x080 /* Avoid LUNS >= 5                          */
#define BLIST_ISROM                       0x100 /* Treat as (removable) CD-ROM              */
#define BLIST_LARGELUN                    0x200 /* LUNs past 7 on a SCSI-2 device           */
#define BLIST_INQUIRY_36                  0x400 /* override additional length field         */
#define BLIST_INQUIRY_58                  0x800 /* ... for broken inquiry responses         */
#define BLIST_NOSTARTONADD               0x1000 /* do not do automatic start on add         */
#define BLIST_MS_SKIP_PAGE_08            0x2000 /* do not send ms page 0x08                 */
#define BLIST_MS_SKIP_PAGE_3F            0x4000 /* do not send ms page 0x3f                 */
#define BLIST_USE_10_BYTE_MS             0x8000 /* use 10 byte ms before 6 byte ms          */
#define BLIST_MS_192_BYTES_FOR_3F       0x10000 /*  192 byte ms page 0x3f request           */
#define BLIST_REPORTLUN2                0x20000 /* try REPORT_LUNS even for SCSI-2 devs
                                                   (if HBA supports more than 8 LUNs)       */
#define BLIST_NOREPORTLUN               0x40000 /* don't try REPORT_LUNS scan (SCSI-3 devs) */
#define BLIST_NOT_LOCKABLE              0x80000 /* don't use PREVENT-ALLOW commands         */
#define BLIST_NO_ULD_ATTACH            0x100000 /* device is actually for RAID config       */
#define BLIST_SELECT_NO_ATN            0x200000 /* select without ATN                       */
#define BLIST_RETRY_HWERROR            0x400000 /* retry HARDWARE_ERROR                     */
#define BLIST_MAX_512                  0x800000 /* maximum 512 sector cdb length            */
#define BLIST_ATTACH_PQ3              0x1000000 /* Scan: Attach to PQ3 devices              */
#define BLIST_NO_DIF                  0x2000000 /* Disable T10 PI (DIF)                     */
#define BLIST_SKIP_VPD_PAGES          0x4000000 /* Ignore SBC-3 VPD pages                   */
#define BLIST_TRY_VPD_PAGES          0x10000000 /* Attempt to read VPD pages                */
#define BLIST_NO_RSOC                0x20000000 /* don't try to issue RSOC                  */
#define BLIST_MAX_1024               0x40000000 /* maximum 1024 sector cdb length           */
#define BLIST_UNMAP_LIMIT_WS         0x80000000 /* Use UNMAP limit for WRITE SAME           */

#endif

Each of the above flags exists to handle a different odd behaviour of certain storage devices. When the <vendor> and <model> names of a storage device match the corresponding dev_flags fields, the defined <flags> are applied to that storage device.

An upstream patch has been back-ported to RHEL 7.5 which adds the BLIST_UNMAP_LIMIT_WS option (the last flag in the list above) to help deal with storage devices that exhibit this particular behaviour. The value of 0x80000000 sets the BLIST_UNMAP_LIMIT_WS control flag, which causes the UNMAP limit to be used for discards using WRITE SAME / WRITE SAME(16) instead of the WRITE SAME limit.
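
To check which flags a given dev_flags value sets, the table from the header can be turned into a lookup. The following Python sketch copies the bit values from the listing above; the names BLIST_FLAGS and decode_dev_flags are hypothetical and not part of the kernel.

```python
# Bit values copied from the scsi_devinfo.h listing in this article.
BLIST_FLAGS = {
    0x001: "BLIST_NOLUN", 0x002: "BLIST_FORCELUN", 0x004: "BLIST_BORKEN",
    0x008: "BLIST_KEY", 0x010: "BLIST_SINGLELUN", 0x020: "BLIST_NOTQ",
    0x040: "BLIST_SPARSELUN", 0x080: "BLIST_MAX5LUN", 0x100: "BLIST_ISROM",
    0x200: "BLIST_LARGELUN", 0x400: "BLIST_INQUIRY_36",
    0x800: "BLIST_INQUIRY_58", 0x1000: "BLIST_NOSTARTONADD",
    0x2000: "BLIST_MS_SKIP_PAGE_08", 0x4000: "BLIST_MS_SKIP_PAGE_3F",
    0x8000: "BLIST_USE_10_BYTE_MS", 0x10000: "BLIST_MS_192_BYTES_FOR_3F",
    0x20000: "BLIST_REPORTLUN2", 0x40000: "BLIST_NOREPORTLUN",
    0x80000: "BLIST_NOT_LOCKABLE", 0x100000: "BLIST_NO_ULD_ATTACH",
    0x200000: "BLIST_SELECT_NO_ATN", 0x400000: "BLIST_RETRY_HWERROR",
    0x800000: "BLIST_MAX_512", 0x1000000: "BLIST_ATTACH_PQ3",
    0x2000000: "BLIST_NO_DIF", 0x4000000: "BLIST_SKIP_VPD_PAGES",
    0x10000000: "BLIST_TRY_VPD_PAGES", 0x20000000: "BLIST_NO_RSOC",
    0x40000000: "BLIST_MAX_1024", 0x80000000: "BLIST_UNMAP_LIMIT_WS",
}


def decode_dev_flags(value):
    """Return the names of all flags set in value, lowest bit first."""
    names = [name for bit, name in sorted(BLIST_FLAGS.items()) if value & bit]
    known = 0
    for bit in BLIST_FLAGS:
        known |= bit
    unknown = value & ~known
    if unknown:
        # Bits that do not appear in the listing above.
        names.append("unknown:0x%x" % unknown)
    return names


print(decode_dev_flags(0x80000000))  # ['BLIST_UNMAP_LIMIT_WS']
print(decode_dev_flags(0x80020000))  # ['BLIST_REPORTLUN2', 'BLIST_UNMAP_LIMIT_WS']
```

The second call shows how a combined value would decode if another flag were already needed for the same device.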

Note: This can also be worked around by forcibly changing sysfs entries; however, these changes are not preserved across boots.
The following script was used, but PLEASE keep in mind that it was very specific to a particular output of multipath -ll.

 #!/bin/bash
 #
 # Collect the H:C:T:L tuple of every active path and the dm-N name of every
 # mpath device. The parsing is tied to one particular "multipath -ll" layout.
 multipath -ll | grep "active ready running" | awk '{print $2}' | grep -v '`' > /tmp/tupples
 multipath -ll | grep mpath | awk '{print $3}' > /tmp/dm-devices

 echo "before change"
 for i in $(cat /tmp/dm-devices)
 do
   echo "$i"
   cat "/sys/block/$i/queue/discard_max_bytes"
 done

 # Cap the WRITE SAME length and select WRITE SAME(16) for discards on each path.
 for t in $(cat /tmp/tupples)
 do
   echo "Setting to 65535 for disk $t"
   echo 65535 > "/sys/class/scsi_disk/$t/max_write_same_blocks"
   echo -n "writesame_16" > "/sys/class/scsi_disk/$t/provisioning_mode"
   cat "/sys/class/scsi_disk/$t/max_write_same_blocks"
 done
 echo "reload multipath maps"
 multipath -r

 echo "after change"
 for i in $(cat /tmp/dm-devices)
 do
   echo "$i"
   cat "/sys/block/$i/queue/discard_max_bytes"
 done

Diagnostic Steps

The maximum allowed write same size is specified by storage within the returned data for the SCSI Block Limits VPD page. Check the write same maximum length limit returned by storage using the following command:

# sg_vpd -e | grep -i "block limit"
  bl         0xb0      Block limits (SBC)
  ble        0xb7      Block limits extension (SBC)
# sg_vpd -p bl /dev/sda
:
  Maximum unmap LBA count: 256
  Maximum write same length: 0 
:
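
The check can also be scripted. The following Python sketch parses the two relevant lines from sg_vpd output and flags the pattern described in this article (write same length 0 together with a non-zero unmap LBA count). The function names are hypothetical, and the parsing assumes the line format shown above; other sg3_utils versions may print these fields differently. A match only suggests that the device may be affected.

```python
import re

FIELDS = ("Maximum unmap LBA count", "Maximum write same length")


def parse_block_limits(sg_vpd_text):
    """Extract the unmap and write same limits from 'sg_vpd -p bl' output."""
    values = {}
    pattern = r"\s*(%s):\s*(\S+)" % "|".join(FIELDS)
    for line in sg_vpd_text.splitlines():
        m = re.match(pattern, line)
        if m:
            # Base 0 accepts both decimal ("256") and hex ("0x100") values.
            values[m.group(1)] = int(m.group(2), 0)
    return values


def looks_affected(values):
    """True if no WRITE SAME limit is reported but an UNMAP limit is."""
    return (values.get("Maximum write same length") == 0
            and values.get("Maximum unmap LBA count", 0) > 0)


sample = """:
  Maximum unmap LBA count: 256
  Maximum write same length: 0 
:
"""

limits = parse_block_limits(sample)
print(limits)                  # {'Maximum unmap LBA count': 256, 'Maximum write same length': 0}
print(looks_affected(limits))  # True
```

On a live system the text could come from subprocess.run(["sg_vpd", "-p", "bl", "/dev/sda"], capture_output=True, text=True).stdout.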
