SCSI sense key: Medium Error and error: return code = 0x08070002, 0x08100002, or 0x08000002

Solution Verified - Updated 7 Aug 2024

Environment

Red Hat Enterprise Linux (RHEL) 9
Red Hat Enterprise Linux (RHEL) 8
Red Hat Enterprise Linux (RHEL) 7
Red Hat Enterprise Linux (RHEL) 6
Red Hat Enterprise Linux (RHEL) 5
Red Hat Enterprise Linux (RHEL) 4

Issue

Medium Error sense keys are being logged in one of the following forms:
- SCSI error: driverbyte=DRIVER_SENSE plus Medium Error sense key
- SCSI error: return code 0x08070002 plus Medium Error sense key
- SCSI error: return code 0x08000002 plus Medium Error sense key
- SCSI error: return code 0x08100002 plus Medium Error sense key
This results in issues with disks:
- With multipath, there could be constant path failovers.
- A mounted filesystem may be remounted read-only.
- The errors may affect a single LUN or a subset of LUNs.
/var/log/messages contains one of the following forms of the event, or similar:


 sd 0:0:0:0: [sda] Result: hostbyte=DID_OK driverbyte=DRIVER_SENSE
 sd 0:0:0:0: [sda] Sense Key : Medium Error [current]
 sd 0:0:0:0: [sda] Add. Sense: Unrecovered read error
or
kernel: sd 6:0:2:0: SCSI error: return code = 0x08100002
kernel: Result: hostbyte=invalid driverbyte=DRIVER_SENSE,SUGGEST_OK
kernel: sde: Current: sense key: Medium Error
kernel:     Add. Sense: Record not found
or
kernel: sd 1:0:5:14: [sdhl] Add. Sense: Unrecovered read error - recommend reassignment
kernel: sd 1:0:5:14: [sdhl] CDB: Read(10): 28 00 04 45 a6 70 00 00 08 00
kernel: sd 3:0:5:14: [sduj] Unhandled sense code
kernel: sd 3:0:5:14: [sduj] Result: hostbyte=DID_OK driverbyte=DRIVER_SENSE
kernel: sd 3:0:5:14: [sduj] Sense Key : Medium Error [current]
or
kernel blk_update_request: critical medium error, dev sde, sector 1025000000
kernel sd 4:2:4:0: [sde] tag#0 FAILED Result: hostbyte=DID_OK driverbyte=DRIVER_SENSE
kernel sd 4:2:4:0: [sde] tag#0 Sense Key : Medium Error  [current]
kernel sd 4:2:4:0: [sde] tag#0 Add. Sense: Unrecovered read error
kernel sd 4:2:4:0: [sde] tag#0 CDB: Read(10) 28 00 3d 22 60 00 00 02 00 00
kernel blk_update_request: critical medium error, dev sde, sector 1025662976
or
kernel: sd 1:0:0:2: SCSI error: return code = 0x08070002
kernel: sdab: Current: sense key: Medium Error
kernel:     Add. Sense: Unrecovered read error
kernel: end_request: I/O error, dev sdab, sector 83235271
kernel: device-mapper: multipath: Failing path 65:176.
multipathd: dm-8: add map (uevent)
multipathd: dm-8: devmap already registered
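The CDB lines in the samples above carry the starting logical block address and the length of the failed read. The following is a minimal sketch of decoding a Read(10) CDB as printed by the kernel; the helper name decode_read10 is made up for this illustration, and it assumes the standard 10-byte layout (opcode 28h, LBA in bytes 2-5, transfer length in bytes 7-8).

```python
def decode_read10(cdb_hex):
    """Return (starting LBA, transfer length in blocks) for a READ(10) CDB.

    cdb_hex is the space-separated hex string from the kernel "CDB:" line.
    decode_read10 is a hypothetical helper, not part of any kernel tool.
    """
    cdb = bytes.fromhex(cdb_hex)
    if len(cdb) != 10 or cdb[0] != 0x28:
        raise ValueError("not a 10-byte READ(10) CDB")
    lba = int.from_bytes(cdb[2:6], "big")     # bytes 2-5: logical block address
    blocks = int.from_bytes(cdb[7:9], "big")  # bytes 7-8: transfer length
    return lba, blocks


# CDB taken from the [sde] sample above.
print(decode_read10("28 00 3d 22 60 00 00 02 00 00"))  # (1025662976, 512)
```

For the [sde] sample, the decoded LBA of 1025662976 matches the sector reported on the following blk_update_request line, and the read covered 512 blocks starting there.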

Resolution

Engage storage hardware support.
- An on-disk hardware based media error is being reported from the scsi device and needs to be addressed at the hardware level.
- Typically the device(s) will need to be physically replaced in this case.

Note: Hardware disk status from system utilities or virtual consoles (iLO, iDRAC, etc.) will often list the disk status as "ok". This does not change the fact that one or more areas of the disk can no longer be used. The "ok" status in this instance just means that the disk was able to perform some small subset of commands such as inquiry, TUR (test unit ready), and possibly a small read or two from the front of the disk. The bad media elsewhere on the disk still exists and typically requires replacement of the disk. Contact the hardware vendor for further assistance, especially if the disk is a RAID volume from a local backplane controller such as HPE's Smart Array (hpsa or smartpqi driver), an LSI controller (megaraid_sas driver), or similar.
Before replacing the disk, you should be able to recover and back up most of the files from the disk when media errors first start, unless critical filesystem metadata cannot be read. Typically, media errors occur more often on reads than on writes. If a write fails due to media issues, the disk will often recover automatically by revectoring the logical block addresses in and around the failed write to a spare area of the disk, moving the data to that new location, and then performing the write there. However, if a read fails, the disk is unable to recover the data currently in place, so there is nothing valid to move in a revector operation.
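When engaging the hardware vendor, it can help to report which devices and sectors were affected. The sketch below collects the sector numbers from log lines of the two forms shown in the Issue section. The function name failing_sectors and the regular expression are assumptions made for this illustration. The end_request line is a generic I/O error message, so matches should be correlated with a neighbouring Medium Error sense key line before being treated as media errors.

```python
import re
from collections import defaultdict

# Matches the two "dev <name>, sector <n>" forms shown in the Issue section.
SECTOR_RE = re.compile(
    r"(?:critical medium error|end_request: I/O error), dev (\w+), sector (\d+)"
)


def failing_sectors(lines):
    """Map each device name to a sorted list of reported sector numbers."""
    found = defaultdict(set)
    for line in lines:
        match = SECTOR_RE.search(line)
        if match:
            found[match.group(1)].add(int(match.group(2)))
    return {dev: sorted(sectors) for dev, sectors in found.items()}


# Sample lines copied from the Issue section.
sample = [
    "kernel blk_update_request: critical medium error, dev sde, sector 1025000000",
    "kernel blk_update_request: critical medium error, dev sde, sector 1025662976",
    "kernel: end_request: I/O error, dev sdab, sector 83235271",
]
print(failing_sectors(sample))
```

To run it against a real log, pass an open file object instead of the sample list, for example open("/var/log/messages").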

Root Cause

The target device is returning a media error to the host and the host is logging the error.

If, for example, a SCSI READ fails with a media error due to a storage hardware error, the host might get back the following from storage:

SCSI status: 2h CHECK CONDITION                << I/O failed, see sense for explanation
Sense Key  : 3h MEDIUM ERROR                   << what failure was
ASC/ASCQ   : 11/00 UNRECOVERED READ ERROR      << failure reason/explanation

Since the storage hardware is unable to provide the requested read data to the host, the error is escalated to a critical medium error.

The SCSI error return code decodes to a SCSI status of 02h (Check Condition), plus an indication that a sense buffer is available from storage:

0x08.00.00.02
    02 status byte : SAM_STAT_CHECK_CONDITION - check returned sense data, esp. ASC/ASCQ
    00 msg byte    : <{likely} not valid, see other fields>
    00 host byte   : <{likely} not valid, see other fields>
    08 driver byte : DRIVER_SENSE {scsi sense buffer available from target}

0x08.07.00.02
    02 status byte : SAM_STAT_CHECK_CONDITION - check sense data returned from device, esp. ASC/ASCQ
    00 msg byte    : <{likely} not valid, see other fields>
    07 host byte   : DID_ERROR - internal driver detected error
    08 driver byte : DRIVER_SENSE {scsi sense buffer available from target}

0x08.10.00.02
    02 status byte : SAM_STAT_CHECK_CONDITION - check returned sense data, esp. ASC/ASCQ
    00 msg byte    : <{likely} not valid, see other fields>
    10 host byte   : {RHEL5/6} DID_TARGET_FAILURE - permanent target failure, do not retry other paths {set via sense info review}
    08 driver byte : DRIVER_SENSE {scsi sense buffer available from target}
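The byte split used in the decode above can be sketched as follows. The 32-bit return code is treated as four bytes, from most to least significant: driver byte, host byte, msg byte, status byte. The function name decode_result is chosen for this illustration only.

```python
def decode_result(result):
    """Split a 32-bit SCSI return code into its four component bytes."""
    return {
        "driver_byte": (result >> 24) & 0xFF,  # 0x08 = DRIVER_SENSE
        "host_byte": (result >> 16) & 0xFF,    # 0x07 = DID_ERROR, 0x10 = DID_TARGET_FAILURE
        "msg_byte": (result >> 8) & 0xFF,
        "status_byte": result & 0xFF,          # 0x02 = CHECK CONDITION
    }


for code in (0x08000002, 0x08070002, 0x08100002):
    fields = decode_result(code)
    print(f"0x{code:08x}: " + " ".join(f"{k}=0x{v:02x}" for k, v in fields.items()))
```

All three return codes from this article share driver byte 0x08 and status byte 0x02; only the host byte differs.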

SCSI status byte, per the SCSI specification:


SCSI Status:
          02h        CHECK CONDITION Indicates a contingent allegiance condition has occurred {in other words, 
                                     see the returned sense buffer from the target device}.

DRIVER_SENSE: The sense key within the returned sense buffer in all cases is Medium Error (3h):

kernel: sdab: Current: sense key: Medium Error  
sd 0:0:0:0: [sda] Sense Key : Medium Error [current]
kernel: sde: Current: sense key: Medium Error

From the SCSI specification:

Sense Key:
           3h        MEDIUM ERROR    Indicates that the command terminated with a non-recovered error condition 
                                     that was probably caused by a flaw in the medium or an error in the recorded
                                     data. This sense key may also be returned if the target is unable to  
                                     distinguish between a flaw in the medium and a specific hardware failure (sense 
                                     key=4)

The primary element in the above is the Medium Error sense key returned by storage.

Besides storage providing a scsi status (check condition) and sense key (medium error), the sense buffer returned from storage provides an asc/ascq code pair:

Sense Buffer Layout (Fixed Format types 70h and 71h)

        bit→
↓byte       7        6         5         4         3         2         1         0
       +--------+---------+---------+---------+---------+---------+---------+---------+
  0    |Valid   |                  Response Code<7>                                   |
       +--------+---------+---------+---------+---------+---------+---------+---------+
  1    |                         Segment number<8> (obsolete)                         |
       +--------+---------+---------+---------+---------+---------+---------+---------+
  2    |Filemark|   EOM   |   ILI   | Reserved|           Sense Key<4>                |
       +--------+---------+---------+---------+---------+---------+---------+---------+
  3    |                           Information                                        |
       +--------+---------+---------+---------+---------+---------+---------+---------+
  4    |                           Information                                        |
       +--------+---------+---------+---------+---------+---------+---------+---------+
  5    |                           Information                                        |
       +--------+---------+---------+---------+---------+---------+---------+---------+
  6    |                           Information                                        |
       +--------+---------+---------+---------+---------+---------+---------+---------+
  7    |                    Additional sense length (n-7)                             |
       +--------+---------+---------+---------+---------+---------+---------+---------+
  8    |                   Command-specific information                               |
       +--------+---------+---------+---------+---------+---------+---------+---------+
  9    |                   Command-specific information                               |
       +--------+---------+---------+---------+---------+---------+---------+---------+
 10    |                   Command-specific information                               |
       +--------+---------+---------+---------+---------+---------+---------+---------+
 11    |                   Command-specific information                               |
       +--------+---------+---------+---------+---------+---------+---------+---------+
 12    |                   Additional sense code (asc)                                |
       +--------+---------+---------+---------+---------+---------+---------+---------+
 13    |                   Additional sense code qualifier (ascq)                     |
       +--------+---------+---------+---------+---------+---------+---------+---------+
 14    |                   FRU - Field replaceable unit code                          |
       +--------+---------+---------+---------+---------+---------+---------+---------+
 15    |  SKSV  |          Sense-key specific                                         | 
       +--------+---------+---------+---------+---------+---------+---------+---------+
 16    |                   Sense-key specific                                         |
       +--------+---------+---------+---------+---------+---------+---------+---------+
 17    |                   Sense-key specific                                         |
       +--------+---------+---------+---------+---------+---------+---------+---------+
 18    |                   Additional sense bytes (variable number of bytes)          |
       +--------+---------+---------+---------+---------+---------+---------+---------+
 19    |                   Additional sense bytes (variable number of bytes)          |
       +--------+---------+---------+---------+---------+---------+---------+---------+
       :                                                                              :
       .                                                                              .
       +--------+---------+---------+---------+---------+---------+---------+---------+
 255   |                   Additional sense bytes (variable number of bytes)          |
       +--------+---------+---------+---------+---------+---------+---------+---------+
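Using the byte offsets from the layout above, the sense key and the asc/ascq pair can be pulled out of a fixed-format sense buffer as in the sketch below. The function name parse_fixed_sense is hypothetical, and the sample buffer is hand-built for illustration rather than captured from a device.

```python
def parse_fixed_sense(sense):
    """Extract key fields from fixed-format (70h/71h) SCSI sense data."""
    if len(sense) < 14:
        raise ValueError("need at least 14 bytes to reach ASC/ASCQ")
    response_code = sense[0] & 0x7F            # byte 0, bits 6-0
    if response_code not in (0x70, 0x71):
        raise ValueError("not fixed-format sense data")
    return {
        "valid": bool(sense[0] & 0x80),        # byte 0, bit 7
        "response_code": response_code,
        "sense_key": sense[2] & 0x0F,          # byte 2, bits 3-0
        "information": int.from_bytes(sense[3:7], "big"),  # bytes 3-6
        "asc": sense[12],                      # byte 12
        "ascq": sense[13],                     # byte 13
    }


# Hand-built 18-byte example: Medium Error (3h) with asc/ascq 11/00.
example = bytes([0xF0, 0x00, 0x03, 0x3D, 0x22, 0x60, 0x00, 0x0A,
                 0x00, 0x00, 0x00, 0x00, 0x11, 0x00, 0x00, 0x00, 0x00, 0x00])
print(parse_fixed_sense(example))
```

In this example the sense key is 3h (Medium Error) and the asc/ascq pair is 11/00.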

The asc/ascq code pair just provides additional information about the sense key, in this case Medium Error. The asc/ascq code pair is output by the kernel as a text string within the "Add. Sense:" line.

For example, "Unrecovered read error" is 11/00 and "Record not found" is 14/01. Other common Add. Sense strings include:


03/02  EXCESSIVE WRITE ERRORS
0C/00  WRITE ERROR
0C/01  WRITE ERROR - RECOVERED WITH AUTO REALLOCATION
0C/02  WRITE ERROR - AUTO REALLOCATION FAILED
0C/03  WRITE ERROR - RECOMMEND REASSIGNMENT
11/00  UNRECOVERED READ ERROR
11/04  UNRECOVERED READ ERROR - AUTO REALLOCATE FAILED
11/0B  UNRECOVERED READ ERROR - RECOMMEND REASSIGNMENT
11/0C  UNRECOVERED READ ERROR - RECOMMEND REWRITE THE DATA
14/01  RECORD NOT FOUND
...and other similar codes.

See SCSI ASC/ASCQ Assignments on the T10 SCSI standards web site (www.t10.org) for the full set of asc/ascq pairs.

The specific ASC/ASCQ may help further explain the circumstances surrounding the Medium Error, but doesn't change the main issue of a medium error being encountered and returned by storage to the host.
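As a small illustration, the asc/ascq pairs listed above can be kept in a lookup table and used to turn a raw pair back into its text string. The table below contains only the pairs named in this article and is not the full T10 list; the names MEDIUM_ERROR_ASC and describe_asc are made up for this sketch.

```python
# Partial table: only the asc/ascq pairs listed in this article.
MEDIUM_ERROR_ASC = {
    (0x03, 0x02): "EXCESSIVE WRITE ERRORS",
    (0x0C, 0x00): "WRITE ERROR",
    (0x0C, 0x01): "WRITE ERROR - RECOVERED WITH AUTO REALLOCATION",
    (0x0C, 0x02): "WRITE ERROR - AUTO REALLOCATION FAILED",
    (0x0C, 0x03): "WRITE ERROR - RECOMMEND REASSIGNMENT",
    (0x11, 0x00): "UNRECOVERED READ ERROR",
    (0x11, 0x04): "UNRECOVERED READ ERROR - AUTO REALLOCATE FAILED",
    (0x11, 0x0B): "UNRECOVERED READ ERROR - RECOMMEND REASSIGNMENT",
    (0x11, 0x0C): "UNRECOVERED READ ERROR - RECOMMEND REWRITE THE DATA",
    (0x14, 0x01): "RECORD NOT FOUND",
}


def describe_asc(pair):
    """Translate an 'asc/ascq' string such as '11/0B' into its description."""
    asc_text, ascq_text = pair.split("/")
    key = (int(asc_text, 16), int(ascq_text, 16))
    return MEDIUM_ERROR_ASC.get(key, "not in this partial table")


print(describe_asc("11/0B"))  # UNRECOVERED READ ERROR - RECOMMEND REASSIGNMENT
print(describe_asc("14/01"))  # RECORD NOT FOUND
```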

Diagnostic Steps

It is slightly odd to have DID_ERROR within the return status of 0x08070002. DID_ERROR is set internally by the driver when it detects an anomaly within the status information provided by the target.

For example, the driver receives a SCSI SUCCESS status without the data underrun flag set, yet the residual byte count is non-zero. In this case the driver questions the validity of all of the information, given that the various components of the status don't jibe with one another. The status is essentially saying the command was successful but didn't transfer all the data requested, which is not the definition of success.

It's possible that when storage returns the Medium Error it doesn't properly set up all the other data fields, and the driver detects a mismatch within the data.
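The kind of consistency check described in the example above can be sketched as follows. This is purely illustrative and is not taken from any actual driver; the function name and its parameters are hypothetical, and real drivers perform their own, more detailed checks.

```python
SAM_STAT_GOOD = 0x00  # SCSI status: command completed successfully
DID_OK = 0x00         # host byte: no error detected by the driver
DID_ERROR = 0x07      # host byte: internal driver detected error


def host_byte_for_completion(scsi_status, underrun_flagged, residual_bytes):
    """Hypothetical check: flag a completion whose status fields disagree.

    A GOOD status with no underrun indication should mean every requested
    byte was transferred, so a non-zero residual count contradicts it.
    """
    if scsi_status == SAM_STAT_GOOD and not underrun_flagged and residual_bytes != 0:
        return DID_ERROR
    return DID_OK


# Status says success, yet 4096 bytes were not transferred.
print(hex(host_byte_for_completion(SAM_STAT_GOOD, False, 4096)))  # 0x7
```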

In the case of the hostbyte being returned as invalid:

kernel: Result: hostbyte=invalid driverbyte=DRIVER_SENSE,SUGGEST_OK

This is because RHEL did not decode the 0x10 hostbyte within the set_host_byte() function until RHEL 5.4, which added the following host byte codes:


#define DID_TRANSPORT_DISRUPTED 0x0e /* Transport error disrupted execution and the driver blocked the port to recover the link. Transport class will retry or fail IO */
#define DID_TRANSPORT_FAILFAST  0x0f /* Transport class fastfailed the io */
#define DID_TARGET_FAILURE      0x10 /* Permanent target failure, do not retry on other paths */
#define DID_NEXUS_FAILURE       0x11  /* Permanent nexus failure, retry on other paths might yield different results */

Neither the DID_ERROR, nor invalid hostbyte, nor changes in Add. Sense affect the interpretation of the Medium Error sense key.
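The "hostbyte=invalid" output can be illustrated with a name table that stops short of the newer codes. The sketch below uses the Linux host byte values; the cut-off parameter known_up_to is an assumption made for this illustration, modelling a decode table that ends just before the four codes listed above.

```python
HOST_BYTE_NAMES = {
    0x00: "DID_OK",
    0x01: "DID_NO_CONNECT",
    0x02: "DID_BUS_BUSY",
    0x03: "DID_TIME_OUT",
    0x04: "DID_BAD_TARGET",
    0x05: "DID_ABORT",
    0x06: "DID_PARITY",
    0x07: "DID_ERROR",
    0x08: "DID_RESET",
    0x09: "DID_BAD_INTR",
    0x0A: "DID_PASSTHROUGH",
    0x0B: "DID_SOFT_ERROR",
    0x0C: "DID_IMM_RETRY",
    0x0D: "DID_REQUEUE",
    0x0E: "DID_TRANSPORT_DISRUPTED",
    0x0F: "DID_TRANSPORT_FAILFAST",
    0x10: "DID_TARGET_FAILURE",
    0x11: "DID_NEXUS_FAILURE",
}


def host_byte_name(value, known_up_to=0x11):
    """Return the host byte name, or 'invalid' if the table does not cover it.

    known_up_to is a hypothetical knob modelling how far the decode table
    of a given kernel extends.
    """
    if value <= known_up_to and value in HOST_BYTE_NAMES:
        return HOST_BYTE_NAMES[value]
    return "invalid"


print(host_byte_name(0x10, known_up_to=0x0D))  # invalid
print(host_byte_name(0x10))                    # DID_TARGET_FAILURE
```

The same host byte value therefore prints as "invalid" or as DID_TARGET_FAILURE depending only on how far the decode table extends, which is why it does not change the interpretation of the Medium Error.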

SBR

Storage

Product(s)

Red Hat Enterprise Linux

Components

kernel

Category

Troubleshoot
