Why am I seeing it take 300s or more to fail a device-mapper-multipath path with a storage array using ALUA in RHEL 5?
Environment
- Red Hat Enterprise Linux (RHEL) 5
- device-mapper-multipath
- Storage array that utilizes ALUA access mode
Issue
- Servers connected to a storage array are seeing aborted commands and 300s SCSI command timeouts in /var/log/messages.
- Storage path failures take upwards of 300 seconds before device-mapper-multipath starts using the next path.
- Errors similar to:

Oct 24 22:53:57 example kernel: qla2xxx 0000:0f:00.0: scsi(4:1:32): Abort command issued -- 1 916a951 2002.
Oct 24 22:53:57 example kernel: sd 4:0:1:32: timing out command, waited 300s
Oct 24 22:53:57 example multipathd: /sbin/mpath_prio_alua exitted with 5
Oct 24 22:53:57 example multipathd: error calling out /sbin/mpath_prio_alua /dev/sdfw
Resolution
It is recommended to update to device-mapper-multipath-0.4.7-48.el5 or later, which includes a change that reduces the I/O timeout of the mpath_prio_alua callout from 300 seconds to 60 seconds.
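To confirm whether the installed package already carries the fix, the installed version can be compared against the minimum release. The sketch below is illustrative: the `version_ge` helper approximates RPM version ordering with `sort -V` rather than full rpmvercmp semantics, which is sufficient for comparing releases within the 0.4.7 series.

```shell
# Hedged sketch: check whether the installed device-mapper-multipath
# package is at least 0.4.7-48.el5 (the release carrying the fix).
# version_ge is an approximation using sort -V, not full rpmvercmp.
version_ge() {
    [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | tail -n1)" = "$1" ]
}

min="0.4.7-48.el5"
if command -v rpm >/dev/null 2>&1; then
    cur=$(rpm -q --qf '%{VERSION}-%{RELEASE}' device-mapper-multipath 2>/dev/null)
    if [ -n "$cur" ] && version_ge "$cur" "$min"; then
        echo "device-mapper-multipath $cur: fix present"
    else
        echo "device-mapper-multipath ${cur:-not installed}: update to $min or later"
    fi
fi
```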
The underlying cause for this issue is that a storage target is unresponsive, and the 300 second timeout is a side-effect. To properly correct this issue and avoid storage disruptions, you should check your storage array, switches, and hosts on the fabric for any potential issues. Some problems that have been known to contribute to these unresponsive targets are:
- CRC errors on storage ports
- ISL-Buffer handling
- Bottlenecks and slow draining devices on fabric
- Different bandwidths on trunks
- Out-of-date firmware on switches
It is recommended that you contact your storage vendor for assistance with diagnosing these unresponsive targets.
Root Cause
The issue here is that I/O commands are timing out, which causes the error handler to kick in (this is where the command aborts come from), and eventually the device exceeds its timeout and fails the path. The time it takes to completely time out a device varies based on several factors, one of which is the longest timeout set on any command at the time the first command times out. See this article for more details on device timeouts.
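The per-command SCSI timeout that feeds into this calculation is visible in sysfs at /sys/block/<device>/device/timeout. The following sketch lists that value for each sd device; the sysfs root is parameterized only so the function can be exercised against a test tree — on a live system it is called with `/`.

```shell
# Hedged sketch: list the per-command SCSI timeout (in seconds) for each
# sd* device from sysfs. The root is parameterized so the function can
# be tested against a fake tree; on a real host, pass "/".
scan_timeouts() {
    root=$1
    for f in "$root"/sys/block/sd*/device/timeout; do
        [ -e "$f" ] || continue            # glob matched nothing
        dev=${f#"$root"/sys/block/}        # strip prefix ...
        dev=${dev%%/*}                     # ... leaving e.g. "sda"
        printf '%s %s\n' "$dev" "$(cat "$f")"
    done
}

scan_timeouts /
```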
The problem is that device-mapper-multipath's mpath_prio_alua priority callout for ALUA-based devices sets a 300-second I/O timeout. If a device becomes unresponsive while multipathd has a priority callout outstanding, path failover can take far longer than applications expect, which is especially harmful for cluster products that depend on timely access to a quorum / voting disk. This issue is being tracked in Red Hat Bugzilla #737072.
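For reference, on RHEL 5 the callout is configured per device type in /etc/multipath.conf. The stanza below is illustrative only — the vendor and product strings are placeholders, and your array vendor's documentation should supply the exact values:

```
devices {
        device {
                vendor                "EXAMPLE"
                product               "ARRAY"
                prio_callout          "/sbin/mpath_prio_alua /dev/%n"
                path_grouping_policy  group_by_prio
                failback              immediate
        }
}
```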
Even with the mpath_prio_alua I/O timeout lowered, lengthy delays may still occur while waiting for the unresponsive device to time out. As such, it is recommended that you investigate the cause of the unresponsive devices on the fabric.
This solution is part of Red Hat’s fast-track publication program, providing a huge library of solutions that Red Hat engineers have created while supporting our customers. To give you the knowledge you need the instant it becomes available, these articles may be presented in a raw and unedited form.