How can I improve the failover time of a faulty path when using device-mapper-multipath over iSCSI in Red Hat Enterprise Linux?

When accessing SAN storage over iSCSI, it is often desirable for failing connections to be retried multiple times before passing the I/O error back to the application layer. However, on systems using device-mapper-multipath it is usually preferable to have the SCSI layer quickly fail an errant path and pass the error back up to device-mapper. This way I/O can be routed to the alternate paths rather than being queued for long periods of time. Therefore it is recommended that several settings for the iscsi-initiator-utils package be adjusted in /etc/iscsi/iscsid.conf to decrease the amount of time required for failover.

The total time it takes to complete a failover is defined as:

Failover Time = noop_out_timeout + noop_out_interval + replacement_timeout

To quickly detect problems in the network, the iSCSI layer sends iSCSI pings (NOP-Out requests) to the target. The noop_out_interval setting controls how often a ping is sent, and noop_out_timeout controls how long to wait for the response. If a NOP-Out times out, the SCSI layer fails the command up to the multipath layer instead of retrying it, and multipath retries the I/O on another path. The following settings are recommended:

node.conn[0].timeo.noop_out_interval = 5
node.conn[0].timeo.noop_out_timeout = 10

The replacement_timeout setting controls how long to wait for the session to be re-established before failing pending SCSI commands up to multipath:

node.session.timeo.replacement_timeout = 15

If these settings result in frequent I/O errors, the values may be too aggressive and may need to be increased to suit the specific network conditions and workload. Alternatively, the errors may point to a network problem that should be investigated.
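As a worked example of the formula above, the recommended values give 10 + 5 + 15 = 30 seconds. The Python sketch below does the same arithmetic from iscsid.conf-style text. The function name failover_time and the SAMPLE string are illustrative assumptions and are not part of iscsi-initiator-utils.

```python
# Settings from /etc/iscsi/iscsid.conf that enter the failover-time formula.
KEYS = (
    "node.conn[0].timeo.noop_out_timeout",
    "node.conn[0].timeo.noop_out_interval",
    "node.session.timeo.replacement_timeout",
)


def failover_time(conf_text):
    """Return noop_out_timeout + noop_out_interval + replacement_timeout (seconds)."""
    values = {}
    for line in conf_text.splitlines():
        line = line.strip()
        # Skip blank lines and comments.
        if not line or line.startswith("#"):
            continue
        key, sep, value = line.partition("=")
        key = key.strip()
        if sep and key in KEYS:
            values[key] = int(value.strip())
    missing = [k for k in KEYS if k not in values]
    if missing:
        raise KeyError("missing settings: " + ", ".join(missing))
    return sum(values[k] for k in KEYS)


# Hypothetical sample mirroring the recommended values in this article.
SAMPLE = """\
node.conn[0].timeo.noop_out_interval = 5
node.conn[0].timeo.noop_out_timeout = 10
node.session.timeo.replacement_timeout = 15
"""

print(failover_time(SAMPLE))  # 30
```

To check a real system, the contents of /etc/iscsi/iscsid.conf could be read and passed to the function instead of SAMPLE.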

By default, device-mapper-multipath does not queue I/O commands when no paths remain. This means that if a short network failure affects all paths, I/O fails immediately and access to the filesystem or data may be lost, instead of the I/O waiting for the connection to return. To queue I/O until the connection returns, add this statement to the device section in /etc/multipath.conf:

features                "1 queue_if_no_path"
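For context, a minimal sketch of where that statement might sit in /etc/multipath.conf is shown below. The vendor and product strings are placeholders, not values from this article, and must be replaced with the identifiers the storage array actually reports.

```
devices {
    device {
        # Placeholder identifiers: replace with the strings your array reports.
        vendor   "EXAMPLE_VENDOR"
        product  "EXAMPLE_PRODUCT"
        features "1 queue_if_no_path"
    }
}
```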

Note: The dev_loss_tmo and fast_io_fail_tmo multipath attributes are defined in the Fibre Channel context. For iSCSI devices, multipath treats fast_io_fail_tmo as equivalent to the iSCSI recovery_tmo setting. dev_loss_tmo is ignored for iSCSI because it has no clear equivalent in iSCSI environments.
Note: iscsid sets the iSCSI session recovery_tmo to the value of replacement_timeout (as configured in the session parameters) during iSCSI login. If multipath has a value set for fast_io_fail_tmo, that value overrides the recovery_tmo configured by iscsid.
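Because either iscsid or multipath may have set the value last, it can help to check which recovery_tmo is actually in effect for each session. The sketch below assumes the kernel exposes it at /sys/class/iscsi_session/session*/recovery_tmo. The function name session_recovery_tmo and its sysfs_root parameter are illustrative assumptions.

```python
import glob
import os


def session_recovery_tmo(sysfs_root="/sys/class/iscsi_session"):
    """Map each iSCSI session name to its current recovery_tmo in seconds."""
    result = {}
    pattern = os.path.join(sysfs_root, "session*", "recovery_tmo")
    for path in sorted(glob.glob(pattern)):
        # The parent directory name (for example "session1") identifies the session.
        session = os.path.basename(os.path.dirname(path))
        with open(path) as f:
            result[session] = int(f.read().strip())
    return result


if __name__ == "__main__":
    # Prints nothing on a host with no active iSCSI sessions.
    for session, tmo in session_recovery_tmo().items():
        print(f"{session}: recovery_tmo={tmo}")
```

If the reported value differs from replacement_timeout in iscsid.conf, a fast_io_fail_tmo setting in multipath.conf is one likely cause.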
