How can I prevent my filesystems on multipath devices from entering read-only mode after storage errors?

Solution Unverified - Updated

Environment

  • Red Hat Enterprise Linux
  • device-mapper-multipath
  • Fibre Channel SAN storage

Issue

  • I am experiencing storage-side errors that cause path failures, which in turn make my filesystems on multipath devices read-only.
  • As a result, once the paths have recovered, each filesystem must be unmounted and mounted again to restore its read-write status.
  • While fixing these errors, I need to prevent the filesystems from entering read-only mode.

Resolution

There are three options that can be configured or modified to prevent a filesystem from going into read-only mode. They work by lengthening I/O timeouts, which helps keep filesystems from becoming read-only. They are described in detail in Red Hat's documentation on the multipath configuration file parameters:

  1. features "1 queue_if_no_path"
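    If features "1 queue_if_no_path" is specified in the /etc/multipath.conf file, I/O to a device whose paths have all failed is queued instead of being returned to the filesystem as an error. Any process that issues I/O will therefore hang until one or more paths are restored.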

    This option must be configured per device type, in the relevant section:

     devices {
             device {
                     [...]
                     features                "1 queue_if_no_path"
                     [...]
             }
     }
    
  2. fast_io_fail_tmo
    The number of seconds the SCSI layer will wait after a problem has been detected on an FC remote port before failing I/O to devices on that remote port. This value should be smaller than the value of dev_loss_tmo. Setting this to off will disable the timeout. The default value is determined by the OS.

  3. dev_loss_tmo
    The number of seconds the SCSI layer will wait after a problem has been detected on an FC remote port before removing it from the system. Setting this to infinity sets it to 2147483647 seconds, or about 68 years. The default value is determined by the OS.
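Before changing fast_io_fail_tmo or dev_loss_tmo, it can help to check which values the OS is currently applying. The FC transport class exposes both as files under /sys/class/fc_remote_ports/rport-*/. The following is a minimal sketch assuming that sysfs layout; the function name show_fc_timeouts and its optional directory argument are hypothetical, added only so the snippet can be tried against a test directory:

```shell
# Print fast_io_fail_tmo and dev_loss_tmo for every FC remote port.
# Hypothetical helper: the optional first argument overrides the sysfs
# directory, which defaults to /sys/class/fc_remote_ports.
show_fc_timeouts() {
    base="${1:-/sys/class/fc_remote_ports}"
    for rport in "$base"/rport-*; do
        # With no remote ports the glob stays unexpanded, so skip it.
        [ -d "$rport" ] || continue
        printf '%s fast_io_fail_tmo=%s dev_loss_tmo=%s\n' \
            "${rport##*/}" \
            "$(cat "$rport/fast_io_fail_tmo")" \
            "$(cat "$rport/dev_loss_tmo")"
    done
}
```

Source the snippet and run show_fc_timeouts with no argument to read the live values, both before and after restarting multipathd, to confirm that the new settings were applied.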

Always restart the multipathd service after making these changes. If the system boots from SAN, also rebuild the initramfs so that the updated multipath.conf settings are available at boot time.

    [root@host ~]# service multipathd restart
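For a system that boots from SAN, the initramfs rebuild can be sketched as follows. This assumes RHEL 6 or later, where the initramfs is generated by dracut (RHEL 5 uses mkinitrd instead). The function name rebuild_initramfs and the DRY_RUN variable are hypothetical conveniences, not dracut options:

```shell
# Rebuild the initramfs for a kernel (default: the running one) so that it
# contains the current /etc/multipath.conf. Assumes dracut is available.
# Hypothetical: DRY_RUN=1 only prints the command instead of running it.
rebuild_initramfs() {
    kver="${1:-$(uname -r)}"
    img="/boot/initramfs-${kver}.img"
    if [ -n "${DRY_RUN:-}" ]; then
        echo "dracut -f $img $kver"
    else
        # -f overwrites the existing image for that kernel version.
        dracut -f "$img" "$kver"
    fi
}
```

Run DRY_RUN=1 rebuild_initramfs to preview the command for the running kernel, then run rebuild_initramfs as root.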

Please be very careful when modifying these parameters. With features "1 queue_if_no_path", keep in mind that if the path failures persist for more than a couple of minutes, the blocked processes can trigger hung task warnings (or panics, if the system is configured to panic on hung tasks) and increase your system load. As for fast_io_fail_tmo and dev_loss_tmo, their values are set by the OS and should not be changed lightly; modifying them while storage errors are occurring can lead to filesystem corruption.

Above all, prioritize fixing your storage issues and only resort to these measures if there is no other alternative. If in doubt, do not hesitate to open a support case.

Also, see the following for additional details on shortening the time it takes to fail over to surviving paths in a Fibre Channel environment:

Root Cause

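When there is a serious path failure (e.g. all paths to a device are down) and I/O is not being queued, device-mapper-multipath returns I/O errors to the filesystem, and the filesystem (for example ext3 or ext4) switches itself to read-only mode as a safety measure. By design, the filesystem cannot then be remounted read-write with: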

    [root@host ~]# mount -o rw,remount /filesystem

This is because the filesystem may contain corruption, and an fsck is generally needed after such a failure.
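For an ext3 or ext4 filesystem, that recovery typically means unmounting it once the paths are back, checking it with fsck, and mounting it again only if fsck reports no remaining errors. A minimal sketch of that sequence; the function names, the DRY_RUN switch, and the device and mount point names are hypothetical examples:

```shell
# Hypothetical wrapper: with DRY_RUN=1, print each command instead of running it.
run() {
    if [ -n "${DRY_RUN:-}" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Recover an ext3/ext4 filesystem that went read-only after a path failure.
# Usage: recover_readonly_fs <device> <mount point>
recover_readonly_fs() {
    dev="$1"
    mnt="$2"
    run umount "$mnt" || return 1
    # fsck exit status: 0 = no errors, 1 = errors corrected; higher values
    # mean a reboot is required or errors remain, so do not remount.
    run fsck -y "$dev"
    rc=$?
    if [ "$rc" -gt 1 ]; then
        echo "fsck returned $rc; not remounting $mnt" >&2
        return "$rc"
    fi
    run mount "$dev" "$mnt"
}
```

Preview with DRY_RUN=1 first, and make sure no process is still using the mount point, otherwise umount will fail and the function stops before running fsck.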


This solution is part of Red Hat’s fast-track publication program, providing a huge library of solutions that Red Hat engineers have created while supporting our customers. To give you the knowledge you need the instant it becomes available, these articles may be presented in a raw and unedited form.