GFS or GFS2 file system withdraws after temporary failure of all paths in multipath map in RHEL
Environment
- Red Hat Enterprise Linux (RHEL) 5 or 6 with the Resilient Storage Add-On
- GFS or GFS2 file systems
- device-mapper-multipath
Issue
- How can I avoid a GFS/GFS2 withdrawal when all the paths in my multipath map go down for a short time?
- A failure on the storage array led to all paths in the multipath device failing temporarily, and the GFS or GFS2 file system residing on it withdrew:
Nov 20 01:59:12 node1 kernel: end_request: I/O error, dev dm-3, sector 0
Nov 20 01:59:12 node1 kernel: end_request: I/O error, dev dm-3, sector 4295039024
Nov 20 01:59:12 node1 kernel: __ratelimit: 12614 callbacks suppressed
Nov 20 01:59:12 node1 kernel: Buffer I/O error on device dm-5, logical block 8710
Nov 20 01:59:12 node1 kernel: lost page write due to I/O error on dm-5
Nov 20 01:59:12 node1 kernel: end_request: I/O error, dev dm-3, sector 9232768336
Nov 20 01:59:12 node1 kernel: GFS2: fsid=mycluster:gfs2.0: fatal: I/O error
Nov 20 01:59:12 node1 kernel: GFS2: fsid=mycluster:gfs2.0: block = 8710
Nov 20 01:59:12 node1 kernel: GFS2: fsid=mycluster:gfs2.0: function = log_write_header, file = fs/gfs2/log.c, line = 616
Nov 20 01:59:12 node1 kernel: GFS2: fsid=mycluster:gfs2.0: about to withdraw this file system
Nov 20 01:59:12 node1 kernel: end_request: I/O error, dev dm-3, sector 9232768336
Resolution
Configure the affected multipath maps with a numeric no_path_retry value greater than 0, so that I/O is queued for a limited time instead of failing immediately when all paths are lost. The DM Multipath guides for RHEL 5 and RHEL 6 contain instructions for configuring device-specific attributes on multipath maps.
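As a rough sizing aid, the queueing window is approximately no_path_retry multiplied by polling_interval (the interval at which multipathd checks paths). The Python sketch below picks the smallest no_path_retry that covers an expected outage and renders a per-map multipaths stanza for multipath.conf. The 5-second polling_interval default, the outage length, and the WWID and alias in the example are assumptions. Check polling_interval in your own configuration and use the real WWID of your map.

```python
import math


def min_no_path_retry(outage_seconds: float, polling_interval: int = 5) -> int:
    """Smallest no_path_retry that keeps I/O queued for at least outage_seconds.

    Assumes the queueing window is roughly no_path_retry * polling_interval
    seconds; polling_interval=5 is an assumed default, so pass your own value.
    """
    if outage_seconds <= 0 or polling_interval <= 0:
        raise ValueError("outage_seconds and polling_interval must be positive")
    return math.ceil(outage_seconds / polling_interval)


def multipaths_stanza(wwid: str, alias: str, no_path_retry: int) -> str:
    """Render a multipath.conf 'multipaths' section for a single map.

    The wwid and alias values are placeholders supplied by the caller.
    """
    return (
        "multipaths {\n"
        "    multipath {\n"
        f"        wwid {wwid}\n"
        f"        alias {alias}\n"
        f"        no_path_retry {no_path_retry}\n"
        "    }\n"
        "}\n"
    )


if __name__ == "__main__":
    # Hypothetical example: ride through a 60-second loss of all paths.
    retries = min_no_path_retry(60)  # ceil(60 / 5) == 12
    print(multipaths_stanza("YOUR_WWID_HERE", "mpatha", retries))
```

With the assumed 5-second polling_interval, a 60-second outage needs no_path_retry 12. Because path checks do not line up exactly with the start of an outage, leaving some margin above the computed value is reasonable.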
Root Cause
When all paths in a multipath map fail, the resulting behavior is determined by the no_path_retry attribute and the queue_if_no_path feature. If neither is set, or if no_path_retry is set to fail, then as soon as all paths are marked as failed, any outstanding I/O to that map is returned with an error to the higher layers (such as the file system).
If GFS or GFS2 receives an I/O error from the underlying device (in this case, the multipath map), it may withdraw the file system to prevent data corruption. When all paths fail only briefly, a no_path_retry value greater than 0 causes multipath to queue the I/O from GFS/GFS2 and retry the paths that many times (roughly no_path_retry multiplied by the path checker's polling_interval) before failing the I/O back to the file system. If a path recovers within that window, the queued I/O completes and no withdrawal is needed.
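The decision described above can be modeled in a few lines of Python. This is a simplified sketch, not multipath's actual implementation: it treats an unset no_path_retry the same as fail, assumes the queueing window is no_path_retry multiplied by polling_interval (5 seconds assumed), and ignores path checker timing, so outages close to the window boundary may behave differently in practice. The function name io_outcome is hypothetical.

```python
from typing import Union

# no_path_retry is either a positive integer or one of the keywords below.
NoPathRetry = Union[int, str]


def io_outcome(no_path_retry: NoPathRetry, outage_seconds: float,
               polling_interval: int = 5) -> str:
    """Simplified model of what GFS/GFS2 sees when every path is down.

    Returns "completed" if the I/O is held until a path returns, or "error"
    if multipath fails it back to the file system (which may then withdraw).
    An unset no_path_retry (with no queue_if_no_path) behaves like "fail".
    """
    if outage_seconds <= 0:
        return "completed"  # no all-paths-down period, nothing to decide
    if no_path_retry == "fail":
        return "error"  # no queueing: I/O fails as soon as all paths are down
    if no_path_retry == "queue":
        return "completed"  # queue indefinitely until a path is restored
    if isinstance(no_path_retry, int) and no_path_retry > 0:
        # Queueing is disabled after roughly no_path_retry checker intervals.
        window = no_path_retry * polling_interval
        return "completed" if outage_seconds <= window else "error"
    raise ValueError(f"unsupported no_path_retry value: {no_path_retry!r}")


for setting in ("fail", 12, "queue"):
    print(setting, io_outcome(setting, outage_seconds=30))
# fail error
# 12 completed
# queue completed
```

The "queue" case never returns an error, but I/O then blocks for however long the outage lasts; a numeric value bounds that wait.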
Diagnostic Steps
- Determine whether the withdrawal is preceded by multipath path failures and/or I/O errors in /var/log/messages. If so, consider implementing the above resolution to avoid future withdrawals.
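One way to check for this pattern is to scan the log for a withdrawal and count the I/O error or path failure messages that come before it. The Python sketch below does this. Its regular expressions are assumptions based on the sample log above and typical dm-multipath messages ("Failing path", "remaining active paths: 0"). GFS (version 1) may word its withdrawal message differently, so adjust the patterns to match what your systems actually log. The script name in the usage comment is hypothetical.

```python
import re
import sys
from typing import Iterable, List, Tuple

# Assumed patterns: I/O errors as in the sample log, plus typical dm-multipath
# path-failure messages. Adjust these to what your systems actually log.
PRECURSOR = re.compile(r"I/O error|Failing path|remaining active paths: 0")
# Taken from the GFS2 sample; GFS (v1) may word its withdrawal differently.
WITHDRAW = re.compile(r"GFS2: fsid=(\S+): about to withdraw this file system")


def find_withdrawals(lines: Iterable[str]) -> List[Tuple[str, int]]:
    """Return (fsid, precursor_count) for each withdrawal found in the log.

    precursor_count is the number of I/O error or path failure lines seen
    since the previous withdrawal (or since the start of the input).
    """
    results: List[Tuple[str, int]] = []
    precursors = 0
    for line in lines:
        match = WITHDRAW.search(line)
        if match:
            results.append((match.group(1), precursors))
            precursors = 0
        elif PRECURSOR.search(line):
            precursors += 1
    return results


if __name__ == "__main__" and len(sys.argv) > 1:
    # Hypothetical usage: python find_withdrawals.py /var/log/messages
    with open(sys.argv[1], errors="replace") as log:
        for fsid, count in find_withdrawals(log):
            print(f"{fsid}: withdrawal preceded by {count} I/O error/path failure lines")
```

Run against the sample log in the Issue section, this reports one withdrawal of mycluster:gfs2.0 preceded by six matching lines. A withdrawal reported with a count of 0 was not preceded by the kind of I/O errors this solution addresses.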
This solution is part of Red Hat’s fast-track publication program, providing a huge library of solutions that Red Hat engineers have created while supporting our customers. To give you the knowledge you need the instant it becomes available, these articles may be presented in a raw and unedited form.