fence_scsi_check.pl watchdog script does a soft reboot instead of a hard reboot and hangs during shutdown in a RHEL 6 or 7 Resilient Storage cluster with device-mapper-multipath
Environment
- Red Hat Enterprise Linux (RHEL) 6 or 7 with the High Availability Add On
- Using SCSI persistent reservation fencing (fence_scsi)
- Using the fence_scsi_check.pl watchdog script for fence_scsi to reboot a node when fenced
  - RHEL 7:
    - Using a fence-agents-scsi release prior to 4.0.11-27.el7_2.5, OR
    - Using fence-agents-scsi-4.0.11-27.el7_2.5 or later AND /etc/watchdog.d/fence_scsi_check is in place (as opposed to /etc/watchdog.d/fence_scsi_check_hardreboot)
  - RHEL 6:
    - Using a fence-agents release prior to 3.1.5-48.el6, OR
    - Using fence-agents-3.1.5-48.el6 or later AND /usr/share/cluster/fence_scsi_check.pl is linked or copied to /etc/watchdog.d (as opposed to /usr/share/cluster/fence_scsi_check_hardreboot.pl being linked or copied)
- device-mapper-multipath
  - The settings for the device in question enable queueing (even if only temporarily) when all paths have failed
    - Can be enabled via no_path_retry set to "queue" or a value greater than 0 in /etc/multipath.conf, or in the built-in device settings in multipathd (see /usr/share/doc/device-mapper-multipath-$vers/multipath.conf.defaults)
    - Can be enabled via features "1 queue_if_no_path" in /etc/multipath.conf or the built-in device settings in multipathd, if no_path_retry is not set
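Whether queueing is currently active for a map can be checked from the kernel's device-mapper table: the features field of the multipath target contains queue_if_no_path when I/O will be queued after all paths fail. A minimal sketch of that check follows; the sample table line is an illustrative example, and real output from `dmsetup table <map>` varies by device and path count:

```shell
#!/bin/sh
# Sketch: report whether a dm-multipath table line enables queueing.
# The sample line below is illustrative; obtain a real one with:
#   dmsetup table <mapname>
queueing_enabled() {
    # The features field of a multipath target may contain queue_if_no_path
    case "$1" in
        *queue_if_no_path*) echo yes ;;
        *)                  echo no ;;
    esac
}

# Example line as `dmsetup table mpatha` might print it (hypothetical map):
line='0 209715200 multipath 1 queue_if_no_path 0 1 1 round-robin 0 2 1 8:16 1 8:32 1'
queueing_enabled "$line"    # prints "yes" when the feature is present
```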
Issue
- After manually fencing a node that was actively running a resource group, the SCSI watchdog begins to initiate a reboot but fails to completely reboot the machine.
- When watchdog reboots a node, it gets stuck shutting down. I see backtraces showing it waiting on device-mapper or the file system.
Resolution
RHEL 7
Update to fence-agents-scsi-4.0.11-27.el7_2.5 or later, and instead of linking /usr/share/cluster/fence_scsi_check to /etc/watchdog.d/fence_scsi_check, create the link at /etc/watchdog.d/fence_scsi_check_hardreboot:
# rm /etc/watchdog.d/fence_scsi_check
# ln -s /usr/share/cluster/fence_scsi_check /etc/watchdog.d/fence_scsi_check_hardreboot
RHEL 6
- Update to fence-agents-3.1.5-48.el6 or later, and switch from using /usr/share/cluster/fence_scsi_check.pl to /usr/share/cluster/fence_scsi_check_hardreboot.pl. This requires removing /etc/watchdog.d/fence_scsi_check.pl (which is usually a symlink back to the former script) and creating a new symlink:
# rm /etc/watchdog.d/fence_scsi_check.pl
# ln -s /usr/share/cluster/fence_scsi_check_hardreboot.pl /etc/watchdog.d/
All Releases
- Workaround: Configure the device to fail I/O immediately on the loss of all paths. Example snippet from /etc/multipath.conf (the exact configuration can vary per environment):
devices {
device {
vendor "EXAMPLE"
no_path_retry fail
features "0"
}
}
Alternatively, no_path_retry can be kept at a value greater than 0, as long as that value is small.
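As a rough illustration of why a small numeric no_path_retry bounds the hang: multipathd checks paths every polling_interval seconds, so after the last path fails, I/O is queued for approximately no_path_retry × polling_interval seconds before outstanding I/O is failed. The values below are examples, not recommendations; check /etc/multipath.conf for the settings actually in effect:

```shell
#!/bin/sh
# Sketch: approximate upper bound on how long I/O stays queued when
# no_path_retry is a number. Both values here are examples.
no_path_retry=5        # example: retry path checks 5 times before failing I/O
polling_interval=5     # multipathd's default path-check interval, in seconds
echo "I/O queued for up to $((no_path_retry * polling_interval))s after all paths fail"
```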
Root Cause
A resolution to this problem in RHEL 6 was released by Red Hat in an updated fence-agents package via Bugzilla #1050022, and in a RHEL 7 Update 2 asynchronous erratum via Bugzilla #1292071. Red Hat is further pursuing a release in a minor release of RHEL 7 via Bugzilla #1265426.
When a node is fenced by fence_scsi, subsequent write attempts to those devices controlled by the agent should return a SCSI reservation conflict error, propagating on all paths and resulting in that multipath map having no more remaining active paths. When no paths are remaining and no_path_retry is set to "queue" or a value greater than 0, then that I/O will continue to be queued until the retries are exhausted.
Meanwhile, if the watchdog daemon is configured to reboot the host when its reservation is revoked, such as with fence_scsi_check.pl, then watchdog may start that reboot before multipathd has flipped the map to stop queueing. As part of watchdog's shutdown process, it kills any running processes, including multipathd. With multipathd dead, the map never stops queueing blocked I/O, the shutdown continues to wait for pending I/O to flush to disk, and thus the shutdown process may never complete.
The new fence-agents package in RHEL 6 mentioned above includes a separate fence_scsi_check_hardreboot.pl script, which triggers a reboot that bypasses watchdog's normal graceful shutdown routine and instead simply hard-reboots the system. In the new package for RHEL 7, the /usr/share/cluster/fence_scsi_check script (which is simply a copy of /usr/sbin/fence_scsi) contains code that causes the system to hard-reboot when fenced if the script executed by watchdog is named fence_scsi_check_hardreboot. In other words, the same script can trigger either the standard watchdog shutdown facilities or a hard reboot, depending on whether it is installed as /etc/watchdog.d/fence_scsi_check or /etc/watchdog.d/fence_scsi_check_hardreboot.
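The name-based dispatch in the RHEL 7 package can be sketched as follows. This is an illustrative sketch only, not the shipped fence_scsi_check code: the actual script is a copy of fence_scsi, and the exact hard-reboot mechanism (the comment below assumes SysRq) is an assumption here.

```shell
#!/bin/sh
# Illustrative sketch only -- not the actual fence_scsi_check code.
# One script, two behaviors, selected by the name it is installed under
# in /etc/watchdog.d.
reboot_mode() {
    case "$(basename "$1")" in
        *_hardreboot) echo hard ;;  # e.g. write 'b' to /proc/sysrq-trigger (assumed)
        *)            echo soft ;;  # exit non-zero so watchdog reboots gracefully
    esac
}

reboot_mode /etc/watchdog.d/fence_scsi_check_hardreboot   # prints "hard"
reboot_mode /etc/watchdog.d/fence_scsi_check              # prints "soft"
```

This is why the resolution above is purely a matter of which name the symlink is created under; no separate script is needed on RHEL 7.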
Diagnostic Steps
- Configure kdump, reproduce the hang on shutdown when triggered by watchdog, then force a core dump with SysRq-C to capture a vmcore. Review the core and note that multipathd is dead, and see in the logs that processes are blocked waiting on I/O from a device that never completed its retries while queueing.
This solution is part of Red Hat’s fast-track publication program, providing a huge library of solutions that Red Hat engineers have created while supporting our customers. To give you the knowledge you need the instant it becomes available, these articles may be presented in a raw and unedited form.