fence_scsi_check.pl watchdog script does a soft reboot instead of hard and hangs during shutdown in a RHEL 6 or 7 Resilient Storage cluster with GFS2
Environment
- Red Hat Enterprise Linux (RHEL) 6 or 7 with the Resilient Storage Add On
- GFS2
- Using SCSI Persistent Reservation Fencing (fence_scsi)
- Using the fence_scsi_check.pl watchdog script for fence_scsi to reboot a node when fenced
  - RHEL 7:
    - Using a fence-agents-scsi release prior to 4.0.11-27.el7_2.5, OR
    - Using fence-agents-scsi-4.0.11-27.el7_2.5 or later AND /etc/watchdog.d/fence_scsi_check is in place (as opposed to /etc/watchdog.d/fence_scsi_check_hardreboot)
  - RHEL 6:
    - Using a fence-agents release prior to 3.1.5-48.el6, OR
    - Using fence-agents-3.1.5-48.el6 or later AND /usr/share/cluster/fence_scsi_check.pl is linked or copied to /etc/watchdog.d (as opposed to /usr/share/cluster/fence_scsi_check_hardreboot.pl being linked or copied)
Issue
- The SCSI fencing watchdog script does not hard reset a node
- The reboot that watchdog performs is "soft", so the node can hang during shutdown
- When a node is fenced by fence_scsi, it seems to get stuck on the way down and never reboots. There are hung task warnings on the console showing processes blocked waiting on GFS2.
Resolution
RHEL 7
Update to fence-agents-scsi-4.0.11-27.el7_2.5 or later. Then, instead of linking /usr/share/cluster/fence_scsi_check to /etc/watchdog.d/fence_scsi_check, link it to /etc/watchdog.d/fence_scsi_check_hardreboot:
# rm /etc/watchdog.d/fence_scsi_check
# ln -s /usr/share/cluster/fence_scsi_check /etc/watchdog.d/fence_scsi_check_hardreboot
RHEL 6
Update to fence-agents-3.1.5-48.el6 or later, and switch from using /usr/share/cluster/fence_scsi_check.pl to /usr/share/cluster/fence_scsi_check_hardreboot.pl. This requires removing /etc/watchdog.d/fence_scsi_check.pl (which is usually a symlink to the former script) and creating a new symlink:
# rm /etc/watchdog.d/fence_scsi_check.pl
# ln -s /usr/share/cluster/fence_scsi_check_hardreboot.pl /etc/watchdog.d/
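After making the change on either release, it can help to confirm which variant the watchdog directory actually contains. The following is a minimal sketch, not part of the fence-agents packages: the function name watchdog_reboot_mode is hypothetical, and it only looks for the file names mentioned in this article. It reports "soft" whenever the original script is still present, even alongside the hard-reboot variant, on the assumption that a leftover soft-reboot script is the configuration to avoid.

```shell
# Hypothetical helper: report which fence_scsi watchdog script variant
# is present in a watchdog test directory (normally /etc/watchdog.d).
# Prints "soft", "hard", or "none".
watchdog_reboot_mode() {
    # -L catches dangling symlinks that -e would miss.
    if [ -e "$1/fence_scsi_check" ] || [ -L "$1/fence_scsi_check" ] ||
       [ -e "$1/fence_scsi_check.pl" ] || [ -L "$1/fence_scsi_check.pl" ]; then
        echo soft
    elif [ -e "$1/fence_scsi_check_hardreboot" ] || [ -L "$1/fence_scsi_check_hardreboot" ] ||
         [ -e "$1/fence_scsi_check_hardreboot.pl" ] || [ -L "$1/fence_scsi_check_hardreboot.pl" ]; then
        echo hard
    else
        echo none
    fi
}
```

Running `watchdog_reboot_mode /etc/watchdog.d` on each cluster node should print "hard" once the resolution above has been applied.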
All Releases
Workaround: Do all of the following:
- Remove /etc/watchdog.d/fence_scsi_check.pl, or disable the watchdog service from starting on boot
- Configure all GFS2 file systems with the mount option errors=panic to cause the node to panic if any GFS2 file system encounters a fatal error, such as the I/O error caused by the SCSI reservation conflict that occurs after the node is fenced
- Configure at least one of the following:
  + Set up kdump to capture a core when a node panics; the host then reboots when dumping is done
  + Set kernel.panic to a value greater than 0 in /etc/sysctl.conf, so the node reboots that many seconds after it panics
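As a rough aid for the mount-option step of the workaround, the sketch below lists GFS2 mount points in an fstab-format file that lack the panic-on-error option. It assumes the option is spelled errors=panic, as in the GFS2 mount documentation, and that the file systems are listed in an fstab-style file; GFS2 file systems mounted by cluster resources would need to be checked in the cluster configuration instead. The function name gfs2_missing_panic is hypothetical.

```shell
# Hypothetical helper: print the mount point of every gfs2 entry in an
# fstab-format file whose option field does not contain errors=panic.
gfs2_missing_panic() {
    awk '$1 !~ /^#/ && $3 == "gfs2" && $4 !~ /(^|,)errors=panic(,|$)/ { print $2 }' "$1"
}
```

For example, `gfs2_missing_panic /etc/fstab` prints nothing when every listed GFS2 entry already carries the option.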
Root Cause
A fix for this problem was released for RHEL 6 in an updated fence-agents package via Bugzilla #1050022, and for RHEL 7 in a RHEL 7 Update 2 asynchronous erratum via Bugzilla #1292071. Red Hat is also pursuing inclusion of the fix in a minor release of RHEL 7 via Bugzilla #1265426.
The watchdog daemon uses its own internal shutdown procedure when one of the test scripts fails, and this procedure involves unmounting all file systems. In the case where a node has been fenced by fence_scsi, it will generally either be:
- Attempting to fence the other node in the cluster, and failing repeatedly because it was fenced first, OR
- Inquorate because it cannot reach the other cluster members
In either case, attempting to unmount the GFS2 file system will simply block, thus preventing watchdog from completing the shutdown.
The new fence-agents package for RHEL 6 mentioned above includes a separate fence_scsi_check_hardreboot.pl script which triggers a reboot that will not go through the normal graceful shutdown routine in watchdog but instead simply hard reboots the system. In the new package in RHEL 7 the /usr/share/cluster/fence_scsi_check script (which is simply a copy of /usr/sbin/fence_scsi) contains code that will cause the system to hard-reboot when fenced if the script that is executed by watchdog is named fence_scsi_check_hardreboot. So in other words, the same script can be used to trigger standard watchdog shutdown facilities or to trigger a hard-reboot, depending on whether it is named /etc/watchdog.d/fence_scsi_check or /etc/watchdog.d/fence_scsi_check_hardreboot.
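The name-based behavior described for RHEL 7 can be illustrated with a small sketch. This is not the code shipped in fence_scsi; it only shows the general technique of one script choosing its behavior from the name it was invoked under. The function name reboot_mode_for_name is hypothetical.

```shell
# Illustrative only: choose a reboot mode from the invoked script name,
# the way a single script can behave differently per symlink name.
reboot_mode_for_name() {
    case "$(basename "$1")" in
        fence_scsi_check_hardreboot) echo hard ;;  # bypass graceful shutdown
        *)                           echo soft ;;  # leave shutdown to watchdog
    esac
}
```

Inside a real script the argument would be "$0", so the same file linked under two different names in /etc/watchdog.d would take two different paths.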
Diagnostic Steps
- Set up fence_scsi_check.pl and have GFS2 file systems mounted, then cause a node to be fenced (pull its network cables, down a switch port, pause its corosync process, etc). Once it attempts to reboot and gets stuck, force a kernel panic via SysRq+C or through a diagnostic NMI from the system management card, causing it to dump a core that can be captured by kdump. Review that core and inspect the backtrace of the watchdog process to see whether it is stuck in gfs2 somewhere that appears to be an attempt to unmount a file system, like this in RHEL 7 for example:
crash> bt 4062
PID: 4062 TASK: ffff8808112b16c0 CPU: 18 COMMAND: "watchdog"
#0 [ffff88081239b978] __schedule at ffffffff816092dd
#1 [ffff88081239b9e0] schedule at ffffffff81609839
#2 [ffff88081239b9f0] rwsem_down_read_failed at ffffffff8160b315
#3 [ffff88081239ba60] call_rwsem_down_read_failed at ffffffff812e2d94
#4 [ffff88081239bac0] dlm_lock at ffffffffa0679658 [dlm]
#5 [ffff88081239bb90] gdlm_lock at ffffffffa06f5137 [gfs2]
#6 [ffff88081239bc18] do_xmote at ffffffffa06d7359 [gfs2]
#7 [ffff88081239bc78] run_queue at ffffffffa06d7580 [gfs2]
#8 [ffff88081239bcb8] gfs2_glock_nq at ffffffffa06d7a0d [gfs2]
#9 [ffff88081239bd08] gfs2_statfs_sync at ffffffffa06f2003 [gfs2]
#10 [ffff88081239bd98] gfs2_make_fs_ro at ffffffffa06f21bf [gfs2]
#11 [ffff88081239be08] gfs2_put_super at ffffffffa06f26e8 [gfs2]
#12 [ffff88081239be40] generic_shutdown_super at ffffffff811c8cf6
#13 [ffff88081239be68] kill_block_super at ffffffff811c8fd7
#14 [ffff88081239be88] gfs2_kill_sb at ffffffffa06e1512 [gfs2]
#15 [ffff88081239bea8] deactivate_locked_super at ffffffff811c930d
#16 [ffff88081239bec8] deactivate_super at ffffffff811c9916
#17 [ffff88081239bee0] mntput_no_expire at ffffffff811e6795
#18 [ffff88081239bf08] sys_umount at ffffffff811e78cf
#19 [ffff88081239bf80] system_call_fastpath at ffffffff81614389
RIP: 00007efdbbc07247 RSP: 00007fff4160a2d8 RFLAGS: 00000246
RAX: 00000000000000a6 RBX: ffffffff81614389 RCX: ffffffffffffffff
RDX: 0000000000000000 RSI: 0000000000000000 RDI: 000000000117f820
RBP: 0000000000000000 R8: 0000000000000001 R9: 0000000000000000
R10: 00007fff4160a050 R11: 0000000000000246 R12: 0000000000000000
R13: 0000000000000001 R14: 000000000117f7f0 R15: 0000000000000000
ORIG_RAX: 00000000000000a6 CS: 0033 SS: 002b
- Repeat the testing without GFS2 or device-mapper-multipath and confirm the issue is no longer reproducible, demonstrating that GFS2 and device-mapper-multipath are responsible for blocking the shutdown via watchdog.
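When reviewing several cores, the backtrace check from the first step can be roughly automated. The sketch below assumes the output of the crash bt command for the watchdog process has been saved to a text file; the function name bt_shows_gfs2_umount is hypothetical, and the match is a simple heuristic (a sys_umount frame plus at least one frame in the gfs2 module), not a definitive diagnosis.

```shell
# Hypothetical helper: succeed if a saved crash backtrace contains both
# an umount system call frame and at least one frame in the gfs2 module.
bt_shows_gfs2_umount() {
    grep -q 'sys_umount' "$1" && grep -q '\[gfs2\]' "$1"
}
```

A backtrace like the one shown above would match; a watchdog process blocked elsewhere would not, and would need to be inspected by hand.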