fence_scsi_check.pl watchdog script performs a soft reboot instead of a hard reset and hangs during shutdown in a RHEL 6 or 7 Resilient Storage cluster with GFS2

Solution Unverified - Updated

Environment

Issue

  • The SCSI fencing watchdog script does not hard-reset a node
  • The reboot that watchdog performs is "soft", so the shutdown can hang
  • When a node is fenced by fence_scsi, it appears to get stuck on the way down and never reboots. Hung-task warnings on the console show processes blocked waiting on GFS2.

Resolution

RHEL 7

Update to fence-agents-scsi-4.0.11-27.el7_2.5 or later. Instead of linking /usr/share/cluster/fence_scsi_check to /etc/watchdog.d/fence_scsi_check, create the symlink at /etc/watchdog.d/fence_scsi_check_hardreboot:

# rm /etc/watchdog.d/fence_scsi_check
# ln -s /usr/share/cluster/fence_scsi_check /etc/watchdog.d/fence_scsi_check_hardreboot
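The change can be sanity-checked afterwards. The `verify_link` function below is a hypothetical helper, not part of the fence-agents package:

```shell
# Hypothetical helper: succeed only if LINK is a symlink pointing at TARGET.
verify_link() {
  [ -L "$1" ] && [ "$(readlink "$1")" = "$2" ]
}

if verify_link /etc/watchdog.d/fence_scsi_check_hardreboot \
               /usr/share/cluster/fence_scsi_check; then
  echo "hard-reboot watchdog hook in place"
else
  echo "hard-reboot watchdog hook NOT configured"
fi
```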

RHEL 6

Update to fence-agents-3.1.5-48.el6 or later, and switch from /usr/share/cluster/fence_scsi_check.pl to /usr/share/cluster/fence_scsi_check_hardreboot.pl. This requires removing /etc/watchdog.d/fence_scsi_check.pl (usually a symlink back to the former script) and creating a new symlink:

# rm /etc/watchdog.d/fence_scsi_check.pl
# ln -s /usr/share/cluster/fence_scsi_check_hardreboot.pl /etc/watchdog.d/
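On either release, the watchdog daemon runs only executable entries found under /etc/watchdog.d. A quick sanity check can be scripted like this (`check_watchdog_dir` is a hypothetical helper, not shipped by any package):

```shell
# Hypothetical helper: verify every entry in a watchdog.d-style directory is
# executable, printing each entry and the path its symlink resolves to.
check_watchdog_dir() {
  dir="${1:-/etc/watchdog.d}"
  for f in "$dir"/*; do
    [ -e "$f" ] || continue
    if [ -x "$f" ]; then
      printf '%s -> %s\n' "$f" "$(readlink -f "$f")"
    else
      printf 'WARNING: %s is not executable\n' "$f" >&2
      return 1
    fi
  done
}

check_watchdog_dir /etc/watchdog.d || true
```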

All Releases

Workaround: Do all of the following:

  • Remove /etc/watchdog.d/fence_scsi_check.pl, or disable the watchdog service from starting on boot
  • Configure all GFS2 file systems with the mount option errors=panic so the node panics if any GFS2 file system encounters a fatal error, such as the I/O error caused by the SCSI reservation conflict that occurs after the node is fenced
  • Configure at least one of the following:
    + Set up kdump to capture a core when the node panics; the host reboots once dumping completes
    + Set kernel.panic to a value greater than 0 in /etc/sysctl.conf, so the node reboots that many seconds after it panics
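The workaround pieces can be sketched as follows; the device path, mount point, and panic timeout are example values, not requirements:

```shell
# Example /etc/fstab entry (device and mount point are placeholders) adding
# errors=panic so a fatal GFS2 error panics the node instead of hanging:
#   /dev/clustervg/gfs2lv  /mnt/gfs2  gfs2  defaults,errors=panic  0 0

# Reboot 10 seconds after a panic (the value 10 is an example):
echo "kernel.panic = 10" >> /etc/sysctl.conf
sysctl -p

# And/or make sure kdump starts at boot so a vmcore is captured first:
chkconfig kdump on        # RHEL 6
systemctl enable kdump    # RHEL 7
```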

Root Cause

A resolution to this problem for RHEL 6 was released in an updated fence-agents package by Red Hat in Bugzilla #1050022, and in a RHEL 7 Update 2 asynchronous erratum via Bugzilla #1292071. Red Hat is further pursuing a release in a minor release of RHEL 7 via Bugzilla #1265426.

The watchdog daemon uses its own internal shutdown procedure when one of the test scripts fails, and this procedure involves unmounting all file systems. In the case where a node has been fenced by fence_scsi, it will generally either be:

  • Attempting to fence the other node in the cluster, and failing repeatedly because it was fenced first, OR
  • Inquorate because it cannot reach the other cluster members

In either case, attempting to unmount the GFS2 file system will simply block, thus preventing watchdog from completing the shutdown.
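A blocked unmount of this kind shows up as a process stuck in uninterruptible sleep (state D). One quick way to spot such processes on a hung node:

```shell
# List processes in uninterruptible sleep (STAT contains "D") along with the
# kernel wait channel; a hung umount of a GFS2 file system would appear here.
ps -eo pid,stat,wchan,cmd | awk 'NR == 1 || $2 ~ /D/'
```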

The new fence-agents package for RHEL 6 mentioned above includes a separate fence_scsi_check_hardreboot.pl script that triggers a reboot bypassing watchdog's normal graceful shutdown routine, simply hard-rebooting the system. In the new RHEL 7 package, the /usr/share/cluster/fence_scsi_check script (a copy of /usr/sbin/fence_scsi) contains code that hard-reboots the system when fenced if the script executed by watchdog is named fence_scsi_check_hardreboot. In other words, the same script can trigger either the standard watchdog shutdown facilities or a hard reboot, depending on whether it is installed as /etc/watchdog.d/fence_scsi_check or /etc/watchdog.d/fence_scsi_check_hardreboot.
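The name-based dispatch described above can be sketched roughly like this. This is illustrative only, not the actual fence_scsi_check source, and `decide_action` is a hypothetical helper:

```shell
# Hypothetical sketch of name-based dispatch: report "hard" when the script
# is invoked under a *hardreboot* name, "soft" otherwise.
decide_action() {
  case "$1" in
    *hardreboot*) echo hard ;;  # real script would hard-reboot immediately
    *)            echo soft ;;  # real script exits non-zero; watchdog then
                                # performs its normal graceful shutdown
  esac
}

decide_action "$(basename "$0")"
```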

Diagnostic Steps

  • Set up fence_scsi_check.pl with GFS2 file systems mounted, then cause a node to be fenced (pull its network cables, down a switch port, pause its corosync process, etc.). Once it attempts to reboot and gets stuck, force a kernel panic via SysRq+C or through a diagnostic NMI from the system management card, causing it to dump a core that can be captured by kdump. Review that core, inspect the backtrace of the watchdog process, and check whether it is stuck in gfs2 in what appears to be an attempt to unmount a file system, like this in RHEL 7 for example:
crash> bt 4062
PID: 4062   TASK: ffff8808112b16c0  CPU: 18  COMMAND: "watchdog"
 #0 [ffff88081239b978] __schedule at ffffffff816092dd
 #1 [ffff88081239b9e0] schedule at ffffffff81609839
 #2 [ffff88081239b9f0] rwsem_down_read_failed at ffffffff8160b315
 #3 [ffff88081239ba60] call_rwsem_down_read_failed at ffffffff812e2d94
 #4 [ffff88081239bac0] dlm_lock at ffffffffa0679658 [dlm]
 #5 [ffff88081239bb90] gdlm_lock at ffffffffa06f5137 [gfs2]
 #6 [ffff88081239bc18] do_xmote at ffffffffa06d7359 [gfs2]
 #7 [ffff88081239bc78] run_queue at ffffffffa06d7580 [gfs2]
 #8 [ffff88081239bcb8] gfs2_glock_nq at ffffffffa06d7a0d [gfs2]
 #9 [ffff88081239bd08] gfs2_statfs_sync at ffffffffa06f2003 [gfs2]
#10 [ffff88081239bd98] gfs2_make_fs_ro at ffffffffa06f21bf [gfs2]
#11 [ffff88081239be08] gfs2_put_super at ffffffffa06f26e8 [gfs2]
#12 [ffff88081239be40] generic_shutdown_super at ffffffff811c8cf6
#13 [ffff88081239be68] kill_block_super at ffffffff811c8fd7
#14 [ffff88081239be88] gfs2_kill_sb at ffffffffa06e1512 [gfs2]
#15 [ffff88081239bea8] deactivate_locked_super at ffffffff811c930d
#16 [ffff88081239bec8] deactivate_super at ffffffff811c9916
#17 [ffff88081239bee0] mntput_no_expire at ffffffff811e6795
#18 [ffff88081239bf08] sys_umount at ffffffff811e78cf
#19 [ffff88081239bf80] system_call_fastpath at ffffffff81614389
    RIP: 00007efdbbc07247  RSP: 00007fff4160a2d8  RFLAGS: 00000246
    RAX: 00000000000000a6  RBX: ffffffff81614389  RCX: ffffffffffffffff
    RDX: 0000000000000000  RSI: 0000000000000000  RDI: 000000000117f820
    RBP: 0000000000000000   R8: 0000000000000001   R9: 0000000000000000
    R10: 00007fff4160a050  R11: 0000000000000246  R12: 0000000000000000
    R13: 0000000000000001  R14: 000000000117f7f0  R15: 0000000000000000
    ORIG_RAX: 00000000000000a6  CS: 0033  SS: 002b
  • Repeat any testing without GFS2 or device-mapper-multipath and confirm the issue is no longer reproducible, demonstrating that GFS2 and device-mapper-multipath are responsible for blocking the shutdown via watchdog.
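If the backtrace is saved to a text file, the signature above can be checked for mechanically. The `stuck_in_gfs2_umount` function is a hypothetical helper, and watchdog_bt.txt is a placeholder file name:

```shell
# Hypothetical helper: given a saved `crash> bt` transcript, report whether
# the task was blocked in a GFS2 unmount path (sys_umount plus gfs2 frames).
stuck_in_gfs2_umount() {
  grep -q 'sys_umount' "$1" && grep -q '\[gfs2\]' "$1"
}

if stuck_in_gfs2_umount watchdog_bt.txt 2>/dev/null; then
  echo "watchdog blocked unmounting GFS2"
fi
```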

This solution is part of Red Hat’s fast-track publication program, providing a huge library of solutions that Red Hat engineers have created while supporting our customers. To give you the knowledge you need the instant it becomes available, these articles may be presented in a raw and unedited form.