Is there a watchdog script for fence_mpath to reboot a RHEL High Availability or Resilient Storage cluster node?

Solution Verified - Updated

Environment

  • Red Hat Enterprise Linux 7, 8, 9, 10 (with the High Availability Add-on)

Issue

  • Is there a watchdog script for Red Hat Enterprise Linux Server (with the High Availability or Resilient Storage Add-Ons) to reboot a cluster node that has been fenced by fence_mpath?

Resolution

Configuration of the Watchdog Service for fence_mpath in RHEL 7

This configuration assumes that you are using the fence_mpath agent and it is correctly configured as a stonith device in your Pacemaker cluster.

NOTE: Ensure the system is up to date in order to avoid a known bug that prevents the watchdog script from working when SELinux is in Enforcing mode.

  1. Install fence-agents-4.2.1-11.el7 or later, which has the watchdog feature for fence_mpath.

     # yum -y install fence-agents-4.2.1-11.el7
    
  2. Install the watchdog package.

     # yum -y install watchdog
    
  3. For a hard reboot, create a wrapper executable in the /etc/watchdog.d directory to call /usr/share/cluster/fence_mpath_check_hardreboot. (Note: a previous version of this solution contained an instruction to create a symlink to /usr/share/cluster/fence_mpath_check_hardreboot. This can result in AVC denials from SELinux, due to execution under the fenced_t context instead of the watchdog_unconfined_t context.)

     # cat > /etc/watchdog.d/fence_mpath_check_hardreboot << EOF
     #!/bin/sh
     exec /usr/share/cluster/fence_mpath_check_hardreboot "\${@:-}"
     EOF
     
     # chmod +x /etc/watchdog.d/fence_mpath_check_hardreboot
    
  4. Enable and start the watchdog service.

     # systemctl enable watchdog
     # systemctl start watchdog
    
  5. Test fencing and ensure that the node reboots and that it is unfenced appropriately the next time the cluster starts there.
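
Before testing fencing, it may help to confirm that the wrapper from step 3 is in place. The following is a minimal sketch and is not shipped with fence-agents or watchdog; the function name check_wrapper is hypothetical. It reports whether a given path is a regular executable file rather than a symlink, because a symlink is what the note in step 3 warns against.

```shell
#!/bin/sh
# check_wrapper is a hypothetical helper, not part of fence-agents or watchdog.
# It prints one status word for the path given as its first argument and
# returns 0 only when the path is a regular, executable, non-symlink file.
check_wrapper() {
    path=$1
    if [ -L "$path" ]; then
        echo "symlink"          # a symlink is what step 3 advises against
        return 1
    elif [ ! -f "$path" ]; then
        echo "missing"          # the wrapper was never created
        return 1
    elif [ ! -x "$path" ]; then
        echo "not-executable"   # chmod +x was likely skipped
        return 1
    fi
    echo "ok"
    return 0
}
```

For example, `check_wrapper /etc/watchdog.d/fence_mpath_check_hardreboot` should print `ok` once step 3 is complete, and `systemctl is-active watchdog` should report `active` after step 4.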

Root Cause

With the release of RHEL 7.6, fence_mpath was integrated with watchdog so that a software watchdog timer can reboot a cluster node when it is fenced by fence_mpath. Earlier versions of fence_mpath do not have the watchdog feature.

The watchdog package provides a general timer service in RHEL that can be used to periodically monitor system resources. The fence agents are integrated with it so that the watchdog service can reboot a cluster node after it has been fenced using fence_mpath, which eliminates the need to reboot the node manually.

The purpose of the watchdog script, fence_mpath_check_hardreboot, is for a node to reboot itself when it has been fenced via the fence_mpath agent. Use of this script is optional and disabled by default.
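
For context on how the watchdog daemon drives such a script: per the watchdog(8) manual page, each executable in the test directory is run with test as its first argument and, if that run fails, again with repair as the first argument and the failed exit code as the second. The sketch below is a generic illustration of that convention only. The names watchdog_hook, node_is_healthy, and NODE_HEALTHY are hypothetical, and this is not the implementation of fence_mpath_check_hardreboot.

```shell
#!/bin/sh
# Generic illustration of the watchdog test-directory calling convention.
# node_is_healthy and NODE_HEALTHY are hypothetical stand-ins; the real
# fence_mpath_check_hardreboot script performs its own checks.
node_is_healthy() {
    [ "${NODE_HEALTHY:-yes}" = "yes" ]
}

watchdog_hook() {
    case "${1:-test}" in
        test)
            # Status 0 reports that the test passed; non-zero reports failure.
            node_is_healthy
            ;;
        repair)
            # $2 carries the exit code of the failed test. Returning it
            # unchanged reports that this hook could not repair the problem.
            return "${2:-1}"
            ;;
        *)
            echo "usage: $0 {test|repair <code>}" >&2
            return 2
            ;;
    esac
}
```

A standalone script built this way would end with `watchdog_hook "$@"`. This is also why the wrapper in step 3 forwards its arguments to the underlying script unchanged.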

Similar functionality is offered for fence_scsi and can be found here:
