stonith-timeout doesn't work as expected in a RHEL 6, 7, or 8 High Availability cluster with pacemaker

Environment

  • Red Hat Enterprise Linux 6, 7, or 8 (with the High Availability Add-on)
  • Pacemaker
  • One or more stonith devices with stonith-timeout set as an attribute

Issue

  • I've set stonith-timeout as an attribute on my stonith device, but I'm still seeing timeouts that are shorter than the value I set
  • stonith-timeout doesn't seem to work on a per-device basis like the manpages and docs say it should
  • Why doesn't stonith-timeout apply correctly to my cluster's stonith devices? I have to set the cluster property stonith-timeout for all of them.
  • I've set the stonith-timeout property to a higher value, but monitor actions are timing out
  • Does stonith-timeout apply to monitor, list, and status actions, or only to fencing a node (i.e., reboot)?
  • My cluster logs errors from my stonith device regarding a parse error from stonith-timeout:
Aug 31 08:23:39 node1 stonith-ng[2008]:  warning: log_operation: vmfence1:6167 [ Parse error: Ignoring unknown option 'stonith-timeout=120' ]

Resolution

Use one of the timeout options documented in the following solution: A stonith device is failing to start and/or reporting "Timed Out" errors in a RHEL High Availability cluster with pacemaker.

Root Cause

stonith-timeout was documented in several places as a per-device attribute with the ability to override the cluster-wide stonith-timeout property. However, it is not actually implemented as a per-device attribute and won't control any timeouts when configured as one.

The stonith-timeout cluster property does apply to all devices, but it only controls the timeouts for on, off, and reboot actions. This setting does not control other actions like monitor, status, and list. This was not made clear in some older versions of the documentation.
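
The scoping described above can be summarized in a small Python sketch. It is only an illustrative model, not Pacemaker code: the function name and the constant are hypothetical, and it encodes just the two facts stated in this section (a device-level stonith-timeout is ignored, and the cluster property covers only on, off, and reboot).

```python
# Illustrative model only; this is not Pacemaker source code.
# Actions whose timeout is governed by the cluster-wide stonith-timeout
# property, as described in the Root Cause section.
FENCING_ACTIONS = {"on", "off", "reboot"}


def stonith_timeout_applies(action, set_as_cluster_property):
    """Return True if a stonith-timeout setting governs this action.

    set_as_cluster_property: True when stonith-timeout is set as a cluster
    property, False when it was set as an attribute on a stonith device.
    """
    if not set_as_cluster_property:
        # stonith-timeout is not implemented as a per-device attribute,
        # so in that form it never controls any timeout.
        return False
    return action in FENCING_ACTIONS


if __name__ == "__main__":
    for action in ("reboot", "off", "monitor", "status", "list"):
        print(action, stonith_timeout_applies(action, True))
```

In this model, monitor, status, and list always return False, which matches the symptom of monitor actions timing out even after the cluster property has been raised.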

The timeouts for individual actions on individual devices can be configured with the pcmk_<action>_timeout settings (replacing <action> with the appropriate action name), as described in Table 5.2. Advanced Properties of Fencing Devices.
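
As a hedged illustration of the naming pattern, the Python sketch below expands pcmk_<action>_timeout for a couple of actions and prints a `pcs stonith update` command that can be reviewed before it is run. The device name vmfence1 is borrowed from the log message in the Issue section; the timeout values and the helper name are hypothetical, and the sketch assumes the cluster is managed with pcs.

```python
import shlex


def build_pcs_timeout_command(device, timeouts):
    """Build a 'pcs stonith update' command that sets per-action timeouts.

    device:   stonith device ID (hypothetical example: "vmfence1")
    timeouts: mapping of action name -> timeout in seconds
    """
    args = ["pcs", "stonith", "update", device]
    for action, seconds in sorted(timeouts.items()):
        # Replace <action> in pcmk_<action>_timeout with the action name.
        args.append(f"pcmk_{action}_timeout={seconds}s")
    # Quote each argument so the printed line is safe to paste into a shell.
    return " ".join(shlex.quote(arg) for arg in args)


if __name__ == "__main__":
    # Example values only; choose timeouts that suit your fence device.
    print(build_pcs_timeout_command("vmfence1", {"reboot": 120, "monitor": 60}))
```

The helper only prints the command rather than executing it, so the result can be checked against the device's current configuration before it is applied.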
