Fencing fails in a RHEL 7, 8, 9 High Availability cluster because systemd initiates a graceful shutdown
Environment
- Red Hat Enterprise Linux (RHEL) 7, 8, 9 with the High Availability Add-On
- One or more `pacemaker` cluster nodes (or `pacemaker_remote` nodes) associated with a `stonith` device that uses a power method which connects to a BMC or system-management controller such as an iLO, RSA, DRAC, iDRAC, etc.
Issue
- Fencing fails because `systemd-logind` handles the "power button" signal and initiates a graceful shutdown instead of power-cycling the system.
- When one node fences the other, we see the fenced node process a power-button press and begin shutting down, while fencing appears to fail on the surviving node for taking too long.
- Do we need to disable acpi / acpid in RHEL 7 clusters like we did in previous releases?
- Do I need to do anything in addition to disabling ACPI on RHEL 7 cluster nodes to prevent them from shutting down gracefully? For example:
Aug 13 21:07:22 node01 systemd-logind: Power key pressed.
Aug 13 21:07:22 node01 systemd-logind: Powering Off...
Aug 13 21:07:22 node01 systemd-logind: System is powering down.
Aug 13 21:07:42 node02 stonith-ng[2803]: notice: log_operation: Operation 'reboot' [3114] for device 'node01-ilo' returned: -62 (Timer expired)
- A cluster node gracefully rebooted instead of being hard killed on RHEL 7:
Nov 2 10:57:01 node41 stonith-ng[8161]: notice: Operation reboot of node42 by node42 for crmd.20238@uxplpsgrd03.8b66209c: OK
Nov 2 10:57:01 node42 crmd[20238]: crit: We were allegedly just fenced by node41 for node42!
Nov 2 10:57:01 node42 stonith-ng[20234]: notice: Operation reboot of node42 by node41 for crmd.20238@node42.8b66209c: OK
Nov 2 10:57:01 node42 systemd-logind: Power key pressed.
- A cluster node gracefully rebooted instead of being hard killed on RHEL 8:
Sep 18 16:19:11 rhel8-1 stonith-ng[8161]: notice: Operation reboot of rhel8-1 by rhel8-2 for crmd.20238@uxplpsgrd03.8b66209c: OK
Sep 18 16:19:11 rhel8-1 crmd[20238]: crit: We were allegedly just fenced by rhel8-1 for rhel8-2!
Sep 18 16:19:11 rhel8-1 systemd-logind[792]: Session 1 logged out. Waiting for processes to exit.
Sep 18 16:19:11 rhel8-1 systemd-logind[792]: Removed session 1.
Resolution
When a pacemaker cluster node (or pacemaker remote node) is fenced, a hard kill should occur, not a graceful shutdown of the operating system. If a graceful shutdown is occurring, configure the operating system to ignore any power-button-pressed signals that are received, as follows.
Disable power-key handling in /etc/systemd/logind.conf by defining the following configuration:
HandlePowerKey=ignore
Then restart the logind service:
# systemctl restart systemd-logind.service
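The two steps above can be sketched as a small shell helper applied to /etc/systemd/logind.conf on each node. This is an illustrative sketch, not Red Hat tooling; the `ensure_power_key_ignored` function name is our own:

```shell
# Hypothetical helper: force HandlePowerKey=ignore in a logind.conf file.
ensure_power_key_ignored() {
    conf="$1"
    if grep -q '^HandlePowerKey=' "$conf"; then
        # Rewrite an existing active entry in place.
        sed -i 's/^HandlePowerKey=.*/HandlePowerKey=ignore/' "$conf"
    else
        # No active entry (absent or commented out): append one.
        echo 'HandlePowerKey=ignore' >> "$conf"
    fi
}

# On each cluster node, as root:
#   ensure_power_key_ignored /etc/systemd/logind.conf
#   systemctl restart systemd-logind.service
```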
For more information, see the following:
- RHEL 7 - 5.12. Configuring ACPI For Use with Integrated Fence Devices
- RHEL 8 - 9.13. Configuring ACPI for use with integrated fence devices
- RHEL 9 - 10.14. Configuring ACPI for use with integrated fence devices
Root Cause
When a High Availability cluster is configured with a stonith agent that connects to a node's BMC or management interface, such as an HP iLO, Dell DRAC, IBM RSA, or similar, the fence device may send a power-button-pressed signal to the host. In some situations this is a desirable feature, giving the host a chance to shut down gracefully, cleaning up its ongoing work and syncing data out to disk, before powering off. In the case of a High Availability cluster node, however, the node should be powered off immediately, so that it cannot continue to interact with shared resources without the knowledge or coordination of the other nodes in the cluster.
When such a signal is received by a RHEL 7, 8, or 9 host, the systemd-logind daemon can trap the power-key-pressed event and initiate a graceful shutdown instead of the machine being power-cycled instantly. This delay can result in failed fencing, usually with a timeout error. Disabling this handling in systemd-logind causes the host to ignore the signal, so the BMC powers the server off immediately rather than waiting for it to shut down.
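As a quick way to see whether a node will trap the power key, the effective setting can be read back from logind.conf. The sketch below is our own helper, assuming a stock file layout; it treats a missing or commented-out entry as systemd's documented compiled-in default of poweroff:

```shell
# Hypothetical helper: print the effective HandlePowerKey action for a
# given logind.conf. systemd defaults to "poweroff" when the key is unset.
power_key_action() {
    conf="$1"
    # Take the last active (uncommented) HandlePowerKey= line, if any.
    val=$(sed -n 's/^HandlePowerKey=//p' "$conf" | tail -n 1)
    echo "${val:-poweroff}"
}

# Example: power_key_action /etc/systemd/logind.conf
# prints "ignore" on a node configured per the Resolution above.
```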
In addition, when power fencing is used, the following messages may be printed in some cases, indicating that the fenced node is still alive when it should not be. In this instance the fencing device told the cluster node to shut down instead of hard-killing it, which resulted in the graceful shutdown described in this solution.
Nov 2 10:57:01 node42 crmd[20238]: crit: We were allegedly just fenced by node41 for node42!
Nov 2 10:57:01 node42 systemd-logind: Power key pressed.
That said, there are some instances where this "allegedly just fenced" message is normal, such as when storage-based fencing occurs.
A similar issue, where a node is still alive but the other cluster nodes believe it should have been fenced off, is documented in the following solution: Resources run on two nodes simultaneously, data on shared storage is corrupted, and/or other unexpected behavior occurs in a RHEL 5, 6, or 7 High Availability cluster using fence_ipmilan with method=cycle
This solution is part of Red Hat’s fast-track publication program, providing a huge library of solutions that Red Hat engineers have created while supporting our customers. To give you the knowledge you need the instant it becomes available, these articles may be presented in a raw and unedited form.