A fence_kdump stonith device fails to start and fencing fails in RHEL 6 or 7 pacemaker clusters
Environment
- Red Hat Enterprise Linux (RHEL) 6 or 7 with the High Availability Add On
- pacemaker
- fence-agents-kdump releases prior to 4.0.11-10.el7 in RHEL 7
- fence-agents releases prior to 4.0.15-8.el6 in RHEL 6
- fence_kdump in use as a stonith device fence agent
- kdump/kexec configured and running on cluster nodes
Issue
- How can I use fence_kdump with pacemaker on RHEL 6?
- I have created a stonith device using fence_kdump, but pcs shows it as Stopped and shows a failure when trying to start it:
# pcs status
[...]
Online: [ rhel6-node1-pcmk.example.com rhel6-node2-pcmk.example.com ]
Full list of resources:
fs (ocf::heartbeat:Filesystem): Started rhel6-node2-pcmk.example.com
kdump (stonith:fence_kdump): Stopped
Failed actions:
kdump_start_0 on rhel6-node1-pcmk.example.com 'unknown error' (1): call=31, status=Error, last-rc-change='Mon May 5 16:46:30 2014', queued=1011ms, exec=0ms
- My fence_kdump stonith device is reporting errors in /var/log/messages:
May 5 16:46:31 rhel6-node1-pcmk stonith-ng[2136]: notice: log_operation: Operation 'monitor' [2278] for device 'kdump' returned: -201 (Generic Pacemaker error)
May 5 16:46:31 rhel6-node1-pcmk stonith-ng[2136]: warning: log_operation: kdump:2278 [ [error]: unsupported action 'monitor' ]
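To check whether a node is hitting this symptom, the "[error]: unsupported action" lines shown above can be pulled out of syslog. The following is a minimal sketch; the function name find_kdump_action_errors is hypothetical, and it assumes the log wording matches the example messages exactly.

```shell
#!/bin/sh
# Hypothetical helper: print the action name from each fence_kdump
# "[error]: unsupported action '<name>'" line in the given log file(s),
# or in standard input when no file is given.
find_kdump_action_errors() {
    # cat with no file arguments reads stdin; sed prints only matching lines,
    # reduced to the captured action name.
    cat -- "$@" | sed -n "s/.*\[error]: unsupported action '\([a-z-]*\)'.*/\1/p"
}

# Example usage on a cluster node:
#   find_kdump_action_errors /var/log/messages | sort | uniq -c
```

An output containing "monitor" (or "status"/"reboot") suggests the agent is being asked for an action it does not implement, as described in the Root Cause section.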
Resolution
RHEL 7
- Update to fence-agents-kdump-4.0.11-10.el7 or later.
RHEL 6
- Update to fence-agents-4.0.15-8.el6 or later.
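Whether a node already has the fixed package can be checked by comparing the installed version-release against the minimum listed above. The sketch below is one way to do that; the function name version_at_least is hypothetical, and it relies on GNU "sort -V", which approximates but does not exactly reproduce rpm's version comparison.

```shell
#!/bin/sh
# Hypothetical helper: exit 0 if the installed version-release ($1) is at
# least the fixed version-release ($2). GNU "sort -V" stands in for rpm's
# own comparison; it agrees for simple strings like these but is not identical.
version_at_least() {
    lowest=$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)
    [ "$lowest" = "$2" ]
}

# Example usage on a RHEL 7 node (on RHEL 6, query fence-agents and compare
# against 4.0.15-8.el6 instead):
#   installed=$(rpm -q --qf '%{VERSION}-%{RELEASE}\n' fence-agents-kdump)
#   version_at_least "$installed" 4.0.11-10.el7 || echo "workaround needed"
```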
All Releases
- Workaround: When creating the fence_kdump stonith device, specify the following attributes/options:
  - pcmk_monitor_action="metadata": fence_kdump does not offer a "monitor" action, but it can be "tricked" into using one of its other actions that should always return success.
  - pcmk_status_action="metadata": fence_kdump does not offer a "status" action, so it must be "tricked" into using one of its other actions that should always return success.
  - pcmk_reboot_action="off": fence_kdump supports action="off" but not "reboot", so it must be told to use the former.
  - --force: pcs may not recognize the first three attributes above in some releases, so they may need to be forced.
For example:
# pcs stonith create kdump fence_kdump pcmk_status_action="metadata" pcmk_monitor_action="metadata" pcmk_reboot_action="off" pcmk_host_list="rhel6-node1-pcmk.example.com rhel6-node2-pcmk.example.com" --force
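For clusters with different node names, the same command can be generated rather than hand-edited. This is a minimal sketch; the function name kdump_stonith_cmd is hypothetical, and it only prints the command (it does not run pcs), so the output can be reviewed before use.

```shell
#!/bin/sh
# Hypothetical helper: print (do not execute) the workaround's pcs command.
# $1 = stonith device name; remaining arguments = cluster node names.
kdump_stonith_cmd() {
    name=$1
    shift
    # "$*" joins the node names with spaces, giving the space-separated
    # pcmk_host_list used in the example above.
    hosts="$*"
    printf 'pcs stonith create %s fence_kdump pcmk_status_action="metadata" pcmk_monitor_action="metadata" pcmk_reboot_action="off" pcmk_host_list="%s" --force\n' "$name" "$hosts"
}

# Example usage:
#   kdump_stonith_cmd kdump rhel6-node1-pcmk.example.com rhel6-node2-pcmk.example.com
```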
Root Cause
This issue was addressed by Red Hat Engineering in Bugzilla #1094520 for RHEL 7 and #1094515 for RHEL 6.
The fence_kdump agent does not support the "monitor" action, which stonith-ng calls when trying to start a device. The result is that the device does not start anywhere in the cluster, and thus stonith-ng does not consider any node capable of fencing a missing node with fence_kdump.
Similarly, fence_kdump does not offer a "status" action, so even if we can get past "monitor", the reboot action fails.
In both cases, we can tell stonith-ng to use the "metadata" action instead. This is arguably a less-than-ideal workaround, but because fence_kdump does not correspond to a physical device the way other agents do, the "monitor" and "status" actions do not fit this agent logically, so having them simply return success makes some sense. Ideally these actions would be implemented properly, but Red Hat supports the use of the identified workaround.
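The effect of the three pcmk_*_action attributes can be pictured as a simple remapping of the action stonith-ng requests to the action actually passed to fence_kdump. The sketch below is an illustrative model only, not Pacemaker's implementation; the function name map_kdump_action is hypothetical.

```shell
#!/bin/sh
# Illustrative model (not Pacemaker code): which action fence_kdump receives
# when stonith-ng requests $1, given the workaround's attribute overrides.
map_kdump_action() {
    case "$1" in
        monitor) echo metadata ;;  # pcmk_monitor_action="metadata"
        status)  echo metadata ;;  # pcmk_status_action="metadata"
        reboot)  echo off ;;       # pcmk_reboot_action="off"
        *)       echo "$1" ;;      # no override: action passed through unchanged
    esac
}
```

With this mapping, the "monitor" call made when the device starts and the "status" call both reach the agent as "metadata", which should always succeed, and a fencing "reboot" reaches it as "off", which fence_kdump supports.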
This solution is part of Red Hat’s fast-track publication program, providing a huge library of solutions that Red Hat engineers have created while supporting our customers. To give you the knowledge you need the instant it becomes available, these articles may be presented in a raw and unedited form.