How do I configure fence_kdump in a Red Hat Pacemaker cluster?
Environment
- Red Hat Enterprise Linux 6, 7, 8 or 9 (with the High Availability Add-on)
- Pacemaker
Issue
- How do I configure a fence_kdump stonith device in a Red Hat Pacemaker cluster?
Resolution
-
Ensure that the kdump service is properly configured and started on all nodes. The Kdump Helper web application can assist in configuring kdump.
RHEL 6
    # service kdump status
    Kdump is operational
RHEL 7, 8 or 9
    # systemctl is-active kdump
    active
-
Ensure that the fence_kdump fence agent is installed on all cluster nodes.
RHEL 6
    # yum install fence-agents
RHEL 7, 8 or 9
    # yum install fence-agents-kdump
-
Create a fence_kdump STONITH device in the cluster. Run the command below from one cluster node only.
    # pcs stonith create kdump fence_kdump pcmk_reboot_action="off" pcmk_host_list="node-1 node-2"
Note: In some older versions, additional parameters may be needed if the STONITH device fails to start. For more information, see the solution "A fence_kdump STONITH device fails to start and fencing fails in RHEL 6 or 7 pacemaker clusters".
-
Configure STONITH levels so that fence_kdump is the primary fencing device and the existing stonith device becomes secondary. More information on STONITH levels can be found in the solution "How to configure/manage STONITH 'levels' in RHEL cluster with pacemaker?".
    # pcs stonith level add 1 node-1 kdump
    # pcs stonith level add 1 node-2 kdump
    # pcs stonith level add 2 node-1 fence-node-1
    # pcs stonith level add 2 node-2 fence-node-2
Below is an example of how the resulting configuration will look. Note that fence_xvm is used as an example; your other stonith device(s) may vary.
    # pcs config
    ...
    Stonith Devices:
     Resource: fence-node-1 (class=stonith type=fence_xvm)
      Operations: monitor interval=30s (fence-node-1-monitor-interval-30s)
     Resource: fence-node-2 (class=stonith type=fence_xvm)
      Attributes: delay=10
      Operations: monitor interval=30s (fence-node-2-monitor-interval-30s)
     Resource: kdump (class=stonith type=fence_kdump)
      Attributes: pcmk_reboot_action=off pcmk_host_list="node-1 node-2"
      Operations: monitor interval=60s (kdump-monitor-interval-60s)
    Fencing Levels:
     Node: node-1
      Level 1 - kdump
      Level 2 - fence-node-1
     Node: node-2
      Level 1 - kdump
      Level 2 - fence-node-2
    ...
-
On all nodes (including remote nodes), allow port 7410/udp through the firewall. The port must be open so that a node can receive the incoming notification from another node that has booted into the kdump kernel (in order to collect a vmcore). Once the surviving node (which is running fence_kdump) receives that message, it sends its own reply to the node that is booted into the kdump kernel.
RHEL 6
    # iptables -I INPUT -p udp --dport 7410 -j ACCEPT
    # service iptables save; service iptables restart
RHEL 7, 8 or 9
    # firewall-cmd --add-port=7410/udp
    # firewall-cmd --add-port=7410/udp --permanent
-
Rebuild the initramfs image file for kdump on all cluster nodes.
RHEL 6
    # touch /etc/kdump.conf
    # service kdump restart
RHEL 7, 8 or 9
    # touch /etc/kdump.conf
    # systemctl restart kdump
  - Note: The touch /etc/kdump.conf command updates the modification time of the configuration file. Restarting the kdump service then rebuilds the initramfs image, because the service detects that the configuration file has been modified or touched.
-
Ensure that the initramfs image file contains the fence_kdump_send and hosts files. For Red Hat Enterprise Linux 7, note that there is a reported known issue where the hosts file is not included in the initramfs file.
    # lsinitrd /boot/initramfs-$(uname -r)kdump.img | egrep "fence|hosts"
-
Test the configuration by crashing one of the nodes using the command below.
    # echo c > /proc/sysrq-trigger
On the crashed node, you may see the kdump progress on a graphical or serial console. On the cluster node where the fencing was triggered, the logs should show that fence_kdump is waiting for a message from that node.
    fencing node-1
    fence_kdump[XXXX]: waiting for message from '1.1.1.1'
Once the dump on the crashed node has started, that node sends messages every 10 seconds. The fencing node should then report that the message is received and confirm the fencing as successful.
    fence_kdump[XXXX]: received valid message from '1.1.1.1'
    fence node-1 success
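The log progression described in the test step can be checked with a small filter. The sketch below is only an illustration: the function name fence_kdump_status and its output words (waiting, received, none) are hypothetical, and the matched phrases are copied from the sample messages in this article, so the exact wording may differ between versions.

```shell
# Hypothetical helper: summarise fence_kdump progress from log text on stdin.
# Prints "received" if a valid message was logged after any waiting line,
# "waiting" if only a waiting line was seen, and "none" otherwise.
fence_kdump_status() {
    awk '
        # Phrases below are taken from the sample log lines in this article.
        /fence_kdump.*waiting for message from/    { state = "waiting" }
        /fence_kdump.*received valid message from/ { state = "received" }
        END { print (state == "" ? "none" : state) }
    '
}
```

Assuming the cluster messages are written to /var/log/messages on the fencing node, it could be used as `fence_kdump_status < /var/log/messages`; substitute whichever log source your system uses.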
Note: Using fence_kdump can increase failover time (up to configured timeouts) if the failing node is unable to generate a kernel dump or the kernel dump process is unable to communicate with the surviving cluster nodes.
Note: If you encounter a timeout while waiting to receive the message in RHEL 6 or 7 and your kexec-tools package is older than version 2.0.14-17.el7, see the solution fence_kdump fails with "timeout after X seconds" in a RHEL 6 or 7 High Availability cluster for further information.
Note: If the cluster node runs on AWS using the Xen hypervisor, kdump is not supported; see XEN/AWS kdump support for RHEL.
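The initramfs check from the Resolution steps can also be scripted so that a missing file is reported explicitly rather than spotted by eye. This is a minimal sketch, not a supported tool: the function name initramfs_has_fence_kdump is hypothetical, and it assumes each lsinitrd listing line ends with the path of the file it describes.

```shell
# Hypothetical helper: read an lsinitrd listing on stdin and report whether
# it contains both a fence_kdump_send entry and a hosts entry.
initramfs_has_fence_kdump() {
    listing=$(cat)
    missing=""
    for name in fence_kdump_send hosts; do
        # Assumption: the file path is the last field of a listing line,
        # so match the name as the final path component.
        if ! printf '%s\n' "$listing" | grep -Eq "(^|[/ ])${name}\$"; then
            missing="$missing $name"
        fi
    done
    if [ -n "$missing" ]; then
        echo "missing:$missing"
        return 1
    fi
    echo "ok"
}
```

It would be fed from the same command used in the Resolution steps, for example `lsinitrd /boot/initramfs-$(uname -r)kdump.img | initramfs_has_fence_kdump`, and returns a non-zero exit status when either file is absent.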