fence_kdump times out when cluster node names do not match hostnames

Solution Unverified - Updated

Environment

  • Red Hat Enterprise Linux 6, 7, 8 (with the High Availability Add-on)
  • fence_kdump

Issue

  • fence_kdump fails with a Timer expired message if the output of crm_node -n does not match the output of hostname or hostname -s for a node.
  • fence_kdump can fail when a dedicated heartbeat IP address is used for each cluster node.
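
To check whether a node is affected, compare the name Pacemaker reports for it with the names that `hostname` reports. The following is a minimal sketch, not a supported tool: the helpers `node_name_matches_host` and `report_node_name` are hypothetical names introduced for illustration. They take the three names as arguments so the comparison can be tried without a running cluster.

```shell
#!/bin/sh
# Hypothetical helper: succeeds (exit status 0) when the Pacemaker node name
# equals either the full or the short hostname.
node_name_matches_host() {
    node_name=$1    # e.g. output of: crm_node -n
    host_full=$2    # e.g. output of: hostname
    host_short=$3   # e.g. output of: hostname -s

    [ "$node_name" = "$host_full" ] || [ "$node_name" = "$host_short" ]
}

# Hypothetical helper: prints a one-line verdict for the given names.
report_node_name() {
    if node_name_matches_host "$1" "$2" "$3"; then
        echo "OK: node name '$1' matches the hostname"
    else
        echo "MISMATCH: node name '$1' differs from hostname '$2' / '$3'"
    fi
}
```

On a cluster node this could be invoked as `report_node_name "$(crm_node -n)" "$(hostname)" "$(hostname -s)"`. A MISMATCH result suggests that the node may be exposed to this issue.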

Resolution

Red Hat Enterprise Linux 6

  • There are no plans to fix this issue in RHEL 6.

Red Hat Enterprise Linux 7

  • The issue (bz1760811) has been resolved with errata RHBA-2020:3885 with the following package(s): kexec-tools-2.0.15-51.el7, kexec-tools-anaconda-addon-2.0.15-51.el7, kexec-tools-eppic-2.0.15-51.el7 or later.

Red Hat Enterprise Linux 8

  • The issue (bz1761602) has been resolved with errata RHBA-2020:4462 with the following package(s): kexec-tools-2.0.20-34.el8 or later.

Workaround


Configure `fence_kdump_nodes` as described in the comments of `/etc/kdump.conf`:
# fence_kdump_nodes <node(s)>
#           - List of cluster node(s) except localhost, separated by spaces,
#             to send fence_kdump notifications to.
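
As an illustration, the line for `/etc/kdump.conf` can be produced by taking all cluster node names and leaving out the local one. The sketch below assumes a hypothetical three-node cluster with the placeholder names node1-hb, node2-hb, and node3-hb. The helper `print_fence_kdump_nodes` is invented for this example and is not part of kexec-tools.

```shell
#!/bin/sh
# Hypothetical helper: prints a fence_kdump_nodes line for /etc/kdump.conf.
# $1 is the local node's cluster name; the remaining arguments are the
# names of all cluster nodes (placeholders in this example).
print_fence_kdump_nodes() {
    local_name=$1
    shift
    peers=""
    for node in "$@"; do
        # Leave the local node out of its own list
        if [ "$node" = "$local_name" ]; then
            continue
        fi
        peers="$peers $node"
    done
    echo "fence_kdump_nodes${peers}"
}

# Example for node1-hb in the assumed three-node cluster
print_fence_kdump_nodes node1-hb node1-hb node2-hb node3-hb
# prints: fence_kdump_nodes node2-hb node3-hb
```

Each node needs its own list that omits itself, so the first argument would typically be the output of `crm_node -n` on that node. The kdump initrd has to be regenerated before the new setting takes effect.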

Root Cause

If fence_kdump_nodes is not configured explicitly in /etc/kdump.conf and a Pacemaker cluster is running when the kdump initrd is created, a dracut script automatically generates a fence_kdump_nodes list. For each node in the output of pcs cluster cib, the dracut script checks whether the node name matches the output of hostname or hostname -s. If so, that node is excluded from the fence_kdump_nodes list.

However, if the local node is known to Pacemaker by a name other than its hostname -- for example, if the hostname is node1 and the node name is node1-hb -- then the local node is not excluded from fence_kdump_nodes. When the local node is included in fence_kdump_nodes and the local node is executing the crash kernel, fence_kdump_send fails to send notifications to all the cluster nodes. fence_kdump can then fail with a Timer expired message.
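
The exclusion step described above can be reproduced outside a cluster with a simplified stand-in. This is a sketch for illustration only, not the packaged dracut script: `simulate_auto_node_list` is a hypothetical function, and the hostnames and node names are placeholders passed as arguments instead of being read from `hostname` and `pcs cluster cib`.

```shell
#!/bin/sh
# Simplified stand-in for the name comparison described above.
# $1 = output of `hostname`, $2 = output of `hostname -s`,
# remaining arguments = node names known to Pacemaker.
simulate_auto_node_list() {
    host_full=$1
    host_short=$2
    shift 2
    nodes=""
    for nodename in "$@"; do
        # A node is skipped only if its name equals one of the hostnames
        if [ "$nodename" = "$host_full" ] || [ "$nodename" = "$host_short" ]; then
            continue
        fi
        nodes="$nodes $nodename"
    done
    # Unquoted on purpose so word splitting drops the leading space
    echo $nodes
}

# Names match: the local node (node1) is excluded
simulate_auto_node_list node1.example.com node1 node1 node2
# prints: node2

# Names differ: the local node (node1-hb) stays in the list
simulate_auto_node_list node1.example.com node1 node1-hb node2-hb
# prints: node1-hb node2-hb
```

The second call shows the failure mode: because neither hostname equals node1-hb, the local node ends up in its own notification list.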

kexec-tools-2.0.15-21.el7:
/usr/lib/dracut/modules.d/99kdumpbase/module-setup.sh:
# retrieves fence_kdump nodes from Pacemaker cluster configuration
get_pcs_fence_kdump_nodes() {
    local nodes

    # get cluster nodes from cluster cib, get interface and ip address
    nodelist=`pcs cluster cib | xmllint --xpath "/cib/status/node_state/@uname" -`

    # nodelist is formed as 'uname="node1" uname="node2" ... uname="nodeX"'
    # we need to convert each to node1, node2 ... nodeX in each iteration
    for node in ${nodelist}; do
        # convert $node from 'uname="nodeX"' to 'nodeX'
        eval $node
        nodename=$uname
        # Skip its own node name
        if [ "$nodename" = `hostname` -o "$nodename" = `hostname -s` ]; then
            continue
        fi
        nodes="$nodes $nodename"
    done

    echo $nodes
}
