fence_kdump times out when fence_kdump_nodes is not specified with kexec-tools version 2.0.15 or later
Environment
- Red Hat Enterprise Linux 7 or 8 (with the High Availability Add-on)
- `kexec-tools-2.0.15-13.el7` or later
- `fence_kdump`
Issue
- I need to capture a vmcore from a cluster node, but `fence_kdump` times out every time that node crashes. `fence_kdump` fails with "timeout after 60 seconds" and the node gets fenced before the core is dumped.
- If I test `fence_kdump` by panicking a node, `fence_kdump` fails with a timeout error. If I take the node out of the cluster and panic it, it dumps a core successfully.
Resolution
Red Hat Enterprise Linux 7
Upgrade to [`kexec-tools-2.0.15-43.el7`](/errata/RHBA-2020:1077) or later.
Red Hat Enterprise Linux 8
This issue is being tracked in private bug RHBZ#1761339. As of 14 October 2019, this bug is in NEW state.
If you would like to track the progress of this bug, please open a case with Red Hat Global Support Services.
Workaround
Configure `fence_kdump_nodes` as described in the comments of `/etc/kdump.conf`:
# fence_kdump_nodes <node(s)>
# - List of cluster node(s) except localhost, separated by spaces,
# to send fence_kdump notifications to.
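As a minimal sketch of the workaround, the helper below appends a `fence_kdump_nodes` line to a kdump.conf-style file only when no active setting exists yet. The function name `set_fence_kdump_nodes` and the node names are hypothetical, and the demonstration writes to a scratch file rather than the live `/etc/kdump.conf`.

```shell
#!/bin/bash
# Hypothetical helper: add a fence_kdump_nodes line unless one is already active.
# Usage: set_fence_kdump_nodes <conf-file> <node> [<node> ...]
set_fence_kdump_nodes() {
    local conf=$1
    shift
    # An uncommented setting already exists: leave the file untouched.
    if grep -q '^fence_kdump_nodes[[:space:]]' "$conf"; then
        return 0
    fi
    # List every cluster node except the local one, separated by spaces.
    echo "fence_kdump_nodes $*" >> "$conf"
}

# Demonstration on a scratch file (node names are placeholders).
demo_conf=$(mktemp)
printf '%s\n' '# fence_kdump_nodes <node(s)>' 'path /var/crash' > "$demo_conf"
set_fence_kdump_nodes "$demo_conf" node2.example.com node3.example.com
grep '^fence_kdump_nodes' "$demo_conf"
```

After changing the real `/etc/kdump.conf`, the kdump service typically has to be restarted (for example with `systemctl restart kdump`) so that the kdump initrd is rebuilt with the new setting.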
Related issues
- Solution 2388711 - fence_kdump fails with "timeout after X seconds" in a RHEL 6 or 7 High Availability cluster with kexec-tools versions older than 2.0.14
- Solution 4499751 - fence_kdump times out when cluster node names do not match hostnames
Root Cause
The `network` module is supposed to be pulled into the kdump initrd as a dependency if `fence_kdump_nodes` is specified in `/etc/kdump.conf` or if there is a `fence_kdump` device in the cluster. However, a change was introduced in `kexec-tools-2.0.15` that breaks the addition of the `network` module to the dependencies list.
/usr/lib/dracut/modules.d/99kdumpbase/module-setup.sh:
BEFORE:
depends() {
    ...
    if [ is_generic_fence_kdump -o is_pcs_fence_kdump ]; then
        _dep="$_dep network"
    fi
    ...
}
AFTER:
depends() {
    ...
    if is_generic_fence_kdump -o is_pcs_fence_kdump; then
        _dep="$_dep network"
    fi
    ...
}
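Without the square brackets, the `if` condition in version 2.0.15 no longer always evaluates to true. The shell now runs `is_generic_fence_kdump` with `-o` and `is_pcs_fence_kdump` passed as arguments, so `is_pcs_fence_kdump` is never called and the result depends only on `is_generic_fence_kdump`. This is consistent with the failure appearing only when `fence_kdump_nodes` is not set.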
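The difference between the two forms can be reproduced outside of dracut. The sketch below uses stub functions in place of the real helpers (the stubs and the `*_form` wrapper names are assumptions for illustration): `is_generic_fence_kdump` fails, as if `fence_kdump_nodes` were not set, and `is_pcs_fence_kdump` succeeds, as if the cluster had a `fence_kdump` device.

```shell
#!/bin/bash
# Stubs standing in for the real kexec-tools helpers (illustrative only).
is_generic_fence_kdump() { return 1; }  # fence_kdump_nodes not configured
is_pcs_fence_kdump()     { return 0; }  # cluster has a fence_kdump device

# With brackets: `[` only sees two non-empty strings joined by -o.
# Neither function is called, and the test always succeeds.
bracket_form() {
    if [ is_generic_fence_kdump -o is_pcs_fence_kdump ]; then
        echo true
    else
        echo false
    fi
}

# Without brackets: runs is_generic_fence_kdump with the arguments
# "-o" and "is_pcs_fence_kdump"; is_pcs_fence_kdump is never executed.
bare_form() {
    if is_generic_fence_kdump -o is_pcs_fence_kdump; then
        echo true
    else
        echo false
    fi
}

# Conventional shell "either command succeeds" form, for comparison.
chained_form() {
    if is_generic_fence_kdump || is_pcs_fence_kdump; then
        echo true
    else
        echo false
    fi
}

echo "bracket_form: $(bracket_form)"   # true
echo "bare_form:    $(bare_form)"      # false
echo "chained_form: $(chained_form)"   # true
```

The `||` form is included only as the usual way to express "either check passes" in shell; the change actually shipped in the fixed package is not shown here.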
Diagnostic Steps
- Verify that version `2.0.15` or later of the `kexec-tools` package is installed.
- Observe that `fence_kdump_nodes` is not configured explicitly in `/etc/kdump.conf`.
- Find that the `network` module is not included in the kdump initrd.

    # lsinitrd /boot/initramfs-$(uname -r)kdump.img
    Image: /boot/initramfs-3.10.0-1058.el7.x86_64kdump.img: 19M
    ========================================================================
    Early CPIO image
    ========================================================================
    drwxr-xr-x 3 root root 0 Oct 11 13:27 .
    -rw-r--r-- 1 root root 2 Oct 11 13:27 early_cpio
    drwxr-xr-x 3 root root 0 Oct 11 13:27 kernel
    drwxr-xr-x 3 root root 0 Oct 11 13:27 kernel/x86
    drwxr-xr-x 2 root root 0 Oct 11 13:27 kernel/x86/microcode
    -rw-r--r-- 1 root root 100352 Oct 11 13:27 kernel/x86/microcode/GenuineIntel.bin
    ========================================================================
    Version: dracut-033-564.el7
    Arguments: --hostonly --hostonly-cmdline --hostonly-i18n --hostonly-mode 'strict' -o 'plymouth dash resume ifcfg' --mount '/dev/mapper/r7vg-root_lv /sysroot xfs defaults' --no-hostonly-default-device -f
    dracut modules:
    bash
    nss-softokn
    i18n
    dm
    kernel-modules
    lvm
    qemu
    qemu-net
    fstab-sys
    rootfs-block
    terminfo
    udev-rules
    biosdevname
    systemd
    usrmount
    base
    fs-lib
    kdumpbase
    microcode_ctl-fw_dir_override
    shutdown
    ...
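The last check can be scripted. The sketch below assumes that `lsinitrd` prints a `dracut modules:` heading followed by the module names and a closing line of `=` characters, as in the output above; the function name `has_network_module` is hypothetical. It reads `lsinitrd` output on standard input and succeeds only when `network` appears as a module name in its own right, so `qemu-net` does not count.

```shell
#!/bin/bash
# Hypothetical helper: succeed if "network" is listed among the dracut modules.
# Reads lsinitrd output on stdin.
has_network_module() {
    awk '
        /^dracut modules:/ { in_list = 1; sub(/^dracut modules:/, "") }
        /^=+$/             { in_list = 0 }
        in_list {
            # Accept names one per line or space-separated on a single line.
            for (i = 1; i <= NF; i++) if ($i == "network") found = 1
        }
        END { exit (found ? 0 : 1) }
    '
}

# Demonstration with an abbreviated module list taken from the output above.
sample='dracut modules:
bash
qemu
qemu-net
kdumpbase
shutdown
========'
if printf '%s\n' "$sample" | has_network_module; then
    echo "network module present"
else
    echo "network module missing"
fi

# On a real system (not run here):
#   lsinitrd /boot/initramfs-$(uname -r)kdump.img | has_network_module
```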