fence_kdump times out when fence_kdump_nodes is not specified with kexec-tools version 2.0.15 or later

Solution Unverified

Environment

  • Red Hat Enterprise Linux 7 or 8 (with the High Availability Add-on)
  • kexec-tools-2.0.15-13.el7 or later
  • fence_kdump

Issue

  • I need to capture a vmcore from a cluster node, but fence_kdump times out every time that node crashes.
  • fence_kdump fails with "timeout after 60 seconds" and the node is fenced before the vmcore is dumped.
  • If I test fence_kdump by panicking a node, fence_kdump fails with a timeout error. If I take the node out of the cluster and panic it, it dumps a vmcore successfully.

Resolution

Red Hat Enterprise Linux 7


Upgrade to [`kexec-tools-2.0.15-43.el7`](/errata/RHBA-2020:1077) or later.
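To check where a RHEL 7 node stands, a small shell sketch can compare the installed kexec-tools VERSION-RELEASE against the builds named in this article (affected from 2.0.15-13.el7, fixed in 2.0.15-43.el7). The helper names `version_lt` and `kexec_tools_status` are hypothetical, the classification applies to el7 builds only, and GNU `sort -V` is an approximation of RPM's own version comparison, not a replacement for it.

```shell
#!/bin/bash
# Succeed if version string $1 sorts strictly before $2.
# Assumption: GNU `sort -V` ordering is close enough to RPM's comparison
# for these el7 release strings (it is not identical to rpmvercmp).
version_lt() {
    [ "$1" != "$2" ] &&
        [ "$(printf '%s\n' "$1" "$2" | sort -V | head -n 1)" = "$1" ]
}

# Hypothetical helper, RHEL 7 only: classify a kexec-tools VERSION-RELEASE
# string against the range described in this article.
kexec_tools_status() {
    local vr=$1
    if version_lt "$vr" "2.0.15-13.el7"; then
        echo "older"      # older than the builds this article covers
    elif version_lt "$vr" "2.0.15-43.el7"; then
        echo "affected"
    else
        echo "fixed"
    fi
}

# Example on a RHEL 7 node:
#   kexec_tools_status "$(rpm -q --qf '%{VERSION}-%{RELEASE}' kexec-tools)"
```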

Red Hat Enterprise Linux 8


This issue is being tracked in private bug RHBZ#1761339. As of 14 October 2019, this bug is in the NEW state.

If you would like to track the progress of this bug, please open a case with Red Hat Global Support Services.

Workaround


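On each cluster node, set `fence_kdump_nodes` in `/etc/kdump.conf` to the other cluster nodes, as described in the comments of that file:

    # fence_kdump_nodes <node(s)>
    #           - List of cluster node(s) except localhost, separated by spaces,
    #             to send fence_kdump notifications to.

After editing the file, restart the kdump service (`systemctl restart kdump`) so that the kdump initrd is rebuilt.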
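If the setting has to be applied on several nodes, it can be scripted. This is a minimal sketch under stated assumptions: `set_fence_kdump_nodes` is a hypothetical helper (not part of kexec-tools), it relies on GNU `sed -i`, and the node names in the example are placeholders for the other members of your cluster.

```shell
#!/bin/bash
# Hypothetical helper: make a kdump.conf-style file contain exactly one
# fence_kdump_nodes line listing the given peer nodes.
# Requires GNU sed for in-place editing (-i).
set_fence_kdump_nodes() {
    local conf=$1
    shift
    # Remove any existing uncommented fence_kdump_nodes line...
    sed -i '/^fence_kdump_nodes[[:space:]]/d' "$conf"
    # ...then append one line with the peer nodes, separated by spaces.
    printf 'fence_kdump_nodes %s\n' "$*" >> "$conf"
}

# Example on node1 of a hypothetical three-node cluster (run as root):
#   set_fence_kdump_nodes /etc/kdump.conf node2.example.com node3.example.com
```

Deleting the old line before appending keeps the helper idempotent, so rerunning it does not produce duplicate `fence_kdump_nodes` entries; commented lines such as the examples in the file header are left untouched.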

Root Cause

The dracut `network` module should be pulled into the kdump initrd as a dependency whenever `fence_kdump_nodes` is specified in `/etc/kdump.conf` or a fence_kdump device is configured in the cluster, because fence_kdump notifications are sent over the network. However, a change introduced in kexec-tools-2.0.15 broke the condition that adds the `network` module to the dependency list.

In `/usr/lib/dracut/modules.d/99kdumpbase/module-setup.sh`:

BEFORE:

    depends() {
    ...
        if [ is_generic_fence_kdump -o is_pcs_fence_kdump ]; then
            _dep="$_dep network"
        fi
    ...
    }

AFTER:

    depends() {
    ...
        if is_generic_fence_kdump -o is_pcs_fence_kdump; then
            _dep="$_dep network"
        fi
    ...
    }

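Without the square brackets, `-o` is no longer a `test` operator. The shell simply runs `is_generic_fence_kdump` with `-o` and `is_pcs_fence_kdump` as ordinary arguments, so the condition is true only if `is_generic_fence_kdump` succeeds, which in practice means only if `fence_kdump_nodes` is set in `/etc/kdump.conf`. `is_pcs_fence_kdump` is never called, so a node that relies only on a fence_kdump device configured in the cluster gets a kdump initrd without the network module and cannot send its fence_kdump notification, which causes the timeout. (The old bracketed form did not call the functions at all: `[ string -o string ]` tests two non-empty strings, which is always true, so the network module was always added.)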
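The difference between the forms can be reproduced in plain bash. In this minimal sketch the two helpers are stubs with assumed return values (no `fence_kdump_nodes` line, but a fence_kdump device in the cluster) standing in for the real kdump helper functions; the `||` form illustrates the intended logic and is not necessarily how the shipped fix is written.

```shell
#!/bin/bash
# Stubs with assumed results: fence_kdump_nodes is not set in kdump.conf,
# but a fence_kdump device is configured in the cluster.
is_generic_fence_kdump() { return 1; }
is_pcs_fence_kdump()     { return 0; }

_dep_before="" _dep_after="" _dep_fixed=""

# BEFORE: `[` only sees two non-empty strings joined by -o, so the test is
# always true and neither stub is ever called.
if [ is_generic_fence_kdump -o is_pcs_fence_kdump ]; then
    _dep_before="network"
fi

# AFTER: the shell calls is_generic_fence_kdump with "-o" and
# "is_pcs_fence_kdump" as ignored arguments; only its exit status counts.
if is_generic_fence_kdump -o is_pcs_fence_kdump; then
    _dep_after="network"
fi

# Intended logic: run both checks and add network if either succeeds.
if is_generic_fence_kdump || is_pcs_fence_kdump; then
    _dep_fixed="network"
fi

echo "before='$_dep_before' after='$_dep_after' fixed='$_dep_fixed'"
```

Running it prints `before='network' after='' fixed='network'`: the bracketed form always adds the module, the unbracketed form drops it in this configuration, and the `||` form adds it because the second check succeeds.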

Diagnostic Steps

  1. Verify that version 2.0.15 or later of the kexec-tools package is installed (for example, with `rpm -q kexec-tools`).

  2. Confirm that `fence_kdump_nodes` is not set in `/etc/kdump.conf`.

  3. Confirm that the `network` dracut module is missing from the "dracut modules" list of the kdump initrd:

     # lsinitrd /boot/initramfs-$(uname -r)kdump.img
     Image: /boot/initramfs-3.10.0-1058.el7.x86_64kdump.img: 19M
     ========================================================================
     Early CPIO image
     ========================================================================
     drwxr-xr-x   3 root     root            0 Oct 11 13:27 .
     -rw-r--r--   1 root     root            2 Oct 11 13:27 early_cpio
     drwxr-xr-x   3 root     root            0 Oct 11 13:27 kernel
     drwxr-xr-x   3 root     root            0 Oct 11 13:27 kernel/x86
     drwxr-xr-x   2 root     root            0 Oct 11 13:27 kernel/x86/microcode
     -rw-r--r--   1 root     root       100352 Oct 11 13:27 kernel/x86/microcode/GenuineIntel.bin
     ========================================================================
     Version: dracut-033-564.el7
     
     Arguments: --hostonly --hostonly-cmdline --hostonly-i18n --hostonly-mode 'strict' -o 'plymouth dash resume ifcfg' --mount '/dev/mapper/r7vg-root_lv /sysroot xfs defaults' --no-hostonly-default-device -f
     
     dracut modules:
     bash
     nss-softokn
     i18n
     dm
     kernel-modules
     lvm
     qemu
     qemu-net
     fstab-sys
     rootfs-block
     terminfo
     udev-rules
     biosdevname
     systemd
     usrmount
     base
     fs-lib
     kdumpbase
     microcode_ctl-fw_dir_override
     shutdown
     ...
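
Steps 2 and 3 can also be scripted. The sketch below assumes GNU `sed` and `grep`; the function names `has_fence_kdump_nodes` and `initrd_has_dracut_module` are hypothetical, and the second one simply parses the "dracut modules:" section of `lsinitrd` output, stopping at the next `====` separator line.

```shell
#!/bin/bash
# Hypothetical helper (step 2): succeed if a kdump.conf-style file contains
# an uncommented fence_kdump_nodes line.
has_fence_kdump_nodes() {
    grep -q '^fence_kdump_nodes[[:space:]]' "$1"
}

# Hypothetical helper (step 3): read `lsinitrd` output on stdin and succeed
# only if the named dracut module is listed under "dracut modules:".
initrd_has_dracut_module() {
    # Keep only the lines from "dracut modules:" up to the next ==== line
    # (or end of input), then require an exact whole-line match.
    sed -n '/^dracut modules:/,/^====/p' | grep -qx -- "$1"
}

# Example on an affected node (both checks are expected to fail):
#   has_fence_kdump_nodes /etc/kdump.conf || echo "fence_kdump_nodes not set"
#   lsinitrd "/boot/initramfs-$(uname -r)kdump.img" |
#       initrd_has_dracut_module network || echo "network module missing"
```

Matching whole lines avoids false positives from similarly named modules such as `qemu-net` in the listing above.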
    
Components
Category

This solution is part of Red Hat’s fast-track publication program, providing a huge library of solutions that Red Hat engineers have created while supporting our customers. To give you the knowledge you need the instant it becomes available, these articles may be presented in a raw and unedited form.