Kdump fails to create a dump file via SSH or NFS on OVN-based clusters

Solution Verified - Updated

Environment

  • Red Hat OpenShift Container Platform (RHOCP) 4

Issue

  • Unable to capture a vmcore over NFS or SSH on Red Hat CoreOS. One of the following errors appears on the console:
Bad kdump network destination: xxx.xxx.xxx.xxx
or
Oops. The network still isn't ready after waiting 10mins.
get_host_ip exited with non-zero status!

Resolution

  • The ovs-configuration service file has been updated to start after kdump.
    Snippet of ovs-configuration.service file:
[Unit]
# Kdump will generate it's initramfs based on the running state when kdump.service run
# If OVS has already run, the kdump fails to gather a working network config,
# which prevent network log exports, sush as SSH.
# See https://issues.redhat.com/browse/OCPBUGS-28239
After=kdump.service
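
The presence of this ordering directive can be checked with a short script. The following is a minimal sketch and not part of the fix itself: has_kdump_ordering is a hypothetical helper name, and it only inspects the unit file passed to it, so drop-in overrides are not taken into account.

```shell
#!/bin/sh
# has_kdump_ordering UNIT_FILE
# Hypothetical helper (not shipped with any Red Hat tool).
# Exit status: 0 if an uncommented After= line lists kdump.service,
#              1 if no such line exists,
#              2 if the file cannot be read.
has_kdump_ordering() {
    unit_file=$1
    [ -r "$unit_file" ] || return 2
    # kdump.service may be the only unit on the line or one of several,
    # separated by whitespace. Lines starting with '#' do not match.
    grep -Eq '^[[:space:]]*After=(.*[[:space:]])?kdump\.service([[:space:]]|$)' "$unit_file"
}
```

For example: has_kdump_ordering /etc/systemd/system/ovs-configuration.service && echo "ordering present". On a running node, systemctl show -p After ovs-configuration.service reports the effective ordering, including any drop-ins.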

Follow the steps below to apply the solution:

Important note: For remote dump targets, the steps below ensure that the kdump initramfs is generated before ovs-configuration.service starts, which resolves the problem.

  • Step 1: Make sure that /etc/kdump.conf has been edited. If the file was already modified earlier, update its modification timestamp with the touch command:
# touch /etc/kdump.conf
  • Step 2: Make sure that After=kdump.service is present in /etc/systemd/system/ovs-configuration.service, as shown above.
  • Step 3: After confirming that After=kdump.service is present, reboot the system with the reboot command.
    (Note: Rebooting ensures that the kdump initramfs is rebuilt before ovs-configuration starts, which avoids the issue.)
  • Step 4: After the reboot, test kdump by manually crashing the system:
# echo c > /proc/sysrq-trigger

(Note: The above command crashes the system immediately. Downtime is required.)
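
Step 1 relies on the modification time of /etc/kdump.conf. The following minimal sketch illustrates that kind of check, assuming the kdump service rebuilds its initramfs when the configuration file is newer than the existing image, which is what Step 1 implies. needs_rebuild is a hypothetical helper and not the actual kdump logic.

```shell
#!/bin/sh
# needs_rebuild CONF IMG
# Hypothetical helper: succeeds (exit 0) when IMG does not exist or when CONF
# has a newer modification time than IMG, i.e. when a rebuild would be
# expected under the assumption stated above.
needs_rebuild() {
    conf=$1
    img=$2
    # No image yet: it has to be built in any case.
    [ -e "$img" ] || return 0
    # -nt compares modification times (supported by bash, dash, and ksh).
    [ "$conf" -nt "$img" ]
}
```

For example: needs_rebuild /etc/kdump.conf "/var/lib/kdump/initramfs-$(uname -r)kdump.img" && echo "kdump initramfs is older than kdump.conf".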

Root Cause

  • The network inside the kdump initramfs is not set up properly because of a behavior change in RHEL 9 kexec-tools, which now copies the system's NetworkManager connection files into the initramfs:
$ git show 63c3805c486adf700bafb5ad78cc9b0f55fcb345 
commit 63c3805c486adf700bafb5ad78cc9b0f55fcb345
Author: Coiby Xu <coxu@redhat.com>
Date:   Fri Sep 17 13:02:07 2021 +0800

    Set up kdump network by directly copying NM connection profile to initrd
  • Because Red Hat CoreOS nodes with an OVN setup have OVS connections, the copied connection files do not work inside the kdump initramfs: the NFS mount fails or the SSH connection cannot be established to dump the core. To resolve the issue, kdump is started before the OVS service. This change was made through a GitHub pull request.
  • A long-term fix has been created to support OVS setups inside the kdump initramfs. The fix is released in RHEL 9.5 via kexec-tools-2.0.27-16.el9_5.1.x86_64.

Diagnostic Steps

  • Make sure that the br-ex and OVS system connection files are not in the kdump initramfs:
# lsinitrd /var/lib/kdump/initramfs-$(uname -r)kdump.img | grep -i network
  • If the files below are present inside the kdump initramfs, kdump was started after the ovs-configuration service:
-rw-------. 1 root root 273 Jan 26 01:43 br-ex.nmconnection
-rw-------. 1 root root 717 Jan 26 01:43 ovs-if-br-ex.nmconnection
-rw-------. 1 root root 199 Jan 26 01:43 ovs-port-br-ex.nmconnection
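
The check above can be narrowed to the OVS-related profiles only. The following is a minimal sketch: find_ovs_profiles is a hypothetical helper, and the pattern assumes the profiles are named br-ex.nmconnection or ovs-*.nmconnection, as in the listing above.

```shell
#!/bin/sh
# find_ovs_profiles
# Hypothetical helper: reads an lsinitrd listing on stdin and prints only the
# lines that end in br-ex.nmconnection or ovs-<anything>.nmconnection.
# Exit status is 0 when at least one such line is found, 1 otherwise.
find_ovs_profiles() {
    # The file name may follow a space (bare name) or a '/' (full path).
    grep -E '(^|[/[:space:]])(br-ex|ovs-[^/[:space:]]*)\.nmconnection$'
}
```

For example: lsinitrd /var/lib/kdump/initramfs-$(uname -r)kdump.img | find_ovs_profiles. Any output indicates that the kdump initramfs was built after ovs-configuration ran.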
