Kdump fails to create dumpfile via SSH or NFS on OVN-based clusters
Environment
- Red Hat OpenShift Container Platform (RHOCP) 4
Issue
- Unable to capture a vmcore over NFS or SSH on Red Hat CoreOS. One of the following errors appears on the console:
Bad kdump network destination: xxx.xxx.xxx.xxx
or
Oops. The network still isn't ready after waiting 10mins.
get_host_ip exited with non-zero status!
Resolution
- The ovs-configuration service file has been updated to start after kdump.
Snippet of the ovs-configuration.service file:
[Unit]
# Kdump will generate it's initramfs based on the running state when kdump.service run
# If OVS has already run, the kdump fails to gather a working network config,
# which prevent network log exports, sush as SSH.
# See https://issues.redhat.com/browse/OCPBUGS-28239
After=kdump.service
- The fix has been released in the following errata:
RHOCP 4.14.22 : RHSA-2024:1891
RHOCP 4.15.9 : RHSA-2024:1770
RHOCP 4.16.0 : RHSA-2024:0041
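To compare a cluster version against the errata list above, a small helper along these lines may be useful. It is only a sketch: the function name has_kdump_ordering_fix is hypothetical, it relies on GNU sort -V for version ordering, it does not handle pre-release version strings, and it reports "unknown" for any stream other than 4.14, 4.15, and 4.16 instead of guessing.

```shell
# Hypothetical helper (not part of any Red Hat tooling): report whether an
# RHOCP version is at or above the fixed release for its minor stream.
# Minimum versions are taken from the errata list in this article.
has_kdump_ordering_fix() {
    version="$1"
    case "$version" in
        4.14.*) minimum="4.14.22" ;;
        4.15.*) minimum="4.15.9" ;;
        4.16.*) minimum="4.16.0" ;;
        *) echo "unknown"; return 0 ;;   # stream not covered by the errata list
    esac
    # sort -V orders version strings; if the minimum sorts first (or is equal),
    # the given version is at or above the fixed release.
    lowest="$(printf '%s\n%s\n' "$minimum" "$version" | sort -V | head -n1)"
    if [ "$lowest" = "$minimum" ]; then
        echo "fixed"
    else
        echo "not-fixed"
    fi
}
```

For example, the version string could be taken from the output of oc get clusterversion and passed in as has_kdump_ordering_fix 4.15.9.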
Follow the steps below to apply the solution:
Important Note: For remote dump targets, the steps below ensure that the kdump initramfs is generated before ovs-configuration.service starts, which resolves the problem.
- Step 1: Make sure that the /etc/kdump.conf file has been edited. If the file was already modified earlier, update its modification timestamp using the touch command, as shown below:
# touch /etc/kdump.conf
- Step 2: Make sure that After=kdump.service is present in the file /etc/systemd/system/ovs-configuration.service, as mentioned above.
- Step 3: After confirming that After=kdump.service is present in /etc/systemd/system/ovs-configuration.service, reboot the system using the # reboot command.
(Note: Rebooting ensures that the kdump initramfs is rebuilt before ovs-configuration starts, which avoids the issue.)
- Step 4: After the reboot, test kdump by manually crashing the system:
# echo c > /proc/sysrq-trigger
(Note: Above command will crash the system immediately. Downtime is required.)
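Steps 1 and 2 can be combined into a pre-reboot check. The sketch below is illustrative only: the function names unit_starts_after_kdump and prepare_kdump_rebuild are hypothetical, and the paths are passed as arguments so nothing is hard-coded. It deliberately does not reboot or crash the node.

```shell
# Hypothetical helper: succeed if the given unit file contains an After= line
# that lists kdump.service (Step 2).
unit_starts_after_kdump() {
    unit_file="$1"
    grep -Eq '^After=(.*[[:space:]])?kdump\.service([[:space:]]|$)' "$unit_file"
}

# Hypothetical helper: verify the ordering, then refresh the modification
# time of the kdump configuration file (Step 1).
prepare_kdump_rebuild() {
    kdump_conf="$1"
    unit_file="$2"
    if ! unit_starts_after_kdump "$unit_file"; then
        echo "missing After=kdump.service in $unit_file" >&2
        return 1
    fi
    touch "$kdump_conf"
    echo "ready to reboot"
}
```

On a node, this might be called as prepare_kdump_rebuild /etc/kdump.conf /etc/systemd/system/ovs-configuration.service, followed by the reboot in Step 3.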
Root Cause
- The network inside the kdump initramfs is not set up properly because of a behavior change in RHEL 9 kexec-tools, which now copies the system NetworkManager connection files into the initramfs:
$ git show 63c3805c486adf700bafb5ad78cc9b0f55fcb345
commit 63c3805c486adf700bafb5ad78cc9b0f55fcb345
Author: Coiby Xu <coxu@redhat.com>
Date: Fri Sep 17 13:02:07 2021 +0800
Set up kdump network by directly copying NM connection profile to initrd
- Red Hat CoreOS nodes with an OVN setup have OVS connections, which do not work inside the kdump initramfs. As a result, the NFS mount fails or the SSH connection cannot be established to dump the core. To resolve the issue, kdump is started before the OVS service (the change was made through a GitHub PR).
- A long-term fix has been created to support the OVS setup inside the kdump initramfs. The current fix is released in RHEL 9.5 via kexec-tools-2.0.27-16.el9_5.1.x86_64.
Diagnostic Steps
- Make sure that the br-ex and OVS system connection files are not in the kdump initramfs:
# lsinitrd /var/lib/kdump/initramfs-$(uname -r)kdump.img | grep -i network
- If the files below are present inside the kdump initramfs, kdump was started after the ovs-configuration service:
-rw-------. 1 root root 273 Jan 26 01:43 br-ex.nmconnection
-rw-------. 1 root root 717 Jan 26 01:43 ovs-if-br-ex.nmconnection
-rw-------. 1 root root 199 Jan 26 01:43 ovs-port-br-ex.nmconnection
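The check above can also be scripted by filtering the lsinitrd output for the OVS-related connection profiles. This is a rough sketch: the function names find_ovs_profiles and kdump_initramfs_is_clean are hypothetical, and the pattern only covers br-ex and ovs-* profile names like those listed above.

```shell
# Hypothetical helper: read lsinitrd output on stdin and print the names of
# any br-ex or ovs-* NetworkManager connection profiles it lists.
find_ovs_profiles() {
    grep -Eo '(br-ex|ovs-[^[:space:]/]*)\.nmconnection' | sort -u
}

# Hypothetical helper: succeed only if no such profiles appear on stdin.
kdump_initramfs_is_clean() {
    [ -z "$(find_ovs_profiles)" ]
}
```

For example, the lsinitrd command shown earlier could be piped into find_ovs_profiles. Any output suggests that the kdump initramfs was built after ovs-configuration ran.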