Common KDUMP Troubleshooting


Introduction - what is kdump?

kdump is a kernel crash dumping mechanism that allows you to save the contents of the system’s memory for later analysis. When the system crashes, kdump boots into a second kernel (the capture kernel). This second kernel resides in a reserved part of the system memory that is inaccessible to the first kernel, and it captures and saves the contents of the crashed kernel’s memory (a crash dump, also known as a vmcore).

It is highly recommended that you familiarize yourself with the kdump documentation for your RHEL version before configuring kdump and testing the creation of a vmcore. For your convenience, refer to the links below:

RHEL 6 - The kdump Crash Recovery Service

RHEL 7 - Kernel Crash Dump Guide

RHEL 8 - Dumping a Crashed Kernel for later Analysis

RHEL 9 - Installing kdump

RHEL 10 - Installing kdump

DISCLAIMER: kdump can fail to generate a vmcore for several reasons. This article lists some common issues and advises on how to troubleshoot them. Your problem, however, may be more complicated than described here.
If you cannot resolve your particular problem, set up Serial Console output for the kdump kernel as described in the Advanced Troubleshooting section below, then submit a new Technical Support Case and upload an SOS Report.

Common problem areas

Preliminary note: While the examples below should help mitigate common issues, the best approach is to start from the most basic (default) kdump configuration for the relevant system and verify whether a vmcore is generated when you trigger a manual panic.
If a vmcore is generated with the basic configuration, add the more complex parameters one at a time until a vmcore is no longer generated; the parameter added last is the likely cause.
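A manual panic is commonly triggered as root through the SysRq crash command, by writing c to /proc/sysrq-trigger. The sketch below wraps that in a hypothetical shell function, trigger_test_panic, which refuses to run unless CONFIRM_PANIC=yes is set; the function name and the guard variable are illustrative assumptions, not part of kdump. Run it only on a test system or during a maintenance window, because the system crashes immediately.

```shell
#!/bin/sh
# Hypothetical helper: trigger_test_panic [TRIGGER_FILE]
# Crashes the running kernel on purpose through the SysRq "c" (crash)
# command so that kdump can be tested. It refuses to do anything unless
# CONFIRM_PANIC=yes is set, because the system goes down immediately.
# TRIGGER_FILE defaults to /proc/sysrq-trigger; it is a parameter only so
# the guard logic can be exercised against an ordinary file.
trigger_test_panic() {
    local trigger
    trigger=${1:-/proc/sysrq-trigger}
    if [ "${CONFIRM_PANIC:-no}" != "yes" ]; then
        echo "refusing to crash: set CONFIRM_PANIC=yes to proceed" >&2
        return 1
    fi
    sync                  # flush dirty file system buffers before the crash
    echo c > "$trigger"   # 'c' = crash; kdump should take over from here
}

# Usage, as root, on a test system or during a maintenance window:
#   CONFIRM_PANIC=yes trigger_test_panic
```

After the reboot, a new vmcore should appear in a new subdirectory of the configured dump path (/var/crash by default). If it does not, work through the sections below.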

The kdump service failed to load at boot or at service restart

  • Meaning: The kdump service was not running at the moment of the system crash, so the second (capture) kernel that generates the vmcore was not loaded and could not be started.

  • To verify whether the kdump service is currently running and, if it is not, what the reason might be, run the following command:

        RHEL 7 and above:
      # systemctl status kdump -l
    
        RHEL 6:
      # service kdump status
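The service status can be complemented by checking what the kernel itself reports. The sketch below assumes the running kernel exposes the /sys/kernel/kexec_crash_size attribute (bytes reserved by crashkernel=) and the /sys/kernel/kexec_crash_loaded attribute (1 when a capture kernel is loaded); kdump_capture_kernel_ready is a hypothetical helper name, and the sysfs directory is a parameter only so that the logic can be tested against ordinary files. Telling the two failure modes apart points to the right section: no reservation suggests a crashkernel problem, while a reservation without a loaded capture kernel suggests the kdump service failed.

```shell
#!/bin/sh
# Hypothetical helper: kdump_capture_kernel_ready [SYSFS_DIR]
# Assumes the running kernel exposes these attributes (default /sys/kernel):
#   kexec_crash_size   - bytes reserved by the crashkernel= parameter
#   kexec_crash_loaded - 1 when a capture kernel has been loaded
# Returns 1 if nothing is reserved, 2 if memory is reserved but no capture
# kernel is loaded, and 0 when kdump is armed.
kdump_capture_kernel_ready() {
    local dir size loaded
    dir=${1:-/sys/kernel}
    size=$(cat "$dir/kexec_crash_size" 2>/dev/null || echo 0)
    loaded=$(cat "$dir/kexec_crash_loaded" 2>/dev/null || echo 0)
    if [ "$size" -eq 0 ]; then
        echo "no memory reserved: check the crashkernel= parameter" >&2
        return 1
    fi
    if [ "$loaded" -ne 1 ]; then
        echo "memory reserved, capture kernel not loaded: check the kdump service" >&2
        return 2
    fi
    echo "capture kernel loaded (${size} bytes reserved)"
}

# Usage:
#   kdump_capture_kernel_ready
```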
    

There is insufficient available space in the dump target

  • While vmcores are usually compressed when generated and include only a portion of the memory (see man makedumpfile), the most reliable way to ensure sufficient space on the dump target device is to have at least as much free space as the system has total RAM.
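The sizing rule above can be checked with a short sketch. check_dump_space is a hypothetical helper; it assumes GNU coreutils df and a dump directory that already exists. MemTotal in /proc/meminfo is slightly lower than the installed RAM, so treat a result close to the threshold as a warning rather than a pass.

```shell
#!/bin/sh
# Hypothetical helper: check_dump_space DUMP_DIR [MEMINFO_FILE]
# Compares the free space on the file system holding DUMP_DIR (which must
# already exist) with MemTotal from /proc/meminfo, applying the
# conservative "free space >= total RAM" rule. Assumes GNU df.
check_dump_space() {
    local dir meminfo mem_kib avail_kib
    dir=$1
    meminfo=${2:-/proc/meminfo}
    mem_kib=$(awk '/^MemTotal:/ {print $2}' "$meminfo")
    # -P keeps each file system on one line; -k reports 1 KiB blocks, and
    # column 4 of the data line is the available space.
    avail_kib=$(df -Pk "$dir" | awk 'NR == 2 {print $4}')
    if [ "$avail_kib" -ge "$mem_kib" ]; then
        echo "OK: ${avail_kib} KiB free >= ${mem_kib} KiB RAM"
    else
        echo "WARNING: ${avail_kib} KiB free < ${mem_kib} KiB RAM" >&2
        return 1
    fi
}

# Usage (default dump path):
#   check_dump_space /var/crash
```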

Improper dump target configuration in the /etc/kdump.conf file and the system

  • The intended dump target, set by the "path" parameter, must be an existing directory with write permissions.
  • It is recommended to use persistent device names (for example, UUID= or LABEL=), in the same manner as in the "/etc/fstab" configuration file.
  • When a dump target device is specified, the "path" parameter is relative to that device's mount point, so the resulting absolute dump path is <target mount point>/<path>.
    • For example: If the system has a dedicated disk partition mounted on /var that is configured as the dump target, and "path" is left at the default /var/crash, the resulting absolute dump path is /var/var/crash.
  • It is recommended to explicitly specify both the dump target and the path relative to its mount point, to avoid this kind of ambiguity.
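The path arithmetic described above can be sketched as two hypothetical shell helpers: kdump_conf_path reads the "path" value from kdump.conf (falling back to /var/crash when it is unset), and resolve_dump_path joins that value to the mount point of an explicitly configured target. They only illustrate the rule; kdump resolves the path itself, and when no dump target is configured, man kdump.conf describes "path" as an absolute path from the root instead. The UUID in the comment is a placeholder.

```shell
#!/bin/sh
# Hypothetical helper: kdump_conf_path [CONF]
# Prints the value of the "path" directive, or /var/crash when it is unset
# (the documented default). Commented lines such as "#path ..." are ignored
# because their first field is "#path", not "path".
kdump_conf_path() {
    awk '$1 == "path" { p = $2 }
         END { print (p == "" ? "/var/crash" : p) }' "${1:-/etc/kdump.conf}"
}

# Hypothetical helper: resolve_dump_path MOUNT_POINT PATH
# Joins the mount point of an explicitly configured dump target with the
# "path" value, which is relative to that mount point.
resolve_dump_path() {
    local mnt rel
    mnt=${1%/}    # "/" becomes "", "/var/" becomes "/var"
    rel=${2#/}    # drop the leading slash of the relative path
    printf '%s/%s\n' "$mnt" "$rel"
}

# Example: with an explicit target in /etc/kdump.conf such as
#   xfs UUID=<UUID of the /var file system>
#   path /crash
# the dump lands in "$(resolve_dump_path /var /crash)", i.e. /var/crash,
# whereas "path /var/crash" would give /var/var/crash.
```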

The memory reservation in the crashkernel parameter is insufficient

  • By default, the kernel command line already includes the crashkernel parameter, which at boot determines how much memory is reserved for the kdump kernel based on the total amount of RAM in the system (the exact values vary slightly between major RHEL versions and architectures).
  • In RHEL 8 and earlier, the default crashkernel value is auto, while in RHEL 9 and later this option has been deprecated in favour of a memory range, as in this x86_64 example: crashkernel=1G-4G:192M,4G-64G:256M,64G-:512M
  • The kdump kernel requires additional memory if the kernel is tainted (for example, by third-party modules) or when numerous storage devices are attached to the system (such as multipath or Fibre Channel devices). If the kdump kernel runs out of memory (meaning that the crashkernel reservation is insufficient), you have two options:
    • Manually increase the crashkernel reservation. To achieve that, follow the relevant documentation:
    • Identify what allocates extra memory in the kdump kernel (for example, unnecessary storage devices or third-party content) and blacklist it from loading in the kdump kernel, as described in the documentation:
      - RHEL 7 | RHEL 8 | RHEL 9 | RHEL 10
      - The RHEL 6 documentation unfortunately does not cover this; for more information, check man kdump.conf and search for the "blacklist" keyword.
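To see which reservation a range-style crashkernel value selects for a given amount of RAM, the hypothetical helpers below evaluate the start-[end]:size syntax, treating start as inclusive and end as exclusive, as the kernel's kdump documentation describes. This is a sketch: it reports crashkernel=auto instead of sizing it, does not handle the ,high and ,low suffixes, and its RAM input is an approximation because MemTotal is lower than the installed memory.

```shell
#!/bin/sh
# Hypothetical helper: to_mib SIZE -> integer MiB for sizes such as 512M or 4G.
# A bare number is treated as bytes, as the kernel does for crashkernel=.
to_mib() {
    local v num unit
    v=$1
    num=${v%[KkMmGg]}
    unit=${v#"$num"}
    case $unit in
        [Kk]) echo $((num / 1024)) ;;
        [Mm]) echo "$num" ;;
        [Gg]) echo $((num * 1024)) ;;
        *)    echo $((num / 1048576)) ;;
    esac
}

# Hypothetical helper: crashkernel_for_ram SPEC RAM_MIB
# Prints the reservation (MiB) that a crashkernel= value selects for a
# system with RAM_MIB of memory. Ranges are start-[end]:size; start is
# inclusive, end is exclusive, and an empty end means "and above".
crashkernel_for_ram() {
    local spec ram entry range size start end result old_ifs
    spec=${1%%@*}                     # drop an optional @offset
    ram=$2
    case $spec in
        auto)
            echo "crashkernel=auto is sized by the kernel; check /sys/kernel/kexec_crash_size" >&2
            return 1 ;;
        *:*) ;;                       # range syntax, handled below
        *) to_mib "$spec"; return 0 ;;  # fixed size, e.g. 256M
    esac
    result=0                          # no matching range: nothing reserved
    old_ifs=$IFS
    IFS=,
    for entry in $spec; do
        range=${entry%%:*}
        size=${entry#*:}
        start=$(to_mib "${range%%-*}")
        end=${range#*-}
        [ "$ram" -ge "$start" ] || continue
        if [ -n "$end" ] && [ "$ram" -ge "$(to_mib "$end")" ]; then
            continue
        fi
        result=$(to_mib "$size")
        break
    done
    IFS=$old_ifs
    echo "$result"
}

# Usage:
#   ram=$(awk '/^MemTotal:/ {print int($2 / 1024)}' /proc/meminfo)
#   spec=$(grep -o 'crashkernel=[^ ]*' /proc/cmdline | cut -d= -f2-)
#   crashkernel_for_ram "$spec" "$ram"
```

For example, with the x86_64 range above, a system with 32 GiB (32768 MiB) of RAM falls into the 4G-64G range and gets 256M. When deciding whether to increase the reservation, compare the result with /sys/kernel/kexec_crash_size, keeping in mind that the sysfs value is in bytes.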

The dump target is based on unsupported storage

  • Verify that your dump target resides on a storage type that kdump supports. You can refer to the following links for each major RHEL version's list of supported dump targets:
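To identify the storage type to compare against the supported-targets list, the hypothetical helper below prints the device and file system type backing the dump directory and, when that device is a block device, the device stack beneath it (for example, LVM on top of multipath). It assumes GNU df with -T and util-linux lsblk with the -s (--inverse) option.

```shell
#!/bin/sh
# Hypothetical helper: describe_dump_storage [DIR]
# Prints the source device and file system type backing DIR (default
# /var/crash), plus the device stack beneath it when DIR lives on a block
# device. Assumes GNU df (-T) and util-linux lsblk (-s, --inverse).
describe_dump_storage() {
    local dir dev fstype
    dir=${1:-/var/crash}
    if [ ! -d "$dir" ]; then
        echo "no such directory: $dir" >&2
        return 1
    fi
    dev=$(df -PT "$dir" | awk 'NR == 2 {print $1}')
    fstype=$(df -PT "$dir" | awk 'NR == 2 {print $2}')
    echo "device=$dev fstype=$fstype"
    # For block devices, show what sits underneath (LVM, multipath, disks).
    if [ -b "$dev" ] && command -v lsblk >/dev/null 2>&1; then
        lsblk -s -o NAME,TYPE "$dev" || true
    fi
    return 0
}

# Usage:
#   describe_dump_storage /var/crash
```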

Advanced Troubleshooting

If the more common troubleshooting approaches yield no results, Red Hat support engineers usually request that a Serial Terminal be configured on the system and that the Serial Console logs be provided after reproducing the failed vmcore generation.

  • To configure the Serial Terminal, first review the Knowledge Base Article, then configure the relevant parameters for your major RHEL version.
    - Note: Additional Serial Terminal / Console configuration is also needed in the BIOS or other hardware components of the system. This varies between hardware manufacturers, so consult the relevant vendor if you are unsure.

  • Afterwards, add the same Serial Terminal parameters to the KDUMP_COMMANDLINE_APPEND parameter in the /etc/sysconfig/kdump file, as shown in the example below:

      # grep -E "^KDUMP_COMMANDLINE_APPEND" /etc/sysconfig/kdump
      KDUMP_COMMANDLINE_APPEND="irqpoll nr_cpus=1 reset_devices cgroup_disable=memory mce=off numa=off udev.children-max=2 panic=10 rootflags=nofail acpi_no_memhotplug transparent_hugepage=never nokaslr novmcoredd hest_disable console=tty0 console=ttyS0,115200"

  • Lastly, the kdump initramfs file must be rebuilt for the Serial Terminal configuration to take effect. The fastest way to do that is to update the timestamp of "/etc/kdump.conf" and then restart the kdump service:

      # touch /etc/kdump.conf

        RHEL 7 and above:
      # systemctl restart kdump.service

        RHEL 6:
      # service kdump restart

  • Verify that the initramfs file has a fresh timestamp:

        RHEL 7 and above:
      # ls -l /boot/initramfs-`uname -r`kdump.img

        RHEL 6:
      # ls -l /boot/initrd-`uname -r`kdump.img
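The editing and verification steps above can be scripted. The sketch below defines two hypothetical shell helpers: add_kdump_cmdline_args appends arguments to the KDUMP_COMMANDLINE_APPEND line only if they are not already present, and kdump_initramfs_is_fresh succeeds only when the kdump initramfs is newer than the given configuration files. It assumes GNU sed (for in-place editing), the stock single-line, double-quoted KDUMP_COMMANDLINE_APPEND format, and arguments without "|", "&", backslashes or double quotes. Back up /etc/sysconfig/kdump before editing it in place.

```shell
#!/bin/sh
# Hypothetical helper: add_kdump_cmdline_args FILE ARG...
# Appends each ARG to the KDUMP_COMMANDLINE_APPEND="..." line of FILE unless
# that exact argument is already there, so running it twice is harmless.
# Assumes the stock single-line, double-quoted format and arguments that
# contain no "|", "&", backslash or double quote (they would confuse sed).
# Uses GNU sed -i to edit FILE in place.
add_kdump_cmdline_args() {
    local file arg current
    file=$1
    shift
    for arg in "$@"; do
        current=$(sed -n 's/^KDUMP_COMMANDLINE_APPEND="\(.*\)"$/\1/p' "$file")
        case " $current " in
            *" $arg "*) continue ;;    # already present: skip it
        esac
        sed -i "s|^\(KDUMP_COMMANDLINE_APPEND=\"[^\"]*\)\"|\1 $arg\"|" "$file" || return 1
    done
    return 0
}

# Hypothetical helper: kdump_initramfs_is_fresh IMAGE CONFIG...
# Succeeds only if IMAGE exists and is newer than every CONFIG file.
kdump_initramfs_is_fresh() {
    local image cfg
    image=$1
    shift
    [ -e "$image" ] || return 1
    for cfg in "$@"; do
        [ "$image" -nt "$cfg" ] || return 1
    done
    return 0
}

# Usage on RHEL 7 and above, as root:
#   cp /etc/sysconfig/kdump /etc/sysconfig/kdump.bak
#   add_kdump_cmdline_args /etc/sysconfig/kdump console=tty0 console=ttyS0,115200
#   touch /etc/kdump.conf && systemctl restart kdump.service
#   kdump_initramfs_is_fresh "/boot/initramfs-$(uname -r)kdump.img" /etc/kdump.conf
```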