How to configure kdump to capture a vmcore for Microsoft Azure virtual machines

Solution Verified - Updated

Environment

  • Microsoft Azure
  • General kdump configuration
  • Red Hat Enterprise Linux 8 and later
  • Or any RHEL version later than RHEL 6 on Azure
  • Red Hat Enterprise Linux 7 (On-Demand)
    • Size - Standard_DS2_v2 VM
    • kexec-tools version 2.0.7-50.el7
  • Red Hat Enterprise Linux 6 (On-Demand)
    • Size - Standard_DS2_v2 VM
    • kexec-tools version 2.0.7-50.el7

Issue

  • Customers need to complete a root cause analysis (RCA) for a kernel panic.
  • Customers need to know how to configure kdump and capture the vmcore.

Resolution

Install and configure kdump

  1. Verify that the kexec-tools package is installed.

    # rpm -q kexec-tools
    

    If it is not installed, switch to the root user and install the package.

       $ sudo su -
    
       # yum install kexec-tools
    
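  2. Verify that the kdump configuration file sets the default crash location (path /var/crash) and that /var/crash exists and has enough free space to hold the vmcore. The grep pattern below hides comment lines and blank lines.

        # grep -Ev '^[[:space:]]*(#|$)' /etc/kdump.conf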
    
        path /var/crash
        core_collector makedumpfile -l --message-level 1 -d 31
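The two checks in step 2 can also be scripted. This is a minimal sketch, not a Red Hat tool: `kdump_conf_get` and `dump_dir_free_kb` are hypothetical helper names, and the sketch assumes each directive appears at most once in /etc/kdump.conf (it returns the first match).

```shell
# Hypothetical helpers (not part of kexec-tools) for checking step 2.

# Print the value of DIRECTIVE from a kdump.conf-style file, skipping
# comment and blank lines. Prints nothing if the directive is absent.
# Usage: kdump_conf_get DIRECTIVE [FILE]
kdump_conf_get() {
    awk -v k="$1" '
        NF == 0 || $1 ~ /^#/ { next }           # skip blank and comment lines
        $1 == k {
            sub(/^[ \t]*[^ \t]+[ \t]+/, "")     # drop the directive name
            print
            exit                                # first occurrence only
        }
    ' "${2:-/etc/kdump.conf}"
}

# Print the free space, in KiB, of the file system that holds DIR.
# Usage: dump_dir_free_kb [DIR]
dump_dir_free_kb() {
    df -Pk "${1:-/var/crash}" | awk 'NR == 2 { print $4 }'
}
```

Example usage as root. If `path` is not set, kdump uses /var/crash, which the fallback below mirrors:

        # path=$(kdump_conf_get path)
        # dump_dir_free_kb "${path:-/var/crash}"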
    


Add boot parameters

General kdump configuration: Red Hat Enterprise Linux 8 and later (or any RHEL version later than RHEL 6)

  • Set crashkernel to auto if it is not already configured. A reboot is required for kernel command-line changes to take effect.

        # grubby --args="crashkernel=auto" --update-kernel ALL
        # systemctl enable kdump
        # reboot

  • After the reboot, verify that kdump is running:

        # systemctl status kdump
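After the reboot you can also confirm that the running kernel received the parameter and that a crash kernel is loaded. A minimal sketch follows; the function names are hypothetical, and it assumes a kernel built with kexec support, which exposes /proc/cmdline plus the sysfs files /sys/kernel/kexec_crash_size and /sys/kernel/kexec_crash_loaded.

```shell
# Hypothetical helpers for checking the crash kernel after a reboot.

# Print the last crashkernel= value on a kernel command line
# (default: the running kernel's /proc/cmdline). Prints nothing if absent.
# Usage: crashkernel_from_cmdline [CMDLINE]
crashkernel_from_cmdline() {
    value=
    for arg in ${1:-$(cat /proc/cmdline)}; do
        case $arg in
            crashkernel=*) value=${arg#crashkernel=} ;;
        esac
    done
    if [ -n "$value" ]; then
        echo "$value"
    fi
}

# Print "reserved_bytes=N loaded=0|1". A non-zero size means crashkernel
# memory was reserved at boot; loaded=1 means a crash kernel is loaded.
crash_kernel_status() {
    size=$(cat /sys/kernel/kexec_crash_size 2>/dev/null) || size=0
    loaded=$(cat /sys/kernel/kexec_crash_loaded 2>/dev/null) || loaded=0
    echo "reserved_bytes=$size loaded=$loaded"
}
```

On a VM configured as above, `crashkernel_from_cmdline` should print `auto`, and once the kdump service has started, `crash_kernel_status` should report a non-zero `reserved_bytes` and `loaded=1`.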

RHEL 7

  • A crashkernel size of 128 MB is not sufficient for kdump to capture the vmcore on Red Hat Enterprise Linux (RHEL) 7 On-Demand virtual machines (VMs) with 7 GB or more of memory.

  • The crashkernel size must be set to 256 MB (crashkernel=256M) for kdump to work on RHEL 7 VMs with 7 GB or more of memory.

  • The following steps describe how to add crashkernel=256M to the boot parameter line in the grub configuration file and how to configure the kdump service.

  1. Add crashkernel=256M to the boot parameter line in the grub configuration file.

        # vi /etc/default/grub
    
        GRUB_CMDLINE_LINUX="console=tty1 console=ttyS0 earlyprintk=ttyS0 rootdelay=300 crashkernel=256M"
    
  2. Regenerate the grub configuration file.

    # grub2-mkconfig -o /boot/grub2/grub.cfg
    
  3. Enable the kdump service to start at system boot.

    # systemctl enable kdump
    
  4. Reboot the VM so that the kernel reserves the crash kernel memory at boot.

    # reboot
    

    The following example shows a partial /boot/grub2/grub.cfg file.

    ### BEGIN /etc/grub.d/10_linux ###
    menuentry 'Red Hat Enterprise Linux Server (3.10.0-514.21.2.el7.x86_64) 7.3 (Maipo)' --class red --class gnu-linux --class gnu --class os --unrestricted $menuentry_id_option 'gnulinux-3.10.0-514.21.1.el7.x86_64-advanced-70c0618f-46aa-412a-b7ef-8c0576628d9c' {
    load_video
    set gfxpayload=keep
    insmod gzio
    insmod part_msdos
    insmod xfs
    set root='hd0,msdos1'
    if [ x$feature_platform_search_hint = xy ]; then
      search --no-floppy --fs-uuid --set=root --hint-bios=hd0,msdos1 --hint-efi=hd0,msdos1 --hint-baremetal=ahci0,msdos1  b431e990-ecc7-41cd-9014-f96853c33871
    else
      search --no-floppy --fs-uuid --set=root b431e990-ecc7-41cd-9014-f96853c33871
    fi
    linux16 /vmlinuz-3.10.0-514.21.2.el7.x86_64 root=UUID=70c0618f-46aa-412a-b7ef-8c0576628d9c ro console=tty1 console=ttyS0 earlyprintk=ttyS0 rootdelay=300 crashkernel=256M LANG=en_US.UTF-8
    initrd16 /initramfs-3.10.0-514.21.2.el7.x86_64.img
    }
    
  5. Verify that kdump is active and running.

    # systemctl status kdump
    kdump.service - Crash recovery kernel arming
    Loaded: loaded (/usr/lib/systemd/system/kdump.service; enabled; vendor preset: enabled)
    Active: active (exited) since Thu 2017-07-06 05:57:27 UTC; 24min ago
    Process: 848 ExecStart=/usr/bin/kdumpctl start (code=exited, status=0/SUCCESS)
    Main PID: 848 (code=exited, status=0/SUCCESS)
    CGroup: /system.slice/kdump.service
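Steps 1 and 2 can also be done without an interactive editor. This is a sketch, not a Red Hat-provided script: `set_grub_crashkernel` and `grub_cfg_path` are hypothetical names. It assumes GRUB_CMDLINE_LINUX sits on one line and is quoted with double quotes. It also assumes the standard RHEL 7 grub.cfg locations: /boot/grub2/grub.cfg for BIOS boot and /boot/efi/EFI/redhat/grub.cfg for UEFI boot, such as Azure Generation 2 VMs.

```shell
# Hypothetical helper: set crashkernel=SIZE inside GRUB_CMDLINE_LINUX of a
# /etc/default/grub-style file, replacing any existing crashkernel= value.
# A copy of the original is kept as FILE.bak.
# Usage: set_grub_crashkernel SIZE [FILE]
set_grub_crashkernel() {
    size=$1
    file=${2:-/etc/default/grub}
    cp -p "$file" "$file.bak" || return 1
    if grep -q '^GRUB_CMDLINE_LINUX=.*crashkernel=' "$file"; then
        # Replace the existing value, up to the next space or double quote.
        sed -i "/^GRUB_CMDLINE_LINUX=/s/crashkernel=[^ \"]*/crashkernel=$size/" "$file"
    else
        # Append the parameter just before the closing double quote.
        sed -i "/^GRUB_CMDLINE_LINUX=/s/\"\$/ crashkernel=$size\"/" "$file"
    fi
}

# Print the grub.cfg path to regenerate: UEFI-booted systems have
# /sys/firmware/efi and keep grub.cfg on the EFI system partition.
grub_cfg_path() {
    if [ -d /sys/firmware/efi ]; then
        echo /boot/efi/EFI/redhat/grub.cfg
    else
        echo /boot/grub2/grub.cfg
    fi
}
```

Example usage as root, followed by the reboot in step 4:

        # set_grub_crashkernel 256M
        # grub2-mkconfig -o "$(grub_cfg_path)"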
    

RHEL 6

  • The minimum crashkernel size required for RHEL 6 VMs is 128 MB.

  • The crashkernel size must be set to 128 MB (or 256 MB) for kdump to work on RHEL 6 VMs with 7 GB or more of memory.

  • The following steps describe how to add crashkernel=128M to the boot parameter line in the grub configuration file and how to configure the kdump service.

  1. Add crashkernel=128M to the boot parameter line in the grub configuration file.

        # vi /boot/grub/grub.conf
    
        kernel /vmlinuz-2.6.32-696.1.1.el6.x86_64 ro root=UUID=38dc5d60-b9ea-41a0-a13a-5a6f9f90d2bd rd_NO_LUKS  KEYBOARDTYPE=pc KEYTABLE=us LANG=en_US.UTF-8 rd_NO_MD console=ttyS0 earlyprintk=ttyS0 rootdelay=300 numa=off SYSFONT=latarcyrheb-sun16 rd_NO_LVM rd_NO_DM crashkernel=128M
    

The following example shows a partial /boot/grub/grub.conf file.

```
# grub.conf generated by anaconda
#
# Note that you do not have to rerun grub after making changes to this file
# NOTICE:  You have a /boot partition.  This means that
#          all kernel and initrd paths are relative to /boot/, eg.
#          root (hd0,0)
#          kernel /vmlinuz-version ro root=/dev/vda2
#          initrd /initrd-[generic-]version.img
#boot=/dev/vda
default=0
timeout=5
splashimage=(hd0,0)/grub/splash.xpm.gz
hiddenmenu
title Red Hat Enterprise Linux Server (2.6.32-696.1.1.el6.x86_64)
    root (hd0,0)
    kernel /vmlinuz-2.6.32-696.1.1.el6.x86_64 ro root=UUID=38dc5d60-b9ea-41a0-a13a-5a6f9f90d2bd rd_NO_LUKS  KEYBOARDTYPE=pc KEYTABLE=us LANG=en_US.UTF-8 rd_NO_MD console=ttyS0 earlyprintk=ttyS0 rootdelay=300 numa=off SYSFONT=latarcyrheb-sun16 rd_NO_LVM rd_NO_DM crashkernel=128M
    initrd /initramfs-2.6.32-696.1.1.el6.x86_64.img
```
  2. Enable the kdump service to start at system boot.

    # chkconfig kdump on
    # chkconfig --list kdump
    kdump              0:off    1:off    2:on    3:on    4:on    5:on    6:off
    
  3. Reboot the VM so that the kernel reserves the crash kernel memory at boot.

    # reboot
    
  4. Verify that kdump is active and running.

    # service kdump status
    Kdump is operational
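If more than one kernel is installed, each title entry in grub.conf has its own kernel line, and only the entry that boots passes its parameters to the kernel. A minimal sketch with a hypothetical function name lists the entries that still lack crashkernel=; it assumes each entry uses a single kernel line, as in the example grub.conf shown earlier.

```shell
# Hypothetical helper: list kernel images in a legacy GRUB config
# (RHEL 6: /boot/grub/grub.conf) whose "kernel" line lacks crashkernel=.
# No output means every boot entry already carries the parameter.
# Usage: grub_conf_missing_crashkernel [FILE]
grub_conf_missing_crashkernel() {
    # $1 must be exactly "kernel", so commented-out lines ("# kernel ...")
    # are skipped; $2 is the kernel image path.
    awk '$1 == "kernel" && $0 !~ /crashkernel=/ { print $2 }' \
        "${1:-/boot/grub/grub.conf}"
}
```

Run it as root before rebooting; any path it prints belongs to an entry that would boot without crash kernel memory.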
    

Testing kdump

  1. Trigger a crash through sysrq and verify that a vmcore is captured. This induces a kernel panic, so the SSH session hangs or disconnects. You do not need to reboot the VM manually, but you must open a new SSH session to it.

    WARNING: This action crashes the VM. If this is a production system, perform the test at a planned time.

    # echo c > /proc/sysrq-trigger
    
  2. Start a new SSH session and verify that a vmcore file is captured.

        # tree /var/crash/
        /var/crash/
        └── 127.0.0.1-2017-06-14-08:13:33
                ├── vmcore
                └── vmcore-dmesg.txt
    
        1 directory, 2 files
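To check the result from a script instead of reading the tree output, the following sketch uses a hypothetical function name. It prints the path of the newest vmcore under the dump directory and fails if that directory holds no non-empty vmcore. It assumes the layout shown above: one subdirectory per dump, each containing a file named vmcore.

```shell
# Hypothetical helper: print the newest vmcore under DIR (default
# /var/crash). Returns 1 if there is no dump subdirectory or if the newest
# one has no non-empty vmcore file.
# Usage: latest_vmcore [DIR]
latest_vmcore() {
    dir=${1:-/var/crash}
    # -t sorts by modification time (newest first); -d lists the
    # subdirectories themselves rather than their contents.
    newest=$(ls -1td "$dir"/*/ 2>/dev/null | head -n 1)
    [ -n "$newest" ] || return 1
    core=${newest%/}/vmcore
    [ -s "$core" ] || return 1
    echo "$core"
}
```

For example, `tail -n 20 "$(dirname "$(latest_vmcore)")/vmcore-dmesg.txt"` shows the end of the captured kernel log, which should include the panic that triggered the dump.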
    

Diagnostic Steps

RHEL 7

  • kdump service restart fails with the error below.

  • Journal shows "No memory reserved for crash kernel."

        # systemctl status kdump
        # systemctl restart kdump
        Job for kdump.service failed because the control process exited with error code.
        See "systemctl status kdump.service" and "journalctl -xe" for details.
    
        # journalctl -xe
        -- Unit kdump.service has begun starting up.
        Jun 29 18:04:31 ... kdumpctl[18284]: No memory reserved for crash kernel.
        Jun 29 18:04:31 ... kdumpctl[18284]: Starting kdump: [FAILED]
        Jun 29 18:04:31 ... systemd[1]: kdump.service: main process exited, code=exited, status=1
        Jun 29 18:04:31 ... systemd[1]: Failed to start Crash recovery kernel arming.
        -- Subject: Unit kdump.service has failed
    
  • After you add the crashkernel boot parameter and reboot, the kernel reserves a separate memory region for the crash kernel.

  • kdump restarts successfully.

        # systemctl status kdump
        kdump.service - Crash recovery kernel arming
        Loaded: loaded (/usr/lib/systemd/system/kdump.service; enabled; vendor preset: enabled)
        Active: active (exited) since Fri 2017-06-30 01:26:11 UTC; 1h 38min ago
        Process: 854 ExecStart=/usr/bin/kdumpctl start (code=exited,status=0/SUCCESS)
        Main PID: 854 (code=exited, status=0/SUCCESS)
        CGroup: /system.slice/kdump.service
    
        Jun 30 01:26:06 ... systemd[1]: Starting Crash recovery kernel arming...
        Jun 30 01:26:11 ... kdumpctl[854]: kexec: loaded kdump kernel
        Jun 30 01:26:11 ... kdumpctl[854]: Starting kdump: [OK]
        Jun 30 01:26:11 ... systemd[1]: Started Crash recovery kernel arming.
        
        # systemctl restart kdump
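The journal check above can be reduced to a small filter. This sketch is hypothetical and is not part of kdumpctl. It matches only the kdumpctl messages shown above and reports the outcome of the last "Starting kdump" result in its input.

```shell
# Hypothetical filter: read kdump journal text on stdin and print one of
# "ok", "no-crashkernel-memory", "failed", or "unknown" for the most recent
# "Starting kdump" result in the input.
kdump_journal_diagnosis() {
    awk '
        /No memory reserved for crash kernel/ { reason = "no-crashkernel-memory" }
        /Starting kdump: \[FAILED\]/ {
            state = (reason != "" ? reason : "failed")
            reason = ""
        }
        /Starting kdump: \[OK\]/ { state = "ok"; reason = "" }
        END { print (state != "" ? state : "unknown") }
    '
}
```

Example usage for the current boot:

        # journalctl -u kdump -b --no-pager | kdump_journal_diagnosis

A result of `no-crashkernel-memory` points back to the missing crashkernel boot parameter.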
    

RHEL 6

  • kdump service restart fails because no memory is reserved for the crash kernel.

    # service kdump restart
    Memory for crashkernel is not reserved
    Please reserve memory by passing "crashkernel=X@Y" parameter to the kernel
    Stopping kdump:                                            [FAILED]
    Starting kdump:                                            [FAILED]
    
  • After you add the crashkernel boot parameter and reboot, the kernel reserves a separate memory region for the crash kernel.

  • kdump restarts successfully.

    # reboot
    # service kdump restart
    Stopping kdump:                                            [  OK  ]
    Starting kdump:                                            [  OK  ]
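Another way to confirm the reservation is to read the "Crash kernel" region from /proc/iomem. This is a minimal sketch with a hypothetical function name. It assumes the usual x86_64 /proc/iomem line format (`start-end : name` in hexadecimal) and that it runs as root, because unprivileged users may see all addresses as zero.

```shell
# Hypothetical helper: print the total size, in MiB, of the "Crash kernel"
# region(s) in /proc/iomem-style text read from stdin. Prints 0 when no
# crashkernel memory is reserved.
crash_region_mib() {
    total=0
    # Lines look like "  2a000000-39ffffff : Crash kernel"; read splits them
    # into the address range, the ":" separator, and the region name.
    while read -r range sep name; do
        if [ "$sep" = ":" ] && [ "$name" = "Crash kernel" ]; then
            start=${range%-*}
            end=${range#*-}
            # Ranges are inclusive, hence the + 1.
            total=$(( total + 0x$end - 0x$start + 1 ))
        fi
    done
    echo $(( total / 1024 / 1024 ))
}
```

Example usage: `crash_region_mib < /proc/iomem` should print a non-zero size after the reboot, and 0 while the "Memory for crashkernel is not reserved" error persists.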
    

This solution is part of Red Hat’s fast-track publication program, providing a huge library of solutions that Red Hat engineers have created while supporting our customers. To give you the knowledge you need the instant it becomes available, these articles may be presented in a raw and unedited form.