How to panic a hung RHEL guest on a Hyper-V host using an NMI
Environment
- Red Hat Enterprise Linux
- Hyper-V host environment (must be Hyper-V 2012 R2 or later)
Issue
- Red Hat Enterprise Linux guest is unresponsive
- When the system is hung, you are unable to panic it using the sysrq-trigger
Resolution
- To collect a vmcore from the hung system, send a Non-Maskable Interrupt (NMI) from the Hyper-V host to the hung guest using the Debug-VM cmdlet in Windows PowerShell
(1) Ensure that kdump is configured properly and is able to collect a core by testing the system manually:
General information on configuring and testing kdump is available here
Specific details for running kdump on a Hyper-V system are available here
(2) Make sure that the Red Hat guest is able to service an NMI sent from the Hyper-V host:
# cat /proc/sys/kernel/panic_on_io_nmi
0
# cat /proc/sys/kernel/unknown_nmi_panic
0
Both values must be 1 for an NMI from Hyper-V to panic a hung guest.
To make this change at runtime (it will not persist across reboots):
# echo 1 > /proc/sys/kernel/panic_on_io_nmi
# cat /proc/sys/kernel/panic_on_io_nmi
1
# echo 1 > /proc/sys/kernel/unknown_nmi_panic
# cat /proc/sys/kernel/unknown_nmi_panic
1
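The two checks above can also be wrapped in a small helper that reports both settings at once. This is only a sketch: the function name check_nmi_panic and its optional directory argument are illustrative and are not part of any Red Hat tooling.

```shell
# check_nmi_panic [DIR]
# Hypothetical helper: reports whether both NMI panic settings are 1.
# DIR defaults to /proc/sys/kernel; the argument exists only so the
# function can be pointed at another directory for testing.
check_nmi_panic() {
    dir="${1:-/proc/sys/kernel}"
    rc=0
    for knob in panic_on_io_nmi unknown_nmi_panic; do
        # An unreadable or missing file leaves val empty.
        val=$(cat "$dir/$knob" 2>/dev/null)
        if [ "$val" = "1" ]; then
            echo "OK: kernel.$knob = 1"
        else
            echo "NOT SET: kernel.$knob = ${val:-unreadable}"
            rc=1
        fi
    done
    # Returns 0 only when both settings are 1.
    return $rc
}
```

Running check_nmi_panic with no argument on the guest prints one line per setting and returns a non-zero status if either value is not 1.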
To set this change persistently, edit and reload /etc/sysctl.conf:
# echo "kernel.panic_on_io_nmi=1" >> /etc/sysctl.conf
# echo "kernel.unknown_nmi_panic=1" >> /etc/sysctl.conf
# tail -n 2 /etc/sysctl.conf
kernel.panic_on_io_nmi=1
kernel.unknown_nmi_panic=1
# sysctl -p
# cat /proc/sys/kernel/panic_on_io_nmi
1
# cat /proc/sys/kernel/unknown_nmi_panic
1
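Note that appending with echo adds a duplicate line each time the commands are re-run, and does not change an existing entry that sets a different value. A minimal sketch of an idempotent alternative follows. The function name ensure_sysctl_line is hypothetical, and the in-place edit assumes GNU sed. The key is used as a regular expression, so the dots in it match any character, which is harmless for these two settings.

```shell
# ensure_sysctl_line KEY VALUE [FILE]
# Hypothetical helper: makes sure FILE holds exactly "KEY=VALUE" for KEY.
# FILE defaults to /etc/sysctl.conf.
ensure_sysctl_line() {
    key="$1"
    value="$2"
    file="${3:-/etc/sysctl.conf}"
    if grep -q "^[[:space:]]*${key}[[:space:]]*=" "$file" 2>/dev/null; then
        # Key already present: rewrite its value in place (GNU sed -i).
        sed -i "s|^[[:space:]]*${key}[[:space:]]*=.*|${key}=${value}|" "$file"
    else
        # Key absent (or file missing): append a new line.
        echo "${key}=${value}" >> "$file"
    fi
}
```

For example, ensure_sysctl_line kernel.panic_on_io_nmi 1 followed by ensure_sysctl_line kernel.unknown_nmi_panic 1, run as root, leaves a single entry for each setting; sysctl -p is still needed to load them.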
(3) Ensure that kdump is running:
In RHEL 6 and below:
# service kdump status
Kdump is operational
In RHEL 7 and above:
# systemctl status kdump.service
● kdump.service - Crash recovery kernel arming
Loaded: loaded (/usr/lib/systemd/system/kdump.service; enabled; vendor preset: enabled)
Active: active (exited) since Thu 2017-02-23 17:00:23 EST; 3s ago
Process: 14870 ExecStart=/usr/bin/kdumpctl start (code=exited, status=0/SUCCESS)
Main PID: 14870 (code=exited, status=0/SUCCESS)
Feb 23 17:00:22 r72.example.com systemd[1]: Starting Crash recovery kernel arming...
Feb 23 17:00:23 r72.example.com kdumpctl[14870]: kexec: loaded kdump kernel
Feb 23 17:00:23 r72.example.com kdumpctl[14870]: Starting kdump: [OK]
Feb 23 17:00:23 r72.example.com systemd[1]: Started Crash recovery kernel arming.
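Independently of the service manager, the kernel exposes whether a crash kernel is currently loaded in /sys/kernel/kexec_crash_loaded (1 when loaded, 0 otherwise). The sketch below wraps that check. The function name kdump_kernel_loaded and its optional path argument are illustrative only.

```shell
# kdump_kernel_loaded [FLAG_FILE]
# Hypothetical helper: succeeds (status 0) when the crash kernel is loaded.
# FLAG_FILE defaults to /sys/kernel/kexec_crash_loaded; the argument
# exists only so the check can be exercised against a test file.
kdump_kernel_loaded() {
    flag="${1:-/sys/kernel/kexec_crash_loaded}"
    [ "$(cat "$flag" 2>/dev/null)" = "1" ]
}
```

A typical use would be: kdump_kernel_loaded && echo "crash kernel loaded" || echo "crash kernel NOT loaded". If the crash kernel is not loaded, an NMI-triggered panic will not produce a vmcore.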
(4) You can now send an NMI from the Hyper-V host to the RHEL guest by running the following command on the Hyper-V host.
Note: Windows PowerShell must be launched using the "Run as administrator" option.
PS C:\> debug-vm "VM to Debug" -InjectNonMaskableInterrupt -Force
Please see the Microsoft TechNet documentation on the Debug-VM command.
Note: the information at the above link is not provided by Red Hat. While this method has been tested by Red Hat, we cannot guarantee its success in all instances and cannot assist in troubleshooting the command.
This solution is part of Red Hat’s fast-track publication program, providing a huge library of solutions that Red Hat engineers have created while supporting our customers. To give you the knowledge you need the instant it becomes available, these articles may be presented in a raw and unedited form.