How to trigger SysRq/NMI through a DELL DRAC in order to troubleshoot a hung system?


Environment

  • Red Hat Enterprise Linux 5
  • Red Hat Enterprise Linux 6
  • Red Hat Enterprise Linux 7
  • Red Hat Enterprise Linux 8
  • Red Hat Enterprise Linux 9

Issue

  • Interacting with a Dell system through SysRq/NMI signals sent via DRAC is required to perform several troubleshooting tasks on a hung system.

Resolution

Trigger an NMI from the iDRAC remote console

Trigger an NMI from another RHEL system using serial over LAN

  • Enable the Serial Over LAN from iDRAC if not already enabled.

  • Install the ipmitool package on a separate RHEL system that can reach the IP address of the DRAC remote management system.

# yum -y install ipmitool
  • On that separate RHEL system, execute the following ipmitool command to send the NMI signal to the affected server through the DRAC's IPMI over LAN (lanplus) interface. If the NMI tunables are set on the affected server, the system panics immediately:
# ipmitool -H test.example.com -I lanplus -U <user> -P <password> chassis power diag
  • Replace test.example.com with the iDRAC hostname or IP address, and <user> and <password> with the iDRAC credentials.
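
The NMI command can also be wrapped in a small shell helper. The following is a minimal sketch and not part of the original procedure: the function name send_nmi and its dry-run default are assumptions made for illustration. It relies on the ipmitool -E option, which reads the password from the IPMI_PASSWORD environment variable so that it does not appear on the command line.

```shell
# Hypothetical helper (send_nmi is not an ipmitool feature).
# Usage: send_nmi <idrac-host> <user> [run]
# Without "run" it only prints the command it would execute.
send_nmi() (
    host=$1
    user=$2
    mode=${3:-dry-run}

    # -E: take the password from the IPMI_PASSWORD environment variable.
    # "chassis power diag" asks the BMC to pulse a diagnostic interrupt (NMI).
    set -- ipmitool -H "$host" -I lanplus -U "$user" -E chassis power diag

    if [ "$mode" = "run" ]; then
        "$@"
    else
        echo "$*"
    fi
)
```

Calling send_nmi test.example.com <user> prints the command for review. Appending run executes it, which panics the affected server if the NMI tunables are set.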

Trigger a SysRq from another RHEL system using serial over LAN

  • Enable the Serial Over LAN from iDRAC if not already enabled.

  • Install the ipmitool package on a separate RHEL system as shown in the above section.

  • From this separate RHEL system, execute the following ipmitool command to open the serial console of the affected server through the serial over LAN interface:
# ipmitool -H test.example.com -I lanplus -U <user> -P <password> sol activate
  • Press the following keys in quick succession to send a break (the ipmitool escape sequence ~B):
[shift]+[~] then [shift]+[b]
  • The following output should be visible:
~B [send break]
  • Press the 'm' key to print memory usage information (SysRq "Show Memory"). Output similar to the following appears on the console:
SysRq : Show Memory
Node 0 DMA per-cpu:
cpu 0 hot: high 0, batch 1 used:0
cpu 0 cold: high 0, batch 1 used:0
cpu 1 hot: high 0, batch 1 used:0
cpu 1 cold: high 0, batch 1 used:0
cpu 2 hot: high 0, batch 1 used:0
cpu 2 cold: high 0, batch 1 used:0
cpu 3 hot: high 0, batch 1 used:0
cpu 3 cold: high 0, batch 1 used:0
cpu 4 hot: high 0, batch 1 used:0
cpu 4 cold: high 0, batch 1 used:0
...
  • This output confirms that SysRq signals sent with ipmitool through the DRAC remote management system are reaching the affected server. To crash the server, first repeat the keystrokes that send the break:
[shift]+[~] then [shift]+[b]
  • [CAUTION: The following step causes a kernel panic on the affected system, which should be the intention.] Press 'c' to send the "force crash" SysRq signal. With kdump configured, the panic produces a vmcore for analysis.
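
Besides 'm' and 'c', other SysRq keys can be sent after the break sequence in the same way. The following is a small lookup sketch for the keys most often used on a hung system. The function name sysrq_desc is hypothetical, the descriptions paraphrase the kernel SysRq documentation, and the list is deliberately incomplete. Whether a given key is honoured depends on the kernel.sysrq setting on the affected server.

```shell
# Hypothetical lookup helper: describe what a SysRq key does.
# Usage: sysrq_desc <key>
sysrq_desc() {
    case "$1" in
        m) echo "show memory usage" ;;
        t) echo "show state of all tasks" ;;
        w) echo "show blocked (uninterruptible) tasks" ;;
        l) echo "show backtrace of all active CPUs" ;;
        s) echo "sync all mounted filesystems" ;;
        u) echo "remount all filesystems read-only" ;;
        c) echo "crash the system (kernel panic)" ;;
        b) echo "reboot immediately without syncing" ;;
        *) echo "unknown key: $1" >&2; return 1 ;;
    esac
}
```

For example, sysrq_desc w reports that 'w' lists blocked tasks, which is often useful to capture before forcing a crash with 'c'. The serial over LAN session itself can be closed with the ipmitool escape sequence ~. (tilde, then period).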

Pre-requisites


  • The Dell DRAC has to be configured correctly.
  • [How do I set up a serial terminal and/or console in Red Hat Enterprise Linux?](https://access.redhat.com/site/node/7212)
  • [How do I troubleshoot kernel crashes, hangs, or reboots with kdump on Red Hat Enterprise Linux?](https://access.redhat.com/site/solutions/6038)
  • [How can I use the SysRq facility to collect information from a server which has hung?](https://access.redhat.com/site/solutions/2023)
  • [How can I configure my system to crash when NMI switch is pushed?](https://access.redhat.com/solutions/125103)
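
The SysRq and NMI settings described in the articles above can be reviewed on the affected server while it is still healthy. The following is a minimal sketch under stated assumptions: the function name show_tunables is hypothetical, and the list of sysctl names is an assumption, since this article only refers to "NMI tunables" without naming them. Which of them matters depends on the RHEL release and the hardware, so treat the linked articles as authoritative.

```shell
# Hypothetical helper: print the current SysRq/NMI related kernel settings.
# Usage: show_tunables [directory]   (defaults to /proc/sys/kernel)
show_tunables() (
    dir=${1:-/proc/sys/kernel}

    # Assumed list of relevant sysctls; not taken from this article.
    for name in sysrq unknown_nmi_panic panic_on_unrecovered_nmi panic_on_io_nmi; do
        if [ -r "$dir/$name" ]; then
            echo "kernel.$name = $(cat "$dir/$name")"
        else
            echo "kernel.$name = (not readable)"
        fi
    done
)
```

Running show_tunables without arguments reads the live values from /proc/sys/kernel. The optional directory argument exists only so that the function can be tried against a copy of those files.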
