What is an NMI and what can I use it for?

Solution Verified - Updated

Environment

  • Red Hat Enterprise Linux 3 Update 3 and later
  • Red Hat Enterprise Linux 4, 5, 6, 7, 8, 9

Issue

  • The term NMI often comes up when dealing with system hangs and crashes. What does this term mean, and how can the kernel be configured to make use of NMIs?
  • What is an NMI?

Resolution

  • NMI is an acronym for Non-Maskable Interrupt. To understand the use and value of an NMI, we must first define the term interrupt in this context. The following brief description of how the kernel uses interrupts is taken from the Red Hat Enterprise Linux Performance Tuning Guide (Red Hat Enterprise Linux 6 - Performance Tuning Guide - 4.3. Interrupts and IRQ Tuning):

    An interrupt request (IRQ) is a request for service, sent at the hardware level. Interrupts can be sent by either a dedicated hardware line, or across a hardware bus as an information packet (a Message Signaled Interrupt, or MSI).

    When interrupts are enabled, receipt of an IRQ prompts a switch to interrupt context. Kernel interrupt dispatch code retrieves the IRQ number and its associated list of registered Interrupt Service Routines (ISRs), and calls each ISR in turn. The ISR acknowledges the interrupt and ignores redundant interrupts from the same IRQ, then queues a deferred handler to finish processing the interrupt and stop the ISR from ignoring future interrupts.

Software Interrupts
  • A software interrupt is generated by an application in response to a situation that is specific to that code. Software may choose to respond to and handle these interrupts or, in some exceptional cases, disregard them completely.
Hardware Interrupts
  • A hardware interrupt is a signal from a device that needs attention. The device could be almost anything attached to the computer: the insertion or removal of a USB device, data from an input device such as a keyboard or mouse, or a message from a disk device can all raise one. When either kind of interrupt is received, the kernel routes it to the software, whether a device driver or an application, that wishes to process it. The software that responds to the interrupt is scheduled, so it may not run immediately; it runs when the scheduler next places it on a CPU's run queue.
Non-Maskable Interrupt
  • The execution path is quite different for an NMI, which cannot be masked or ignored. The kernel receives the interrupt and processes it immediately. An NMI can come from a number of sources, but most commonly it is a hardware interrupt used to force a system to crash and take a core dump. It may be triggered by faulty hardware such as memory, by a system reset button, or by some software debugging utilities.
Uses of an NMI
  • Red Hat Enterprise Linux includes a watchdog timer that triggers an NMI from software when certain conditions are met. This facility is known as the NMI watchdog.

  • The NMI watchdog facility enables the built-in kernel deadlock detector and is used to debug hard kernel lockups. By generating periodic NMIs, the kernel can monitor whether any CPU has locked up and print debugging messages as needed. To debug hard lockups, enable the watchdog by passing nmi_watchdog=1 at boot, either by appending it to the kernel line in /etc/grub.conf or by editing the GRUB entry at boot time, as in the example below:

    # vi /boot/grub/grub.conf
    ......
    kernel /vmlinuz-2.6.18-128.el5 ro root=/dev/VolGroup00/LogVol00 rhgb quiet  nmi_watchdog=1
    
  • Once enabled, check /proc/interrupts after the system boots. If the NMI line shows a count greater than zero, the NMI watchdog has been configured properly. If not, try passing nmi_watchdog=2 to the kernel instead of nmi_watchdog=1 and check /proc/interrupts again after the system boots. If the NMI count is still zero, the system does not support the NMI watchdog timer.
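
The /proc/interrupts check can also be scripted. The following is a minimal sketch, assuming the usual layout in which the NMI row starts with "NMI:" followed by one counter per CPU. The function name nmi_total is hypothetical and chosen only for illustration.

```shell
#!/bin/sh
# nmi_total FILE: print the sum of the per-CPU NMI counters found in FILE
# (defaults to /proc/interrupts). Prints 0 if there is no NMI row.
# Assumption: the NMI row begins with "NMI:" followed by one integer per CPU;
# any trailing description text on that row is skipped.
nmi_total() {
    awk '$1 == "NMI:" {
             for (i = 2; i <= NF; i++)
                 if ($i ~ /^[0-9]+$/) total += $i
         }
         END { print total + 0 }' "${1:-/proc/interrupts}"
}
```

Calling nmi_total with no argument reads /proc/interrupts on the running system. A result greater than zero corresponds to the "NMI has a value larger than zero" condition described above.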

Further NMI Watchdog information
  • The difference between nmi_watchdog=1 and nmi_watchdog=2 is described in the kernel documentation (/usr/share/doc/kernel-doc-2.6.32/Documentation/nmi_watchdog.txt), excerpted below:

        <snip>
        In order to use the NMI watchdog, you need to have APIC support in your
        kernel. For SMP kernels, APIC support gets compiled in automatically. For
        UP, enable either CONFIG_X86_UP_APIC (Processor type and features -> Local
        APIC support on uniprocessors) or CONFIG_X86_UP_IOAPIC (Processor type and
        features -> IO-APIC support on uniprocessors) in your kernel config.
        CONFIG_X86_UP_APIC is for uniprocessor machines without an IO-APIC.
        CONFIG_X86_UP_IOAPIC is for uniprocessor with an IO-APIC. [Note: certain
        kernel debugging options, such as Kernel Stack Meter or Kernel Tracer,
        may implicitly disable the NMI watchdog.]
    
        For x86-64, the needed APIC is always compiled in.
    
        Using local APIC (nmi_watchdog=2) needs the first performance register, so
        you can't use it for other purposes (such as high precision performance
        profiling.) However, at least oprofile and the perfctr driver disable the
        local APIC NMI watchdog automatically.
    
        To actually enable the NMI watchdog, use the 'nmi_watchdog=N' boot
        parameter.  Eg. the relevant lilo.conf entry:
    
                append="nmi_watchdog=1"
    
        For SMP machines and UP machines with an IO-APIC use nmi_watchdog=1.
        For UP machines without an IO-APIC use nmi_watchdog=2, this only works
        for some processor types.  If in doubt, boot with nmi_watchdog=1 and
        check the NMI count in /proc/interrupts; if the count is zero then
        reboot with nmi_watchdog=2 and check the NMI count.  If it is still
        zero then log a problem, you probably have a processor that needs to be
        added to the nmi code.
        <snip>
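
The trial-and-error procedure in the excerpt above (boot with nmi_watchdog=1, check the NMI count, fall back to nmi_watchdog=2) can be summarised as a small decision helper. This is only a sketch: watchdog_next_step is a hypothetical name, the mode and count are supplied by the caller, and the function changes no settings.

```shell
#!/bin/sh
# watchdog_next_step MODE COUNT
#   MODE  = the nmi_watchdog value the system was booted with (1 or 2)
#   COUNT = the NMI count observed in /proc/interrupts after boot
# Prints the next step suggested by the procedure in the kernel documentation.
# Hypothetical helper for illustration only; it does not modify the system.
watchdog_next_step() {
    mode=$1
    count=$2
    if [ "$count" -gt 0 ]; then
        echo "NMI watchdog is active with nmi_watchdog=$mode"
    elif [ "$mode" = "1" ]; then
        echo "NMI count is zero: reboot with nmi_watchdog=2 and check again"
    else
        echo "NMI count is still zero: the NMI watchdog appears unsupported on this system"
    fi
}
```

For example, watchdog_next_step 1 0 suggests rebooting with nmi_watchdog=2, while watchdog_next_step 2 0 reports that the watchdog appears unsupported.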
    
Comments
  • Red Hat Enterprise Linux 4 EM64T ships with nmi_watchdog enabled by default. This is not the case for Red Hat Enterprise Linux 4 x86, because having nmi_watchdog enabled can cause performance issues during periods of intense CPU utilization.

  • To disable nmi_watchdog on EM64T, boot with the kernel parameter nmi_watchdog=0 on the kernel line in grub.conf.
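
After rebooting, it can be useful to confirm which nmi_watchdog value the running kernel was actually booted with. The sketch below reads the kernel command line from /proc/cmdline. The function name nmi_watchdog_param is hypothetical, and the "unset" placeholder is an assumption made for illustration.

```shell
#!/bin/sh
# nmi_watchdog_param FILE: print the value of nmi_watchdog= found on the kernel
# command line stored in FILE (defaults to /proc/cmdline), or "unset" if the
# parameter is absent. If it appears more than once, the last value is printed.
nmi_watchdog_param() {
    tr ' ' '\n' < "${1:-/proc/cmdline}" |
        awk -F= '$1 == "nmi_watchdog" { value = $2 }
                 END { print (value == "" ? "unset" : value) }'
}
```

A result of 0 indicates the watchdog was disabled on the kernel line, while "unset" means the parameter was not passed and the kernel default for that release and architecture applies.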


This solution is part of Red Hat’s fast-track publication program, providing a huge library of solutions that Red Hat engineers have created while supporting our customers. To give you the knowledge you need the instant it becomes available, these articles may be presented in a raw and unedited form.