Logs contain "kernel: hrtimer: interrupt took ###### ns"


Environment

  • Red Hat Enterprise Linux 6
  • Red Hat Enterprise Linux 7
  • Red Hat Enterprise Linux 8
  • Red Hat Enterprise Linux 9

Issue

The following message is seen in dmesg output, in the system journal, or in /var/log/messages:

kernel: hrtimer: interrupt took ###### ns

Resolution

  • For a virtual machine, check the hypervisor for CPU overcommitment and reduce the overcommitment as needed.
  • For long hrtimer lists or long-running hrtimers, check /proc/timer_list. If one or more CPUs has several hundred or more hrtimers in its list, or if .nr_hangs has a large value, then the CPU is frequently taking too long to process its list of hrtimers, or certain hrtimers are taking a very long time to run. If many of the function names in the list are recognizable as belonging to a third-party module or application, engage the vendor in question. Otherwise, engage the appropriate support vendor for your Red Hat Enterprise Linux platform.
    • Note: /proc/timer_list is regenerated every time it is read and its contents change extremely frequently, so you may need to read the file several times to gather representative samples of the data.
  • For physical systems without long hrtimer lists, consult your hardware vendor as needed.
  • In all instances, if the system is operating perfectly fine but emits this message from time to time, you can safely ignore it.

Root Cause

  • In general, the warning is printed when processing hrtimer interrupts takes longer than expected, which can be due to one or more of the following reasons:
    • The most common reason is a virtual machine residing on an overcommitted hypervisor. In this situation, a vCPU in the guest can begin processing its hrtimer list, be scheduled away from a pCPU, and later be scheduled back onto a pCPU. When checking for more hrtimers, the CPU notices a large leap forward in the time presented by the hypervisor and prints the message.
    • The list of hrtimers is extremely long, indicating a large amount of hrtimer activity ("too many to execute"), or the operations executed in one or more hrtimers take an extremely long time, indicating too much work is packed into a single hrtimer.
    • Hardware issues prevent the CPU from executing properly or updating its clocks properly.

Diagnostic Steps

Overview of HRTimers

  • Periodically with a high frequency, the Linux kernel will perform routine operations to keep the system running as needed via high resolution timer interrupts.
    • These operations include checking to see if processes need to be moved around on CPUs, processes sleeping for an extremely short time, updating system times on the CPUs as well as across the kernel, and any operations a kernel module may manually execute via high resolution timers.
    • A high resolution timer interrupt is delivered to each CPU periodically at a high frequency by its local Advanced Programmable Interrupt Controller (APIC) timer.
    • Each routine operation and when to run it is described in an hrtimer.
      • For example, an hrtimer might include a function to update the CPU timestamp 50ns from now.
  • Each CPU maintains its own list of these hrtimers and looks through its list when it receives a high resolution timer interrupt from the APIC.
  • If an hrtimer expires between receiving APIC interrupts, then the function in the hrtimer is executed.
    • For example, the current "time" on a CPU is 1000ns, and the CPU receives an APIC interrupt every 100ns (so an APIC interrupt is received on the CPU at timestamps 1000ns, 1100ns, 1200ns, 1300ns, etc.). The CPU also has a list of hrtimers A, B, C, and D that expire at timestamps 1025ns, 1250ns, 1255ns, and 1410ns respectively.
    • On the APIC interrupt at 1100ns, the CPU executes hrtimer A because its expire time, 1025ns, is between 1000ns and 1100ns.
    • At 1200ns, no hrtimer is executed because no hrtimer expires between 1100ns and 1200ns.
    • At 1300ns, hrtimers B and C are executed because 1250ns and 1255ns are between 1200ns and 1300ns.
    • At 1400ns, no hrtimer is executed because no hrtimer expires between 1300ns and 1400ns.
    • At 1500ns, hrtimer D is executed because 1410ns is between 1400ns and 1500ns.

Overview of HRTimer Implementation

Below is an annotated review of the execution of hrtimers in Red Hat Enterprise Linux 7. Annotations appear between the dashed separators and precede the code they describe:

kernel/hrtimer.c:
1287 /*
1288  * High resolution timer interrupt
1289  * Called with interrupts disabled
1290  */
1291 void hrtimer_interrupt(struct clock_event_device *dev)
1292 {   

- - - - - - - -
Basic variable declarations required for compilation.
We grab the per-CPU hrtimer base for the CPU we are currently executing on.
- - - - - - - -

1293     struct hrtimer_cpu_base *cpu_base = this_cpu_ptr(&hrtimer_bases);
1294     ktime_t expires_next, now, entry_time, delta;
1295     int retries = 0;
1296     

- - - - - - - -
Panic the system with BUG_ON if the list is in an inconsistent state.
Increment the count of hrtimer interrupt events processed on this CPU.
- - - - - - - -

1297     BUG_ON(!cpu_base->hres_active);
1298     cpu_base->nr_events++; 
1299     dev->next_event.tv64 = KTIME_MAX;
1300     

- - - - - - - -
Lock the hrtimer list so it is not updated while we process it
and grab the current time to indicate when the processing began
and mark the CPU as currently in a high resolution timer interrupt
- - - - - - - -

1301     raw_spin_lock(&cpu_base->lock);
1302     entry_time = now = hrtimer_update_base(cpu_base);
1303 retry:
1304     cpu_base->in_hrtirq = 1;
1305     /*
1306      * We set expires_next to KTIME_MAX here with cpu_base->lock
1307      * held to prevent that a timer is enqueued in our queue via
1308      * the migration code. This does not affect enqueueing of
1309      * timers which run their callback and need to be requeued on
1310      * this CPU.
1311      */
1312     cpu_base->expires_next.tv64 = KTIME_MAX;
1313 

- - - - - - - -
Parse the list of hrtimers and run any expired timers
- - - - - - - -

1314     __hrtimer_run_queues(cpu_base, now);
1315     

- - - - - - - -
Determine when the next expire time is, mark the
CPU as not handling hrtimers anymore, and unlock
the list of hrtimers
- - - - - - - -

1316     /* Reevaluate the clock bases for the next expiry */
1317     expires_next = __hrtimer_get_next_event(cpu_base);
1318     /*
1319      * Store the new expiry value so the migration code can verify
1320      * against it.
1321      */
1322     cpu_base->expires_next = expires_next;
1323     cpu_base->in_hrtirq = 0;
1324     raw_spin_unlock(&cpu_base->lock);
1325     

- - - - - - - -
Exit if there is nothing to do now
- - - - - - - -

1326     /* Reprogramming necessary ? */
1327     if (expires_next.tv64 == KTIME_MAX || 
1328         !tick_program_event(expires_next, 0)) {
1329         cpu_base->hang_detected = 0;
1330         return;
1331     }
1332     

- - - - - - - -
For some reason, the next hrtimer has already expired
while we were attempting to wrap up, so retry running
the expired hrtimers, stopping either after the 3rd
attempt or once all expired hrtimers have finally run
- - - - - - - -

1333     /*
1334      * The next timer was already expired due to:
1335      * - tracing
1336      * - long lasting callbacks
1337      * - being scheduled away when running in a VM
1338      *
1339      * We need to prevent that we loop forever in the hrtimer
1340      * interrupt routine. We give it 3 attempts to avoid
1341      * overreacting on some spurious event.
1342      *
1343      * Acquire base lock for updating the offsets and retrieving
1344      * the current time.
1345      */
1346     raw_spin_lock(&cpu_base->lock);
1347     now = hrtimer_update_base(cpu_base);
1348     cpu_base->nr_retries++;
1349     if (++retries < 3)
1350         goto retry;

- - - - - - - -
After the 3rd attempt, if hrtimers are still expired and need to run,
defer them to the next hrtimer interrupt, increment the count of times
this CPU spent "too long" processing hrtimers, and print a warning 
indicating processing of hrtimers took X nanoseconds
- - - - - - - -

1351     /*
1352      * Give the system a chance to do something else than looping
1353      * here. We stored the entry time, so we know exactly how long
1354      * we spent here. We schedule the next event this amount of
1355      * time away.
1356      */
1357     cpu_base->nr_hangs++;
1358     cpu_base->hang_detected = 1;
1359     raw_spin_unlock(&cpu_base->lock);
1360     delta = ktime_sub(now, entry_time);
1361     if (delta.tv64 > cpu_base->max_hang_time.tv64)
1362         cpu_base->max_hang_time = delta;
1363     /*
1364      * Limit it to a sensible value as we enforce a longer
1365      * delay. Give the CPU at least 100ms to catch up.
1366      */
1367     if (delta.tv64 > 100 * NSEC_PER_MSEC)
1368         expires_next = ktime_add_ns(now, 100 * NSEC_PER_MSEC);
1369     else
1370         expires_next = ktime_add(now, delta);
1371     tick_program_event(expires_next, 1);
1372     printk_once(KERN_WARNING "hrtimer: interrupt took %llu ns\n",
1373             ktime_to_ns(delta));
1374 }

This solution is part of Red Hat’s fast-track publication program, providing a huge library of solutions that Red Hat engineers have created while supporting our customers. To give you the knowledge you need the instant it becomes available, these articles may be presented in a raw and unedited form.