Systems hang due to leap-second livelock.
Environment
- Red Hat Enterprise Linux 6.1 through 6.3
Issue
- Systems repeatedly crash due to NMI Watchdog detecting a hang. There are two similar backtraces that appear to be related to this issue.
- The first backtrace appears to be in
ktime_get. - The second backtrace appears to be in
getnstimeofday.
Resolution
-
The issue is resolved in RHEL 6.1.z EUS kernel version 2.6.32-131.30.2.el6 released as RHBA-2012-1197
-
The issue is resolved in RHEL 6.2.z EUS kernel version 2.6.32-220.25.1.el6 released as RHBA-2012-1198
-
The issue is resolved in RHEL 6.3 kernel version 2.6.32-279.5.2.el6 released as RHBA-2012-1199
-
The issue is resolved in RHEL 6.4 GA kernel version 2.6.32-358.el6 released as RHBA-2013-0496
-
Aside from upgrading to the fixed kernels, the current known workarounds are:
- Disable ntp & reboot so that the leap announcement is cleared from the kernel.
- Set forward the time past local midnight and let ntp adjust it over time (slow correct)
- Setting "tinker step" above 0.5 seconds within ntp.conf and restart ntp
- Use the -x option within /etc/sysconfig/ntpd and restart ntp service. This works only with ntp-4.2.6p5-3.el6_6 or later OR earlier versions than 4.2.6.
Root Cause
-
For more information about the leap second see Resolve Leap Second Issues in Red Hat Enterprise Linux.
-
The problem occurs when the system receives the announcement of the leapsecond, not when the leapsecond insertion actually happens. The problem is timing-dependent, so it is possible for a system to successfully queue the insertion. However, the request for leapsecond insertion has to be cancelled and re-queued every time ntp adjusts the system time, so there is a chance of the problem happening continuously if ntpd is running.
-
A patch upstream Content from lkml.org is not included.ntp: Fix leap-second hrtimer livelock describes a similar livelock in the newer kernel. The locking has changed but the fact that to code paths take the locks in reverse order from each other remains. The proposed solution there is to not use
hrtimerto insert leapseconds. -
This can be caused from two current call paths:
-
The code path in the first one, in
do_adjtimex()takeswrite_seqlock_irq(&xtime_lock)and then tries to get a timer base lock in__hrtimer_start_range_ns().PID: 576 TASK: ffff880c455c8080 CPU: 1 COMMAND: "ntpd" #0 [ffff880685407af0] machine_kexec at ffffffff810310cb #1 [ffff880685407b50] crash_kexec at ffffffff810b6312 #2 [ffff880685407c20] panic at ffffffff814da175 #3 [ffff880685407ca0] watchdog_overflow_callback at ffffffff810d63fd #4 [ffff880685407cc0] __perf_event_overflow at ffffffff81108a96 #5 [ffff880685407d60] perf_event_overflow at ffffffff81109089 #6 [ffff880685407d70] intel_pmu_handle_irq at ffffffff8101dd46 #7 [ffff880685407e80] perf_event_nmi_handler at ffffffff814de6f8 #8 [ffff880685407ea0] notifier_call_chain at ffffffff814e0235 #9 [ffff880685407ee0] atomic_notifier_call_chain at ffffffff814e029a #10 [ffff880685407ef0] notify_die at ffffffff8109409e #11 [ffff880685407f20] do_nmi at ffffffff814ddea3 #12 [ffff880685407f50] nmi at ffffffff814dd7b0 [exception RIP: _spin_lock_irqsave+41] RIP: ffffffff814dcf09 RSP: ffff880b5ca05d58 RFLAGS: 00000097 RAX: 0000000000000138 RBX: ffff880685490f48 RCX: 0000000000000137 RDX: 0000000000000086 RSI: ffff880b5ca05df0 RDI: ffff880685490f40 RBP: ffff880b5ca05d58 R8: 0000000000000001 R9: 00000000000007dc R10: 0000000000000000 R11: 0000000000000206 R12: ffffffff81dde860 R13: ffff880b5ca05df0 R14: 0000000000010f40 R15: 0000000000000000 ORIG_RAX: ffffffffffffffff CS: 0010 SS: 0018 --- <NMI exception stack> --- #13 [ffff880b5ca05d58] _spin_lock_irqsave at ffffffff814dcf09 #14 [ffff880b5ca05d60] lock_hrtimer_base at ffffffff81092491 #15 [ffff880b5ca05d90] __hrtimer_start_range_ns at ffffffff81092e2a #16 [ffff880b5ca05e30] hrtimer_start at ffffffff81093268 #17 [ffff880b5ca05e40] ntp_start_leap_timer at ffffffff8109959d #18 [ffff880b5ca05e50] do_adjtimex at ffffffff81099832 #19 [ffff880b5ca05e80] sys_adjtimex at ffffffff8106dd78 #20 [ffff880b5ca05f80] system_call_fastpath at ffffffff8100b172 RIP: 00007f9d1e518d37 RSP: 00007fffcd1e78a0 RFLAGS: 00000246 RAX: 000000000000009f RBX: ffffffff8100b172 RCX: 0000000000000000 RDX: 00007f9d1f6f1ba8 RSI: 0000000000000005 RDI: 00007f9d1f6f1c20 RBP: 00007f9d1f6f1bb0 R8: 0000000000000000 R9: 00000000000007dc R10: 0000000000000000 R11: 0000000000000206 R12: 00007f9d1f6f1c20 R13: 00007f9d1f6a5238 R14: 0000000000000260 R15: 00007f9d1f6a5238 ORIG_RAX: 000000000000009f CS: 0033 SS: 002b -
The second one takes this same timer base lock in
__hrtimer_start_range_ns()and then triesseq = read_seqbegin(&xtime_lock)inktime_get().PID: 0 TASK: ffff880654bfb580 CPU: 9 COMMAND: "swapper" #0 [ffff880685487e90] crash_nmi_callback at ffffffff81028a66 #1 [ffff880685487ea0] notifier_call_chain at ffffffff814e0235 #2 [ffff880685487ee0] atomic_notifier_call_chain at ffffffff814e029a #3 [ffff880685487ef0] notify_die at ffffffff8109409e #4 [ffff880685487f20] do_nmi at ffffffff814ddea3 #5 [ffff880685487f50] nmi at ffffffff814dd7b0 [exception RIP: ktime_get+180] RIP: ffffffff81098d84 RSP: ffff880c54f73d28 RFLAGS: 00000002 RAX: 00000e2f1135ec33 RBX: 0000000000000001 RCX: 0000000000000000 RDX: 0000000000000001 RSI: 00000e2f1135ec33 RDI: ffff88068548db40 RBP: ffff880c54f73d48 R8: 00000e2f1135ec33 R9: ffff880685490f48 R10: 00000e2fbb3c41e3 R11: 0000000100e962be R12: ffff88068548db40 R13: 0000000001e0bc91 R14: ffffffff817ac524 R15: 0000000000000002 ORIG_RAX: ffffffffffffffff CS: 0010 SS: 0018 --- <NMI exception stack> --- #6 [ffff880c54f73d28] ktime_get at ffffffff81098d84 #7 [ffff880c54f73d50] tick_dev_program_event at ffffffff8109da69 #8 [ffff880c54f73d90] tick_program_event at ffffffff8109db7a #9 [ffff880c54f73da0] hrtimer_force_reprogram at ffffffff81091fdf #10 [ffff880c54f73db0] __remove_hrtimer at ffffffff8109208f #11 [ffff880c54f73df0] __hrtimer_start_range_ns at ffffffff81093063 #12 [ffff880c54f73e90] hrtimer_start at ffffffff81093268 #13 [ffff880c54f73ea0] tick_nohz_stop_sched_tick at ffffffff8109e965 #14 [ffff880c54f73f00] cpu_idle at ffffffff81009e49
Diagnostic Steps
- Obtain vmcores from NMI Watchdog crashing the box due to lockups and saw the call traces:
-
Backtrace in
ktime_get:PID: 15173 TASK: ffff88065c968ac0 CPU: 5 COMMAND: "java" #0 [ffff880685447b00] machine_kexec at ffffffff810321cb #1 [ffff880685447b60] crash_kexec at ffffffff810b8ef2 #2 [ffff880685447c30] panic at ffffffff814ec3d1 #3 [ffff880685447cb0] watchdog_overflow_callback at ffffffff810d8f2d #4 [ffff880685447cd0] __perf_event_overflow at ffffffff8110a81d #5 [ffff880685447d70] perf_event_overflow at ffffffff8110add4 #6 [ffff880685447d80] intel_pmu_handle_irq at ffffffff8101e296 #7 [ffff880685447e90] perf_event_nmi_handler at ffffffff814f0a89 #8 [ffff880685447ea0] notifier_call_chain at ffffffff814f25d5 #9 [ffff880685447ee0] atomic_notifier_call_chain at ffffffff814f263a #10 [ffff880685447ef0] notify_die at ffffffff81096a3e #11 [ffff880685447f20] do_nmi at ffffffff814f0253 #12 [ffff880685447f50] nmi at ffffffff814efb60 [exception RIP: ktime_get+180] RIP: ffffffff8109b764 RSP: ffff880685443ee8 RFLAGS: 00000002 RAX: 0000000000010f00 RBX: ffff880685450f08 RCX: 0000000000000000 RDX: ffff880685440000 RSI: 0000000000000005 RDI: ffff88068544db00 RBP: ffff880685443f08 R8: 7fffffffffffffff R9: 0000000000000001 R10: 0000000000000018 R11: 00007f6790cc6620 R12: ffff880685443f48 R13: 000000005fe553fd R14: 0000000000000000 R15: 7fffffffffffffff ORIG_RAX: ffffffffffffffff CS: 0010 SS: 0018 --- <NMI exception stack> --- #13 [ffff880685443ee8] ktime_get at ffffffff8109b764 #14 [ffff880685443f10] hrtimer_interrupt at ffffffff81095509 #15 [ffff880685443f90] smp_apic_timer_interrupt at ffffffff814f4e8b #16 [ffff880685443fb0] apic_timer_interrupt at ffffffff8100bc13 --- <IRQ stack> --- #17 [ffff88065c3d3e28] apic_timer_interrupt at ffffffff8100bc13 [exception RIP: current_kernel_time+59] RIP: ffffffff8109b7cb RSP: ffff88065c3d3ed8 RFLAGS: 00000202 RAX: 0000000000000001 RBX: ffff88065c3d3ef8 RCX: 000000005fe553fd RDX: 00007f6e5d3f4500 RSI: 0000000000000000 RDI: 00000000c000003e RBP: ffffffff8100bc0e R8: 0000000000000000 R9: 00007f6e548f7890 R10: 0000000000000018 R11: 00007f6790cc6620 R12: 0000000000000000 R13: ffff88065c968af8 R14: ffff88065c968ac0 R15: 0000000000000000 ORIG_RAX: ffffffffffffff10 CS: 0010 SS: 0018 #18 [ffff88065c3d3f00] audit_syscall_entry at ffffffff810d4662 #19 [ffff88065c3d3f80] auditsys at ffffffff8100b1dd RIP: 00000031df0cdc87 RSP: 00007f6e548f7958 RFLAGS: 00000207 RAX: 0000000000000018 RBX: ffffffff8100b1dd RCX: 0000000000000f8c RDX: 0000000000000000 RSI: 00007f6e5d3f4500 RDI: 00007f6790cc6620 RBP: 00007f6e548f7940 R8: 0000000000002000 R9: 0000000000003b45 R10: 00007f6e548f7890 R11: 0000000000000202 R12: 0000000000000000 R13: 00000000000001d1 R14: 0000000000000004 R15: 00007f6790cc6620 ORIG_RAX: 0000000000000018 CS: 0033 SS: 002b -
The second backtrace was in
getnstimeofday.PID: 0 TASK: ffff880660799540 CPU: 11 COMMAND: "swapper" #0 [ffff8806854a7b00] machine_kexec at ffffffff810321cb #1 [ffff8806854a7b60] crash_kexec at ffffffff810b8ef2 #2 [ffff8806854a7c30] panic at ffffffff814ec3d1 #3 [ffff8806854a7cb0] watchdog_overflow_callback at ffffffff810d8f2d #4 [ffff8806854a7cd0] __perf_event_overflow at ffffffff8110a81d #5 [ffff8806854a7d70] perf_event_overflow at ffffffff8110add4 #6 [ffff8806854a7d80] intel_pmu_handle_irq at ffffffff8101e296 #7 [ffff8806854a7e90] perf_event_nmi_handler at ffffffff814f0a89 #8 [ffff8806854a7ea0] notifier_call_chain at ffffffff814f25d5 #9 [ffff8806854a7ee0] atomic_notifier_call_chain at ffffffff814f263a #10 [ffff8806854a7ef0] notify_die at ffffffff81096a3e #11 [ffff8806854a7f20] do_nmi at ffffffff814f0253 #12 [ffff8806854a7f50] nmi at ffffffff814efb60 [exception RIP: getnstimeofday+202] RIP: ffffffff8109b34a RSP: ffff880c60455e18 RFLAGS: 00000002 RAX: 0000000000000000 RBX: ffff880c60455e48 RCX: 0000000000000001 RDX: 0000000000000000 RSI: ffff880c60455fd8 RDI: ffff880c60455e48 RBP: ffff880c60455e38 R8: 0000000000000003 R9: 0000000000000320 R10: 001ebf406e42bee9 R11: 0000000000000000 R12: 00000000a0a2068b R13: 129c6a9e932bdae1 R14: 0000000000000002 R15: 000000000000000b ORIG_RAX: ffffffffffffffff CS: 0010 SS: 0018 --- <NMI exception stack> --- #13 [ffff880c60455e18] getnstimeofday at ffffffff8109b34a #14 [ffff880c60455e40] ktime_get_real at ffffffff8109b386 #15 [ffff880c60455e60] intel_idle at ffffffff812c4ab6 #16 [ffff880c60455ee0] cpuidle_idle_call at ffffffff813f9ff7 #17 [ffff880c60455f00] cpu_idle at ffffffff81009e06
This solution is part of Red Hat’s fast-track publication program, providing a huge library of solutions that Red Hat engineers have created while supporting our customers. To give you the knowledge you need the instant it becomes available, these articles may be presented in a raw and unedited form.