Systems hang due to leap-second livelock.

Solution Verified - Updated

Environment

  • Red Hat Enterprise Linux 6.1 through 6.3

Issue

  • Systems repeatedly crash due to NMI Watchdog detecting a hang. There are two similar backtraces that appear to be related to this issue.
  1. The first backtrace appears to be in ktime_get.
  2. The second backtrace appears to be in getnstimeofday.

Resolution

  • The issue is resolved in RHEL 6.1.z EUS kernel version 2.6.32-131.30.2.el6 released as RHBA-2012-1197

  • The issue is resolved in RHEL 6.2.z EUS kernel version 2.6.32-220.25.1.el6 released as RHBA-2012-1198

  • The issue is resolved in RHEL 6.3 kernel version 2.6.32-279.5.2.el6 released as RHBA-2012-1199

  • The issue is resolved in RHEL 6.4 GA kernel version 2.6.32-358.el6 released as RHBA-2013-0496

  • Aside from upgrading to the fixed kernels, the current known workarounds are:

    1. Disable ntp & reboot so that the leap announcement is cleared from the kernel.
    2. Set forward the time past local midnight and let ntp adjust it over time (slow correct)
    3. Setting "tinker step" above 0.5 seconds within ntp.conf and restart ntp
    4. Use the -x option within /etc/sysconfig/ntpd and restart ntp service. This works only with ntp-4.2.6p5-3.el6_6 or later OR earlier versions than 4.2.6.

Root Cause

  • For more information about the leap second see Resolve Leap Second Issues in Red Hat Enterprise Linux.

  • The problem occurs when the system receives the announcement of the leapsecond, not when the leapsecond insertion actually happens. The problem is timing-dependent, so it is possible for a system to successfully queue the insertion. However, the request for leapsecond insertion has to be cancelled and re-queued every time ntp adjusts the system time, so there is a chance of the problem happening continuously if ntpd is running.

  • A patch upstream Content from lkml.org is not included.ntp: Fix leap-second hrtimer livelock describes a similar livelock in the newer kernel. The locking has changed but the fact that to code paths take the locks in reverse order from each other remains. The proposed solution there is to not use hrtimer to insert leapseconds.

  • This can be caused from two current call paths:

  1. The code path in the first one, in do_adjtimex() takes write_seqlock_irq(&xtime_lock) and then tries to get a timer base lock in __hrtimer_start_range_ns().

    PID: 576    TASK: ffff880c455c8080  CPU: 1   COMMAND: "ntpd"
     #0 [ffff880685407af0] machine_kexec at ffffffff810310cb
     #1 [ffff880685407b50] crash_kexec at ffffffff810b6312
     #2 [ffff880685407c20] panic at ffffffff814da175
     #3 [ffff880685407ca0] watchdog_overflow_callback at ffffffff810d63fd
     #4 [ffff880685407cc0] __perf_event_overflow at ffffffff81108a96
     #5 [ffff880685407d60] perf_event_overflow at ffffffff81109089
     #6 [ffff880685407d70] intel_pmu_handle_irq at ffffffff8101dd46
     #7 [ffff880685407e80] perf_event_nmi_handler at ffffffff814de6f8
     #8 [ffff880685407ea0] notifier_call_chain at ffffffff814e0235
     #9 [ffff880685407ee0] atomic_notifier_call_chain at ffffffff814e029a
    #10 [ffff880685407ef0] notify_die at ffffffff8109409e
    #11 [ffff880685407f20] do_nmi at ffffffff814ddea3
    #12 [ffff880685407f50] nmi at ffffffff814dd7b0
        [exception RIP: _spin_lock_irqsave+41]
        RIP: ffffffff814dcf09  RSP: ffff880b5ca05d58  RFLAGS: 00000097
        RAX: 0000000000000138  RBX: ffff880685490f48  RCX: 0000000000000137
        RDX: 0000000000000086  RSI: ffff880b5ca05df0  RDI: ffff880685490f40
        RBP: ffff880b5ca05d58   R8: 0000000000000001   R9: 00000000000007dc
        R10: 0000000000000000  R11: 0000000000000206  R12: ffffffff81dde860
        R13: ffff880b5ca05df0  R14: 0000000000010f40  R15: 0000000000000000
        ORIG_RAX: ffffffffffffffff  CS: 0010  SS: 0018
    --- <NMI exception stack> ---
    #13 [ffff880b5ca05d58] _spin_lock_irqsave at ffffffff814dcf09
    #14 [ffff880b5ca05d60] lock_hrtimer_base at ffffffff81092491
    #15 [ffff880b5ca05d90] __hrtimer_start_range_ns at ffffffff81092e2a
    #16 [ffff880b5ca05e30] hrtimer_start at ffffffff81093268
    #17 [ffff880b5ca05e40] ntp_start_leap_timer at ffffffff8109959d
    #18 [ffff880b5ca05e50] do_adjtimex at ffffffff81099832
    #19 [ffff880b5ca05e80] sys_adjtimex at ffffffff8106dd78
    #20 [ffff880b5ca05f80] system_call_fastpath at ffffffff8100b172
        RIP: 00007f9d1e518d37  RSP: 00007fffcd1e78a0  RFLAGS: 00000246
        RAX: 000000000000009f  RBX: ffffffff8100b172  RCX: 0000000000000000
        RDX: 00007f9d1f6f1ba8  RSI: 0000000000000005  RDI: 00007f9d1f6f1c20
        RBP: 00007f9d1f6f1bb0   R8: 0000000000000000   R9: 00000000000007dc
        R10: 0000000000000000  R11: 0000000000000206  R12: 00007f9d1f6f1c20
        R13: 00007f9d1f6a5238  R14: 0000000000000260  R15: 00007f9d1f6a5238
        ORIG_RAX: 000000000000009f  CS: 0033  SS: 002b
    
  2. The second one takes this same timer base lock in __hrtimer_start_range_ns() and then tries seq = read_seqbegin(&xtime_lock) in ktime_get().

    PID: 0      TASK: ffff880654bfb580  CPU: 9   COMMAND: "swapper"
     #0 [ffff880685487e90] crash_nmi_callback at ffffffff81028a66
     #1 [ffff880685487ea0] notifier_call_chain at ffffffff814e0235
     #2 [ffff880685487ee0] atomic_notifier_call_chain at ffffffff814e029a
     #3 [ffff880685487ef0] notify_die at ffffffff8109409e
     #4 [ffff880685487f20] do_nmi at ffffffff814ddea3
     #5 [ffff880685487f50] nmi at ffffffff814dd7b0
        [exception RIP: ktime_get+180]
        RIP: ffffffff81098d84  RSP: ffff880c54f73d28  RFLAGS: 00000002
        RAX: 00000e2f1135ec33  RBX: 0000000000000001  RCX: 0000000000000000
        RDX: 0000000000000001  RSI: 00000e2f1135ec33  RDI: ffff88068548db40
        RBP: ffff880c54f73d48   R8: 00000e2f1135ec33   R9: ffff880685490f48
        R10: 00000e2fbb3c41e3  R11: 0000000100e962be  R12: ffff88068548db40
        R13: 0000000001e0bc91  R14: ffffffff817ac524  R15: 0000000000000002
        ORIG_RAX: ffffffffffffffff  CS: 0010  SS: 0018
    --- <NMI exception stack> ---
     #6 [ffff880c54f73d28] ktime_get at ffffffff81098d84
     #7 [ffff880c54f73d50] tick_dev_program_event at ffffffff8109da69
     #8 [ffff880c54f73d90] tick_program_event at ffffffff8109db7a
     #9 [ffff880c54f73da0] hrtimer_force_reprogram at ffffffff81091fdf
    #10 [ffff880c54f73db0] __remove_hrtimer at ffffffff8109208f
    #11 [ffff880c54f73df0] __hrtimer_start_range_ns at ffffffff81093063
    #12 [ffff880c54f73e90] hrtimer_start at ffffffff81093268
    #13 [ffff880c54f73ea0] tick_nohz_stop_sched_tick at ffffffff8109e965
    #14 [ffff880c54f73f00] cpu_idle at ffffffff81009e49
    

Diagnostic Steps

  • Obtain vmcores from NMI Watchdog crashing the box due to lockups and saw the call traces:
  1. Backtrace in ktime_get:

    PID: 15173  TASK: ffff88065c968ac0  CPU: 5   COMMAND: "java"
     #0 [ffff880685447b00] machine_kexec at ffffffff810321cb
     #1 [ffff880685447b60] crash_kexec at ffffffff810b8ef2
     #2 [ffff880685447c30] panic at ffffffff814ec3d1
     #3 [ffff880685447cb0] watchdog_overflow_callback at ffffffff810d8f2d
     #4 [ffff880685447cd0] __perf_event_overflow at ffffffff8110a81d
     #5 [ffff880685447d70] perf_event_overflow at ffffffff8110add4
     #6 [ffff880685447d80] intel_pmu_handle_irq at ffffffff8101e296
     #7 [ffff880685447e90] perf_event_nmi_handler at ffffffff814f0a89
     #8 [ffff880685447ea0] notifier_call_chain at ffffffff814f25d5
     #9 [ffff880685447ee0] atomic_notifier_call_chain at ffffffff814f263a
    #10 [ffff880685447ef0] notify_die at ffffffff81096a3e
    #11 [ffff880685447f20] do_nmi at ffffffff814f0253
    #12 [ffff880685447f50] nmi at ffffffff814efb60
        [exception RIP: ktime_get+180]
        RIP: ffffffff8109b764  RSP: ffff880685443ee8  RFLAGS: 00000002
        RAX: 0000000000010f00  RBX: ffff880685450f08  RCX: 0000000000000000
        RDX: ffff880685440000  RSI: 0000000000000005  RDI: ffff88068544db00
        RBP: ffff880685443f08   R8: 7fffffffffffffff   R9: 0000000000000001
        R10: 0000000000000018  R11: 00007f6790cc6620  R12: ffff880685443f48
        R13: 000000005fe553fd  R14: 0000000000000000  R15: 7fffffffffffffff
        ORIG_RAX: ffffffffffffffff  CS: 0010  SS: 0018
    --- <NMI exception stack> ---
    #13 [ffff880685443ee8] ktime_get at ffffffff8109b764
    #14 [ffff880685443f10] hrtimer_interrupt at ffffffff81095509
    #15 [ffff880685443f90] smp_apic_timer_interrupt at ffffffff814f4e8b
    #16 [ffff880685443fb0] apic_timer_interrupt at ffffffff8100bc13
    --- <IRQ stack> ---
    #17 [ffff88065c3d3e28] apic_timer_interrupt at ffffffff8100bc13
        [exception RIP: current_kernel_time+59]
        RIP: ffffffff8109b7cb  RSP: ffff88065c3d3ed8  RFLAGS: 00000202
        RAX: 0000000000000001  RBX: ffff88065c3d3ef8  RCX: 000000005fe553fd
        RDX: 00007f6e5d3f4500  RSI: 0000000000000000  RDI: 00000000c000003e
        RBP: ffffffff8100bc0e   R8: 0000000000000000   R9: 00007f6e548f7890
        R10: 0000000000000018  R11: 00007f6790cc6620  R12: 0000000000000000
        R13: ffff88065c968af8  R14: ffff88065c968ac0  R15: 0000000000000000
        ORIG_RAX: ffffffffffffff10  CS: 0010  SS: 0018
    #18 [ffff88065c3d3f00] audit_syscall_entry at ffffffff810d4662
    #19 [ffff88065c3d3f80] auditsys at ffffffff8100b1dd
        RIP: 00000031df0cdc87  RSP: 00007f6e548f7958  RFLAGS: 00000207
        RAX: 0000000000000018  RBX: ffffffff8100b1dd  RCX: 0000000000000f8c
        RDX: 0000000000000000  RSI: 00007f6e5d3f4500  RDI: 00007f6790cc6620
        RBP: 00007f6e548f7940   R8: 0000000000002000   R9: 0000000000003b45
        R10: 00007f6e548f7890  R11: 0000000000000202  R12: 0000000000000000
        R13: 00000000000001d1  R14: 0000000000000004  R15: 00007f6790cc6620
        ORIG_RAX: 0000000000000018  CS: 0033  SS: 002b
    
  2. The second backtrace was in getnstimeofday.

    PID: 0      TASK: ffff880660799540  CPU: 11  COMMAND: "swapper"
     #0 [ffff8806854a7b00] machine_kexec at ffffffff810321cb
     #1 [ffff8806854a7b60] crash_kexec at ffffffff810b8ef2
     #2 [ffff8806854a7c30] panic at ffffffff814ec3d1
     #3 [ffff8806854a7cb0] watchdog_overflow_callback at ffffffff810d8f2d
     #4 [ffff8806854a7cd0] __perf_event_overflow at ffffffff8110a81d
     #5 [ffff8806854a7d70] perf_event_overflow at ffffffff8110add4
     #6 [ffff8806854a7d80] intel_pmu_handle_irq at ffffffff8101e296
     #7 [ffff8806854a7e90] perf_event_nmi_handler at ffffffff814f0a89
     #8 [ffff8806854a7ea0] notifier_call_chain at ffffffff814f25d5
     #9 [ffff8806854a7ee0] atomic_notifier_call_chain at ffffffff814f263a
    #10 [ffff8806854a7ef0] notify_die at ffffffff81096a3e
    #11 [ffff8806854a7f20] do_nmi at ffffffff814f0253
    #12 [ffff8806854a7f50] nmi at ffffffff814efb60
        [exception RIP: getnstimeofday+202]
        RIP: ffffffff8109b34a  RSP: ffff880c60455e18  RFLAGS: 00000002
        RAX: 0000000000000000  RBX: ffff880c60455e48  RCX: 0000000000000001
        RDX: 0000000000000000  RSI: ffff880c60455fd8  RDI: ffff880c60455e48
        RBP: ffff880c60455e38   R8: 0000000000000003   R9: 0000000000000320
        R10: 001ebf406e42bee9  R11: 0000000000000000  R12: 00000000a0a2068b
        R13: 129c6a9e932bdae1  R14: 0000000000000002  R15: 000000000000000b
        ORIG_RAX: ffffffffffffffff  CS: 0010  SS: 0018
    --- <NMI exception stack> ---
    #13 [ffff880c60455e18] getnstimeofday at ffffffff8109b34a
    #14 [ffff880c60455e40] ktime_get_real at ffffffff8109b386
    #15 [ffff880c60455e60] intel_idle at ffffffff812c4ab6
    #16 [ffff880c60455ee0] cpuidle_idle_call at ffffffff813f9ff7
    #17 [ffff880c60455f00] cpu_idle at ffffffff81009e06
    
SBR
Components
Category
Tags

This solution is part of Red Hat’s fast-track publication program, providing a huge library of solutions that Red Hat engineers have created while supporting our customers. To give you the knowledge you need the instant it becomes available, these articles may be presented in a raw and unedited form.