High CPU usage after inserting leap second
Environment
- Red Hat Enterprise Linux 6 GA
- Red Hat Enterprise Linux 6.1
- Red Hat Enterprise Linux 6.2
- Red Hat Enterprise Linux 6.3
Issue
- After insertion of leap second several processes, notably
java, reporting high CPU usage - Logged messages shows back trace from
ktime_get - High CPU, non-responsive VMs
Resolution
Fixing the issue when it occurs:
Manually reset the date by executing any of the below commands:
date -s "`LC_ALL=C date`"
date `date +'%m%d%H%M%C%y.%S'`
Resetting the date will resolve any present issues.
Preventing the issue from occurring:
- RHEL 6.1 EUS:
Update tokernel-2.6.32-131.30.2(from RHBA-2012-1197) or later. - RHEL 6.2 EUS:
Update tokernel-2.6.32-220.25.1.el6(from RHBA-2012-1198) or later. - RHEL 6.3:
Update tokernel-2.6.32-279.5.2(from RHBA-2012-1199) or later. - RHEL 6.4 and later:
Already includes the fix for this issue.
Workaround:
If no updated kernel can be installed, ntpd can be used in slew mode which will slew the leap second slowly instead of stepping it. When slew mode is used, the kernel discipline is disabled, preventing this bug from occurring.
Root Cause
Previously in the kernel, when the leap second hrtimer was started, it was possible that the kernel livelocked on the xtime_lock variable. This issue was identified in This content is not included.Red Hat Bugzilla 840950.
After the leap second was inserted, applications that used futexes system calls consumed almost 100% of available CPU time. This occurred because the kernel's timekeeping structure update did not properly update these futexes. The futexes repeatedly expired, re-armed, and then expired immediately again.
Additional information is available from the original reporter, John Stultz, at the following link: Content from marc.info is not included.Leapsecond caused futex related load spikes
For more information about the leap second issue, see the following article - Resolve Leap Second Issues in Red Hat Enterprise Linux.
Diagnostic Steps
-
The following messages were frequently seen in the logs:
Jul 1 02:23:12 server kernel: [1121001.934357] INFO: task java:60028 blocked for more than 120 seconds. Jul 1 02:23:12 server kernel: [1121001.937203] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. Jul 1 02:23:12 server kernel: [1121001.940147] java D 0000000000000007 0 60028 1 0x00000080 Jul 1 02:23:12 server kernel: [1121001.943176] ffff880d775419c8 0000000000000082 0000000501bfe000 ffffc900180cb058 Jul 1 02:23:12 server kernel: [1121001.946293] ffffc900180cb020 ffffea002b821aa0 ffff880d34fe9580 000000d00002dc40 Jul 1 02:23:12 server kernel: [1121001.949488] ffff880d34fe9b38 ffff880d77541fd8 000000000000f4e8 ffff880d34fe9b38 Jul 1 02:23:12 server kernel: [1121001.952757] Call Trace: Jul 1 02:23:12 server kernel: [1121001.956042] [<ffffffff8109b949>] ? ktime_get_ts+0xa9/0xe0 Jul 1 02:23:21 server kernel: [1121001.959347] [<ffffffffa04bac10>] ? nfs_wait_bit_uninterruptible+0x0/0x20 [nfs] Jul 1 02:23:21 server kernel: [1121001.962724] [<ffffffff814ed833>] io_schedule+0x73/0xc0 Jul 1 02:23:21 server kernel: [1121001.966138] [<ffffffffa04bac1e>] nfs_wait_bit_uninterruptible+0xe/0x20 [nfs] Jul 1 02:23:21 server kernel: [1121001.969643] [<ffffffff814ee1ef>] __wait_on_bit+0x5f/0x90 Jul 1 02:23:21 server kernel: [1121001.973190] [<ffffffffa04bac10>] ? nfs_wait_bit_uninterruptible+0x0/0x20 [nfs] Jul 1 02:23:21 server kernel: [1121001.976738] [<ffffffff814ee298>] out_of_line_wait_on_bit+0x78/0x90 Jul 1 02:23:21 server kernel: [1121001.980253] [<ffffffff81090d70>] ? wake_bit_function+0x0/0x50 Jul 1 02:23:21 server kernel: [1121001.983813] [<ffffffff81110b6e>] ? find_get_page+0x1e/0xa0 Jul 1 02:23:21 server kernel: [1121001.987403] [<ffffffffa04babff>] nfs_wait_on_request+0x2f/0x40 [nfs] Jul 1 02:23:21 server kernel: [1121001.990991] [<ffffffffa04c145e>] nfs_updatepage+0x28e/0x590 [nfs] Jul 1 02:23:21 server kernel: [1121001.994526] [<ffffffffa04af5c2>] nfs_write_end+0x152/0x2b0 [nfs] Jul 1 02:23:21 server kernel: [1121001.998009] [<ffffffff811116c4>] generic_file_buffered_write+0x174/0x2a0 Jul 1 02:23:21 server kernel: [1121002.001488] [<ffffffff81112fb0>] __generic_file_aio_write+0x250/0x480 Jul 1 02:23:21 server kernel: [1121002.004944] [<ffffffff8111324f>] generic_file_aio_write+0x6f/0xe0 Jul 1 02:23:21 server kernel: [1121002.008370] [<ffffffffa04aeffe>] nfs_file_write+0xde/0x1f0 [nfs] Jul 1 02:23:21 server kernel: [1121002.011760] [<ffffffff8117661a>] do_sync_write+0xfa/0x140 Jul 1 02:23:21 server kernel: [1121002.015276] [<ffffffff81090d30>] ? autoremove_wake_function+0x0/0x40 Jul 1 02:23:21 server kernel: [1121002.018657] [<ffffffff8120c646>] ? security_file_permission+0x16/0x20 Jul 1 02:23:21 server kernel: [1121002.022012] [<ffffffff81176918>] vfs_write+0xb8/0x1a0 Jul 1 02:23:21 server kernel: [1121002.025341] [<ffffffff810d4932>] ? audit_syscall_entry+0x272/0x2a0 Jul 1 02:23:21 server kernel: [1121002.028661] [<ffffffff81177321>] sys_write+0x51/0x90 Jul 1 02:23:21 server kernel: [1121002.031953] [<ffffffff8100b0f2>] system_call_fastpath+0x16/0x1b
This solution is part of Red Hat’s fast-track publication program, providing a huge library of solutions that Red Hat engineers have created while supporting our customers. To give you the knowledge you need the instant it becomes available, these articles may be presented in a raw and unedited form.