System panicked due to hung_task_panic set and huge backlog of NFS writes
Environment
- Red Hat Enterprise Linux 5.10
- NFS
Issue
System panicked with this message:
INFO: task bptm:24197 blocked for more than 120 seconds.
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
bptm D ffff810c4e0c3a20 0 24197 23213 (L-TLB)
ffff8101c34b3d28 0000000000000046 0000000000000000 ffffffff887809f8
0000000000000050 0000000000000009 ffff81095ac0a830 ffff81103fd4f040
000001afa4e04f8d 00000000005b3f93 ffff81095ac0aa18 00000016c34b3d88
Call Trace:
[<ffffffff887809f8>] :sunrpc:xprt_reserve+0x14d/0x161
[<ffffffff8009b6bc>] recalc_sigpending+0xe/0x25
[<ffffffff8006edac>] do_gettimeofday+0x40/0x90
[<ffffffff887df8d1>] :nfs:nfs_wait_bit_uninterruptible+0x0/0xd
[<ffffffff800637de>] io_schedule+0x3f/0x67
[<ffffffff887df8da>] :nfs:nfs_wait_bit_uninterruptible+0x9/0xd
[<ffffffff80063a0a>] __wait_on_bit+0x40/0x6e
[<ffffffff887df8d1>] :nfs:nfs_wait_bit_uninterruptible+0x0/0xd
[<ffffffff80063aa4>] out_of_line_wait_on_bit+0x6c/0x78
[<ffffffff800a3fd0>] wake_bit_function+0x0/0x23
[<ffffffff887e4648>] :nfs:nfs_sync_inode_wait+0xeb/0x2df
[<ffffffff887d9ee9>] :nfs:nfs_do_fsync+0x22/0x42
[<ffffffff800241da>] filp_close+0x36/0x64
[<ffffffff80039334>] put_files_struct+0x63/0xae
[<ffffffff80015941>] do_exit+0x352/0x931
[<ffffffff80049524>] cpuset_exit+0x0/0x88
[<ffffffff8005d116>] system_call+0x7e/0x83
Kernel panic - not syncing: hung_task: blocked tasks
Resolution
Recommend reducing the vm.dirty_ratio and vm.dirty_background_ratio settings to 5 an 2 respectively and see if that reduces the occurrences of the hangs.
Root Cause
The vm.dirty_ratio is set to 40% and for a server with 64GB RAM that can result in a lot of dirty pages.
Diagnostic Steps
Looking at the latest vmcore:
KERNEL: /cores/retrace/repos/kernel/x86_64/usr/lib/debug/lib/modules/2.6.18-371.11.1.el5/vmlinux
DUMPFILE: /cores/retrace/tasks/774320509/crash/vmcore [PARTIAL DUMP]
CPUS: 24
DATE: Mon Sep 1 23:26:07 2014
UPTIME: 00:30:04
LOAD AVERAGE: 20.61, 12.88, 6.07
TASKS: 1102
NODENAME: <nodename>
RELEASE: 2.6.18-371.11.1.el5
VERSION: #1 SMP Mon Jun 30 04:51:39 EDT 2014
MACHINE: x86_64 (2663 Mhz)
MEMORY: 63.1 GB
PANIC: "Kernel panic - not syncing: hung_task: blocked tasks"
System panicked due to hung_task_panic being set and watchdog detected a stalled task:
INFO: task bptm:24197 blocked for more than 120 seconds.
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
bptm D ffff810c4e0c3a20 0 24197 23213 (L-TLB)
ffff8101c34b3d28 0000000000000046 0000000000000000 ffffffff887809f8
0000000000000050 0000000000000009 ffff81095ac0a830 ffff81103fd4f040
000001afa4e04f8d 00000000005b3f93 ffff81095ac0aa18 00000016c34b3d88
Call Trace:
[<ffffffff887809f8>] :sunrpc:xprt_reserve+0x14d/0x161
[<ffffffff8009b6bc>] recalc_sigpending+0xe/0x25
[<ffffffff8006edac>] do_gettimeofday+0x40/0x90
[<ffffffff887df8d1>] :nfs:nfs_wait_bit_uninterruptible+0x0/0xd
[<ffffffff800637de>] io_schedule+0x3f/0x67
[<ffffffff887df8da>] :nfs:nfs_wait_bit_uninterruptible+0x9/0xd
[<ffffffff80063a0a>] __wait_on_bit+0x40/0x6e
[<ffffffff887df8d1>] :nfs:nfs_wait_bit_uninterruptible+0x0/0xd
[<ffffffff80063aa4>] out_of_line_wait_on_bit+0x6c/0x78
[<ffffffff800a3fd0>] wake_bit_function+0x0/0x23
[<ffffffff887e4648>] :nfs:nfs_sync_inode_wait+0xeb/0x2df
[<ffffffff887d9ee9>] :nfs:nfs_do_fsync+0x22/0x42
[<ffffffff800241da>] filp_close+0x36/0x64
[<ffffffff80039334>] put_files_struct+0x63/0xae
[<ffffffff80015941>] do_exit+0x352/0x931
[<ffffffff80049524>] cpuset_exit+0x0/0x88
[<ffffffff8005d116>] system_call+0x7e/0x83
Kernel panic - not syncing: hung_task: blocked tasks
crash> set 24197
PID: 24197
COMMAND: "bptm"
TASK: ffff81095ac0a830 [THREAD_INFO: ffff8101c34b2000]
CPU: 22
STATE: TASK_UNINTERRUPTIBLE
crash> bt
PID: 24197 TASK: ffff81095ac0a830 CPU: 22 COMMAND: "bptm"
#0 [ffff8101c34b3c58] schedule at ffffffff80062fa0
#1 [ffff8101c34b3d30] io_schedule at ffffffff800637de
#2 [ffff8101c34b3d50] nfs_wait_bit_uninterruptible at ffffffff887df8da [nfs]
#3 [ffff8101c34b3d60] __wait_on_bit at ffffffff80063a0a
#4 [ffff8101c34b3da0] out_of_line_wait_on_bit at ffffffff80063aa4
#5 [ffff8101c34b3e10] nfs_sync_inode_wait at ffffffff887e4648 [nfs]
#6 [ffff8101c34b3ea0] nfs_do_fsync at ffffffff887d9ee9 [nfs]
#7 [ffff8101c34b3ec0] filp_close at ffffffff800241da
#8 [ffff8101c34b3ee0] put_files_struct at ffffffff80039334
#9 [ffff8101c34b3f10] do_exit at ffffffff80015941
#10 [ffff8101c34b3f80] system_call at ffffffff8005d116
RIP: 00002b34b40d77cf RSP: 00007fff188efd38 RFLAGS: 00010202
RAX: 00000000000000e7 RBX: ffffffff8005d116 RCX: 0000000000000000
RDX: 0000000000000000 RSI: 000000000000003c RDI: 0000000000000000
RBP: 00002b34b438c888 R8: 00000000000000e7 R9: ffffffffffffffa0
R10: 00002b34b3493320 R11: 0000000000000246 R12: 0000000000000000
R13: 00002b34b438c888 R14: ffffffff80049524 R15: 0000000000040000
ORIG_RAX: 00000000000000e7 CS: 0033 SS: 002b
This task was fsync()'ing a file to the server and was waiting for a response from the server. There's no messages in the log about the server not responding so it's likely there was a lot of data queue up to be committed to the server.
crash> kmem -V
VM_STAT:
NR_ANON_PAGES: 272180
NR_FILE_MAPPED: 58964
NR_FILE_PAGES: 15419747
NR_SLAB: 523708
NR_PAGETABLE: 16728
NR_FILE_DIRTY: 1653798
NR_WRITEBACK: 717067
...
crash> pd 717067*4
$4 = 2868268
There is 2.8GB of data currently under writeback so it's no surprise there are some delays.
crash> pd 1653798*4
$5 = 6615192
And almost 6.5GB of dirty data that could be committed to the server at any time.
crash> p vm_dirty_ratio
vm_dirty_ratio = $1 = 40
The vm.dirty_ratio is set to 40% and for a server with 64GB RAM that can result in a lot of dirty pages. I recommend reducing the vm.dirty_ratio and vm.dirty_background_ratio settings to 5 an 2 respectively and see if that reduces the occurrences of the hangs.
This solution is part of Red Hat’s fast-track publication program, providing a huge library of solutions that Red Hat engineers have created while supporting our customers. To give you the knowledge you need the instant it becomes available, these articles may be presented in a raw and unedited form.