System panicked due to hung_task_panic being set and a huge backlog of NFS writes

Solution Unverified - Updated

Environment

  • Red Hat Enterprise Linux 5.10
  • NFS

Issue

System panicked with this message:

INFO: task bptm:24197 blocked for more than 120 seconds.
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
bptm          D ffff810c4e0c3a20     0 24197  23213                     (L-TLB)
 ffff8101c34b3d28 0000000000000046 0000000000000000 ffffffff887809f8
 0000000000000050 0000000000000009 ffff81095ac0a830 ffff81103fd4f040
 000001afa4e04f8d 00000000005b3f93 ffff81095ac0aa18 00000016c34b3d88
Call Trace:
 [<ffffffff887809f8>] :sunrpc:xprt_reserve+0x14d/0x161
 [<ffffffff8009b6bc>] recalc_sigpending+0xe/0x25
 [<ffffffff8006edac>] do_gettimeofday+0x40/0x90
 [<ffffffff887df8d1>] :nfs:nfs_wait_bit_uninterruptible+0x0/0xd
 [<ffffffff800637de>] io_schedule+0x3f/0x67
 [<ffffffff887df8da>] :nfs:nfs_wait_bit_uninterruptible+0x9/0xd
 [<ffffffff80063a0a>] __wait_on_bit+0x40/0x6e
 [<ffffffff887df8d1>] :nfs:nfs_wait_bit_uninterruptible+0x0/0xd
 [<ffffffff80063aa4>] out_of_line_wait_on_bit+0x6c/0x78
 [<ffffffff800a3fd0>] wake_bit_function+0x0/0x23
 [<ffffffff887e4648>] :nfs:nfs_sync_inode_wait+0xeb/0x2df
 [<ffffffff887d9ee9>] :nfs:nfs_do_fsync+0x22/0x42
 [<ffffffff800241da>] filp_close+0x36/0x64
 [<ffffffff80039334>] put_files_struct+0x63/0xae
 [<ffffffff80015941>] do_exit+0x352/0x931
 [<ffffffff80049524>] cpuset_exit+0x0/0x88
 [<ffffffff8005d116>] system_call+0x7e/0x83

Kernel panic - not syncing: hung_task: blocked tasks

Resolution

Reduce the vm.dirty_ratio and vm.dirty_background_ratio settings to 5 and 2 respectively, and check whether this reduces the frequency of the hangs.
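One way to apply the recommended values, both for the running system and persistently, is via sysctl (a sketch; verify the values against your workload before rolling out widely):

```shell
# Apply the recommended values at runtime (takes effect immediately):
sysctl -w vm.dirty_ratio=5
sysctl -w vm.dirty_background_ratio=2

# Persist them across reboots by appending to /etc/sysctl.conf:
cat >> /etc/sysctl.conf <<'EOF'
vm.dirty_ratio = 5
vm.dirty_background_ratio = 2
EOF
```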

Root Cause

The vm.dirty_ratio is set to 40%; on a server with 64GB of RAM that allows roughly 25GB of dirty pages to accumulate before writers are throttled.
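To put the 40% figure in perspective, a rough estimate of the dirty-page ceiling (note this is a simplification: the kernel applies the ratio to dirtyable memory, not total RAM, so the real thresholds are somewhat lower):

```shell
# Approximate dirty-page ceilings for a 64 GB host (illustrative arithmetic):
mem_kb=$((64 * 1024 * 1024))   # 64 GB expressed in KB

# At the current setting, up to ~40% of memory may be dirty before throttling:
echo "dirty_ratio=40 -> ~$(( mem_kb * 40 / 100 / 1024 / 1024 )) GB of dirty pages allowed"

# At the recommended setting the ceiling drops sharply:
echo "dirty_ratio=5  -> ~$(( mem_kb * 5 / 100 / 1024 / 1024 )) GB of dirty pages allowed"
```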

Diagnostic Steps

Looking at the latest vmcore:

      KERNEL: /cores/retrace/repos/kernel/x86_64/usr/lib/debug/lib/modules/2.6.18-371.11.1.el5/vmlinux
    DUMPFILE: /cores/retrace/tasks/774320509/crash/vmcore  [PARTIAL DUMP]
        CPUS: 24
        DATE: Mon Sep  1 23:26:07 2014
      UPTIME: 00:30:04
LOAD AVERAGE: 20.61, 12.88, 6.07
       TASKS: 1102
    NODENAME: <nodename>
     RELEASE: 2.6.18-371.11.1.el5
     VERSION: #1 SMP Mon Jun 30 04:51:39 EDT 2014
     MACHINE: x86_64  (2663 Mhz)
      MEMORY: 63.1 GB
       PANIC: "Kernel panic - not syncing: hung_task: blocked tasks"

The system panicked because hung_task_panic was set and the watchdog detected a stalled task:

INFO: task bptm:24197 blocked for more than 120 seconds.
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
bptm          D ffff810c4e0c3a20     0 24197  23213                     (L-TLB)
 ffff8101c34b3d28 0000000000000046 0000000000000000 ffffffff887809f8
 0000000000000050 0000000000000009 ffff81095ac0a830 ffff81103fd4f040
 000001afa4e04f8d 00000000005b3f93 ffff81095ac0aa18 00000016c34b3d88
Call Trace:
 [<ffffffff887809f8>] :sunrpc:xprt_reserve+0x14d/0x161
 [<ffffffff8009b6bc>] recalc_sigpending+0xe/0x25
 [<ffffffff8006edac>] do_gettimeofday+0x40/0x90
 [<ffffffff887df8d1>] :nfs:nfs_wait_bit_uninterruptible+0x0/0xd
 [<ffffffff800637de>] io_schedule+0x3f/0x67
 [<ffffffff887df8da>] :nfs:nfs_wait_bit_uninterruptible+0x9/0xd
 [<ffffffff80063a0a>] __wait_on_bit+0x40/0x6e
 [<ffffffff887df8d1>] :nfs:nfs_wait_bit_uninterruptible+0x0/0xd
 [<ffffffff80063aa4>] out_of_line_wait_on_bit+0x6c/0x78
 [<ffffffff800a3fd0>] wake_bit_function+0x0/0x23
 [<ffffffff887e4648>] :nfs:nfs_sync_inode_wait+0xeb/0x2df
 [<ffffffff887d9ee9>] :nfs:nfs_do_fsync+0x22/0x42
 [<ffffffff800241da>] filp_close+0x36/0x64
 [<ffffffff80039334>] put_files_struct+0x63/0xae
 [<ffffffff80015941>] do_exit+0x352/0x931
 [<ffffffff80049524>] cpuset_exit+0x0/0x88
 [<ffffffff8005d116>] system_call+0x7e/0x83

Kernel panic - not syncing: hung_task: blocked tasks

crash> set 24197
    PID: 24197
COMMAND: "bptm"
   TASK: ffff81095ac0a830  [THREAD_INFO: ffff8101c34b2000]
    CPU: 22
  STATE: TASK_UNINTERRUPTIBLE 
crash> bt
PID: 24197  TASK: ffff81095ac0a830  CPU: 22  COMMAND: "bptm"
 #0 [ffff8101c34b3c58] schedule at ffffffff80062fa0
 #1 [ffff8101c34b3d30] io_schedule at ffffffff800637de
 #2 [ffff8101c34b3d50] nfs_wait_bit_uninterruptible at ffffffff887df8da [nfs]
 #3 [ffff8101c34b3d60] __wait_on_bit at ffffffff80063a0a
 #4 [ffff8101c34b3da0] out_of_line_wait_on_bit at ffffffff80063aa4
 #5 [ffff8101c34b3e10] nfs_sync_inode_wait at ffffffff887e4648 [nfs]
 #6 [ffff8101c34b3ea0] nfs_do_fsync at ffffffff887d9ee9 [nfs]
 #7 [ffff8101c34b3ec0] filp_close at ffffffff800241da
 #8 [ffff8101c34b3ee0] put_files_struct at ffffffff80039334
 #9 [ffff8101c34b3f10] do_exit at ffffffff80015941
#10 [ffff8101c34b3f80] system_call at ffffffff8005d116
    RIP: 00002b34b40d77cf  RSP: 00007fff188efd38  RFLAGS: 00010202
    RAX: 00000000000000e7  RBX: ffffffff8005d116  RCX: 0000000000000000
    RDX: 0000000000000000  RSI: 000000000000003c  RDI: 0000000000000000
    RBP: 00002b34b438c888   R8: 00000000000000e7   R9: ffffffffffffffa0
    R10: 00002b34b3493320  R11: 0000000000000246  R12: 0000000000000000
    R13: 00002b34b438c888  R14: ffffffff80049524  R15: 0000000000040000
    ORIG_RAX: 00000000000000e7  CS: 0033  SS: 002b

This task was fsync()'ing a file to the NFS server and was waiting for a response. There are no messages in the log about the server not responding, so it is likely there was a large amount of data queued up to be committed to the server.

crash> kmem -V
  VM_STAT:
      NR_ANON_PAGES: 272180
     NR_FILE_MAPPED: 58964
      NR_FILE_PAGES: 15419747
            NR_SLAB: 523708
       NR_PAGETABLE: 16728
      NR_FILE_DIRTY: 1653798
       NR_WRITEBACK: 717067
...

crash> pd 717067*4
$4 = 2868268

There is 2.8GB of data currently under writeback, so it is no surprise there are some delays.

crash> pd 1653798*4
$5 = 6615192

And about 6.3GB of dirty data that could be committed to the server at any time.
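The page-to-KB arithmetic that `pd` performs above can be reproduced in the shell (pages are 4 KB on x86_64):

```shell
# Convert the NR_WRITEBACK and NR_FILE_DIRTY page counts from the vmcore
# into KB and MB (4 KB pages on x86_64):
for pages in 717067 1653798; do
    kb=$(( pages * 4 ))
    echo "$pages pages = $kb KB (about $(( kb / 1024 )) MB)"
done
```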

crash> p vm_dirty_ratio
vm_dirty_ratio = $1 = 40

The vm.dirty_ratio is set to 40%, and on a server with 64GB of RAM that can result in a very large number of dirty pages. I recommend reducing the vm.dirty_ratio and vm.dirty_background_ratio settings to 5 and 2 respectively, and checking whether this reduces the frequency of the hangs.
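After lowering the ratios, the same dirty/writeback backlog seen in the vmcore can be watched on the live system via /proc/vmstat (a monitoring sketch; counter names may vary slightly between kernel versions):

```shell
# Sample the dirty and writeback page counters on a running system
# (counts are in pages; multiply by 4 for KB on x86_64):
grep -E '^nr_(dirty|writeback) ' /proc/vmstat | while read name pages; do
    printf '%s: %d KB\n' "$name" $(( pages * 4 ))
done
```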


This solution is part of Red Hat’s fast-track publication program, providing a huge library of solutions that Red Hat engineers have created while supporting our customers. To give you the knowledge you need the instant it becomes available, these articles may be presented in a raw and unedited form.