'cp' over an NFS share hangs and triggers hung_task_timeout on Red Hat Enterprise Linux 5.8

Solution Unverified - Updated

Environment

  • Red Hat Enterprise Linux 5.8
    • kernel 2.6.18-308.24.1.el5
    • NFSv3 client
  • Veritas cluster

Issue

  • Periodically, cp becomes blocked and triggers hung_task_timeout.
  • The backtrace seen in messages file looks like this
INFO: task cp:27324 blocked for more than 120 seconds.
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
cp            D ffffffff80157271     0 27324  20865                     (NOTLB)
 ffff811457239da8 0000000000000082 0000000000000001 ffffffff8033f0e8
 0000000000000000 0000000000000007 ffff81154e4f2860 ffff81407f8bf080
 00005dd4b605f5d5 000000000000049f ffff81154e4f2a48 000000491dc5baf8
Call Trace:
 [<ffffffff8006ece7>] do_gettimeofday+0x40/0x90
 [<ffffffff88d5f4c9>] :nfs:nfs_wait_bit_uninterruptible+0x0/0xd
 [<ffffffff800637de>] io_schedule+0x3f/0x67
 [<ffffffff88d5f4d2>] :nfs:nfs_wait_bit_uninterruptible+0x9/0xd
 [<ffffffff80063a0a>] __wait_on_bit+0x40/0x6e
 [<ffffffff88d5f4c9>] :nfs:nfs_wait_bit_uninterruptible+0x0/0xd
 [<ffffffff80063aa4>] out_of_line_wait_on_bit+0x6c/0x78
 [<ffffffff800a34f9>] wake_bit_function+0x0/0x23
 [<ffffffff88d64218>] :nfs:nfs_sync_inode_wait+0xeb/0x2df
 [<ffffffff88d59ecd>] :nfs:nfs_do_fsync+0x22/0x42
 [<ffffffff80023d0e>] filp_close+0x36/0x64
 [<ffffffff8001e13b>] sys_close+0x88/0xbd

Resolution

Collect a vmcore as mentioned in the "Diagnostic Steps" section and engage Red Hat support to analyse it.

Diagnostic Steps

  • Take a vmcore at the time of the hang, either triggered with hung_task_panic = 1 or sysrq-c
  • Analyze the vmcore
This vmcore is similar to before, but not identical.  It is a hung task panic, with 'cp' causing the hung task timer to fire.
The files involved all look like:
/aideliverables/prod/SybaseIQ/PUBLISH/tec/mkt/wkr/tec.mkt.wkr.fact_data_item.*

These are on the following NFS mount:
 nfs    hds2nas3:/mnt/aideliverables /aideliverables


Here's the interesting processes so far and the files being operated on:
27324: cp process triggering the hung task panic; doing 'close' system call but stuck inside nfs_sync_inode_wait
crash> file 0xffff81806bb23580 | grep dentry
  f_dentry = 0xffff816b5a9deb70, 
crash> files -d 0xffff816b5a9deb70
     DENTRY           INODE           SUPERBLK     TYPE PATH
ffff816b5a9deb70 ffff8168a585d298 ffff8180786b0c00 REG  prod/SybaseIQ/PUBLISH/tec/mkt/wkr/tec.mkt.wkr.fact_data_item.504_504.dat.gz

17326: cp process blocked almost as long as 27324, same backtrace as 27324; operating on a file in the same path but with a different name
crash> file 0xffff816f5efbfdc0 | grep dentry
  f_dentry = 0xffff8138b981d3d8, 
crash> files -d 0xffff8138b981d3d8
     DENTRY           INODE           SUPERBLK     TYPE PATH
ffff8138b981d3d8 ffff817d1561a6e0 ffff8180786b0c00 REG  prod/SybaseIQ/PUBLISH/tec/mkt/wkr/tec.mkt.wkr.fact_data_item.509_509.dat.gz

22142: running 'cp' process; copying data from userspace to kernel for a write to a file of similar path, but different name from the above two.
crash> file 0xffff817dced353c0 | grep dentry
  f_dentry = 0xffff817d380e1660, 
crash> files -d 0xffff817d380e1660
     DENTRY           INODE           SUPERBLK     TYPE PATH
ffff817d380e1660 ffff8165ceb8d7e0 ffff8180786b0c00 REG  prod/SybaseIQ/PUBLISH/tec/mkt/wkr/tec.mkt.wkr.fact_data_item.518_518.dat.gz

All of the above are on the same superblock, so here is the mount:
crash> mount | grep ffff8180786b0c00
ffff81807fe50ac0 ffff8180786b0c00 nfs    hds2nas3:/mnt/aideliverables /aideliverables


Details
=======

crash> sys
      KERNEL: /cores/20130603141348/work/vmlinux
    DUMPFILE: /cores/20130603141348/work/04457217.prod8.vmcore  [PARTIAL DUMP]
        CPUS: 80
        DATE: Sun Jun  2 12:53:32 2013
      UPTIME: 1 days, 04:42:13
LOAD AVERAGE: 34.53, 33.04, 27.21
       TASKS: 2734
    NODENAME: www.example.com
     RELEASE: 2.6.18-308.24.1.el5
     VERSION: #1 SMP Wed Nov 21 11:42:14 EST 2012
     MACHINE: x86_64  (2394 Mhz)
      MEMORY: 505 GB
       PANIC: "Kernel panic - not syncing: hung_task: blocked tasks"
[dwysocha@megatron work]$ grep UN ps_dtime.txt 
[  0 00:00:00.000] UN+[UN]  PID: 5596   TASK: ffff81407f0ee0c0  CPU: 41  COMMAND: "pdflush"
[  0 00:00:00.000] UN+[UN]  PID: 14072  TASK: ffff8180788d37a0  CPU: 42  COMMAND: "syslogd"
[  0 00:00:00.015] UN+[UN]  PID: 29211  TASK: ffff8149d55dd100  CPU: 13  COMMAND: "updatedb"
[  0 00:00:02.650] UN+[UN]  PID: 17954  TASK: ffff816cc10e1820  CPU: 6   COMMAND: "cp"
[  0 00:00:03.115] UN+[UN]  PID: 2942   TASK: ffff8172376c27e0  CPU: 27  COMMAND: "cp"
[  0 00:00:06.481] UN+[UN]  PID: 30081  TASK: ffff811823cfc100  CPU: 16  COMMAND: "cp"
[  0 00:00:09.627] UN+[UN]  PID: 21108  TASK: ffff815ed44bb7e0  CPU: 7   COMMAND: "cp"
[  0 00:00:13.934] UN+[UN]  PID: 17304  TASK: ffff812e11b2d860  CPU: 2   COMMAND: "cp"
[  0 00:01:39.992] UN+[UN]  PID: 22878  TASK: ffff812e8bcb3080  CPU: 72  COMMAND: "cp"
[  0 00:02:17.066] UN+[UN]  PID: 17326  TASK: ffff813e50f0e7a0  CPU: 43  COMMAND: "cp"
[  0 00:02:38.311] UN+[UN]  PID: 27324  TASK: ffff81154e4f2860  CPU: 73  COMMAND: "cp"  <------------ caused the hung task timer to fire

crash> p sysctl_hung_task_timeout_secs 
sysctl_hung_task_timeout_secs = $2 = 120
crash> p sysctl_hung_task_panic
sysctl_hung_task_panic = $3 = 1

crash> bt 27324
PID: 27324  TASK: ffff81154e4f2860  CPU: 73  COMMAND: "cp"
 #0 [ffff811457239cd8] schedule at ffffffff80062fa0
 #1 [ffff811457239db0] io_schedule at ffffffff800637de
 #2 [ffff811457239dd0] nfs_wait_bit_uninterruptible at ffffffff88d5f4d2 [nfs]
 #3 [ffff811457239de0] __wait_on_bit at ffffffff80063a0a
 #4 [ffff811457239e20] out_of_line_wait_on_bit at ffffffff80063aa4
 #5 [ffff811457239e90] nfs_sync_inode_wait at ffffffff88d64218 [nfs]
 #6 [ffff811457239f20] nfs_do_fsync at ffffffff88d59ecd [nfs]
 #7 [ffff811457239f40] filp_close at ffffffff80023d0e
 #8 [ffff811457239f60] sys_close at ffffffff8001e13b
 #9 [ffff811457239f80] tracesys at ffffffff8005d28d (via system_call)
    RIP: 00000035f94c7180  RSP: 00007fff3f9d8828  RFLAGS: 00000246
    RAX: ffffffffffffffda  RBX: ffffffff8005d28d  RCX: ffffffffffffffff
    RDX: 00007fff3f9e1b0c  RSI: 00007fff3f9d9000  RDI: 0000000000000004
    RBP: 0000000000000000   R8: 00007fff3f9e1b0c   R9: 00007fff3f9e1d67
    R10: 30355f3430352e6d  R11: 0000000000000246  R12: 00007fff3f9e1a30
    R13: 0000000000008000  R14: ffffffff8001e13b  R15: ffff81806bb23580
    ORIG_RAX: 0000000000000003  CS: 0033  SS: 002b
crash> bt 17326
PID: 17326  TASK: ffff813e50f0e7a0  CPU: 43  COMMAND: "cp"
 #0 [ffff812a7ccdfcd8] schedule at ffffffff80062fa0
 #1 [ffff812a7ccdfdb0] io_schedule at ffffffff800637de
 #2 [ffff812a7ccdfdd0] nfs_wait_bit_uninterruptible at ffffffff88d5f4d2 [nfs]
 #3 [ffff812a7ccdfde0] __wait_on_bit at ffffffff80063a0a
 #4 [ffff812a7ccdfe20] out_of_line_wait_on_bit at ffffffff80063aa4
 #5 [ffff812a7ccdfe90] nfs_sync_inode_wait at ffffffff88d64218 [nfs]
 #6 [ffff812a7ccdff20] nfs_do_fsync at ffffffff88d59ecd [nfs]
 #7 [ffff812a7ccdff40] filp_close at ffffffff80023d0e
 #8 [ffff812a7ccdff60] sys_close at ffffffff8001e13b
 #9 [ffff812a7ccdff80] tracesys at ffffffff8005d28d (via system_call)
    RIP: 00000035f94c7180  RSP: 00007fff0a9f01f8  RFLAGS: 00000246
    RAX: ffffffffffffffda  RBX: ffffffff8005d28d  RCX: ffffffffffffffff
    RDX: 00007fff0a9f94dc  RSI: 00007fff0a9f1000  RDI: 0000000000000004
    RBP: 0000000000000000   R8: 00007fff0a9f94dc   R9: 00007fff0a9f9737
    R10: 30355f3930352e6d  R11: 0000000000000246  R12: 00007fff0a9f9400
    R13: 0000000000008000  R14: ffffffff8001e13b  R15: ffff816f5efbfdc0
    ORIG_RAX: 0000000000000003  CS: 0033  SS: 002b


Running processes - one 'cp' running on pid 22142
crash> ps | grep \> | grep -v swapper
>  1662  31998  20  ffff815fd4506100  RU   0.0  119400  27664  unitool
>  1672  31992   3  ffff81461c848040  RU   0.0  119524  51892  unitool
>  2416  32001   6  ffff81326f036080  RU   0.0  119400  27664  unitool
>  2710   1223  24  ffff81607f75e080  RU   0.0   88496  21700  unitool
>  2785    991  78  ffff817d9a5ff860  RU   0.0  129592  38320  db-unload.impl.
>  2792   1223  12  ffff814e68416100  RU   0.1  572608 502852  unitool
>  5594   4435   0  ffff81807e42b040  RU   0.0       0      0  [khungtaskd]
>  7207      1  33  ffff812006e44100  RU   0.0   35292  14276  perfd
> 10715   4435   2  ffff81207f2d87e0  RU   0.0       0      0  [kjournald]
> 22142  20723  35  ffff811321851080  RU   0.0   69700    788  cp
> 22682  22671  62  ffff812f8fa370c0  RU   0.0  148620  52524  Xab-run
> 23184  23169  74  ffff8139784c4040  RU   0.0  123932  33200  m_db.ora10.gcc4
> 23193   7187  64  ffff813ac61c5820  RU   0.0   66984  13084  Xm_db
> 23196  23195  51  ffff811cebf1b080  RU   0.0   66540  10588  Xm_db
> 23198  23188  44  ffff813e28e2b820  RU   0.0  111408  10628  m_db.ora10.gcc4
crash> bt 22142
PID: 22142  TASK: ffff811321851080  CPU: 35  COMMAND: "cp"
 #0 [ffff8160f0743f20] crash_nmi_callback at ffffffff8007c228
 #1 [ffff8160f0743f40] do_nmi at ffffffff800658e5
 #2 [ffff8160f0743f50] nmi at ffffffff80064ecf
    [exception RIP: copy_user_generic_unrolled+50]
    RIP: ffffffff80061dca  RSP: ffff811a168abbc0  RFLAGS: 00000216
    RAX: ffff811a168abd78  RBX: 0000000000001000  RCX: 000000000000000f
    RDX: 0000000000000000  RSI: 00007fff42400c40  RDI: ffff816539e40c40
    RBP: 0000000000001000   R8: 0c0954d60b4d61a8   R9: 2ce1b743fb0ada62
    R10: c3fb0b182edba4b1  R11: 66416a5b01f7839f  R12: 0000000046faa000
    R13: 0000000000001000  R14: ffff811a168abc58  R15: 0000000000002000
    ORIG_RAX: ffffffffffffffff  CS: 0010  SS: 0018
--- <NMI exception stack> ---
 #3 [ffff811a168abbc0] copy_user_generic_unrolled at ffffffff80061dca
 #4 [ffff811a168abbc0] iov_iter_copy_from_user_atomic at ffffffff800c907e
 #5 [ffff811a168abbd0] generic_file_buffered_write at ffffffff8000fef3
 #6 [ffff811a168abcc0] __generic_file_aio_write_nolock at ffffffff80016791
 #7 [ffff811a168abd70] generic_file_aio_write at ffffffff80021944
 #8 [ffff811a168abdc0] nfs_file_write at ffffffff88d5a691 [nfs]
 #9 [ffff811a168abe00] do_sync_write at ffffffff8001841c
#10 [ffff811a168abf10] vfs_write at ffffffff80016b99
#11 [ffff811a168abf40] sys_write at ffffffff80017462
#12 [ffff811a168abf80] tracesys at ffffffff8005d28d (via system_call)
    RIP: 00000035f94c7270  RSP: 00007fff423fd498  RFLAGS: 00000246
    RAX: ffffffffffffffda  RBX: ffffffff8005d28d  RCX: ffffffffffffffff
    RDX: 0000000000008000  RSI: 00007fff423fe000  RDI: 0000000000000004
    RBP: 0000000000000004   R8: 00007fff424067dc   R9: 00007fff42406a37
    R10: 31355f3831352e6d  R11: 0000000000000246  R12: 00007fff423fe000
    R13: 00007fff423fe000  R14: 0000000000008000  R15: 0000000046fb0000
    ORIG_RAX: 0000000000000001  CS: 0033  SS: 002b

Get the file * for the above:
crash> print vfs_write
$16 = {ssize_t (struct file *, const char *, size_t, loff_t *)} 0xffffffff80016acb <vfs_write>
crash> bt | grep -A 1 vfs_write
#10 [ffff811a168abf10] vfs_write at ffffffff80016b99
#11 [ffff811a168abf40] sys_write at ffffffff80017462
crash> dis -r ffffffff80017462 | tail -5
0xffffffff80017452 <sys_write+0x35>:    mov    %r13,%rsi
0xffffffff80017455 <sys_write+0x38>:    mov    %rbx,%rdi  <--- 1st param
0xffffffff80017458 <sys_write+0x3b>:    mov    %rax,0x8(%rsp)
0xffffffff8001745d <sys_write+0x40>:    callq  0xffffffff80016acb <vfs_write>
0xffffffff80017462 <sys_write+0x45>:    mov    %rax,%r12
crash> dis vfs_write | head -10
0xffffffff80016acb <vfs_write>: push   %r13
0xffffffff80016acd <vfs_write+0x2>:     mov    %rcx,%r13
0xffffffff80016ad0 <vfs_write+0x5>:     push   %r12
0xffffffff80016ad2 <vfs_write+0x7>:     mov    %rsi,%r12
0xffffffff80016ad5 <vfs_write+0xa>:     mov    %rdx,%rsi
0xffffffff80016ad8 <vfs_write+0xd>:     push   %rbp
0xffffffff80016ad9 <vfs_write+0xe>:     mov    $0xfffffffffffffff7,%rbp
0xffffffff80016ae0 <vfs_write+0x15>:    push   %rbx  <---------- 4th push
0xffffffff80016ae1 <vfs_write+0x16>:    mov    %rdi,%rbx
0xffffffff80016ae4 <vfs_write+0x19>:    sub    $0x8,%rsp

crash> bt -f | grep -B 3 sys_write
    ffff811a168abf18: c000003e00000001 ffff817dced353c0 
    ffff811a168abf28: 0000000000008000 fffffffffffffff7 
    ffff811a168abf38: 00007fff423fe000 ffffffff80017462 
#11 [ffff811a168abf40] sys_write at ffffffff80017462
crash> file 0xffff817dced353c0 | grep dentry
  f_dentry = 0xffff817d380e1660, 
crash> files -d 0xffff817d380e1660
     DENTRY           INODE           SUPERBLK     TYPE PATH
ffff817d380e1660 ffff8165ceb8d7e0 ffff8180786b0c00 REG  prod/SybaseIQ/PUBLISH/tec/mkt/wkr/tec.mkt.wkr.fact_data_item.518_518.dat.gz


Obtain file * for 27324

crash> print filp_close
$12 = {int (struct file *, fl_owner_t)} 0xffffffff80023cd8 <filp_close>
crash> dis -r ffffffff8001e13b | tail -7
0xffffffff8001e11f <sys_close+0x6c>:    mov    %ebp,%esi
0xffffffff8001e121 <sys_close+0x6e>:    callq  0xffffffff800304a3 <__put_unused_fd>
0xffffffff8001e126 <sys_close+0x73>:    movl   $0x1,0x80(%rbx)
0xffffffff8001e130 <sys_close+0x7d>:    mov    %rbx,%rsi
0xffffffff8001e133 <sys_close+0x80>:    mov    %r12,%rdi
0xffffffff8001e136 <sys_close+0x83>:    callq  0xffffffff80023cd8 <filp_close>
0xffffffff8001e13b <sys_close+0x88>:    mov    %eax,%edx
crash> dis filp_close | head -10
0xffffffff80023cd8 <filp_close>:        push   %r12
crash> bt -f | grep -B 1 sys_close
    ffff811457239f58: ffff81806bb23580 ffffffff8001e13b 
 #8 [ffff811457239f60] sys_close at ffffffff8001e13b
crash> file 0xffff81806bb23580 | grep dentry
  f_dentry = 0xffff816b5a9deb70, 
crash> files -d 0xffff816b5a9deb70
     DENTRY           INODE           SUPERBLK     TYPE PATH
ffff816b5a9deb70 ffff8168a585d298 ffff8180786b0c00 REG  prod/SybaseIQ/PUBLISH/tec/mkt/wkr/tec.mkt.wkr.fact_data_item.504_504.dat.gz

Now the other task
crash> set 17326
    PID: 17326
COMMAND: "cp"
   TASK: ffff813e50f0e7a0  [THREAD_INFO: ffff812a7ccde000]
    CPU: 43
  STATE: TASK_UNINTERRUPTIBLE 
crash> bt
PID: 17326  TASK: ffff813e50f0e7a0  CPU: 43  COMMAND: "cp"
 #0 [ffff812a7ccdfcd8] schedule at ffffffff80062fa0
 #1 [ffff812a7ccdfdb0] io_schedule at ffffffff800637de
 #2 [ffff812a7ccdfdd0] nfs_wait_bit_uninterruptible at ffffffff88d5f4d2 [nfs]
 #3 [ffff812a7ccdfde0] __wait_on_bit at ffffffff80063a0a
 #4 [ffff812a7ccdfe20] out_of_line_wait_on_bit at ffffffff80063aa4
 #5 [ffff812a7ccdfe90] nfs_sync_inode_wait at ffffffff88d64218 [nfs]
 #6 [ffff812a7ccdff20] nfs_do_fsync at ffffffff88d59ecd [nfs]
 #7 [ffff812a7ccdff40] filp_close at ffffffff80023d0e
 #8 [ffff812a7ccdff60] sys_close at ffffffff8001e13b
 #9 [ffff812a7ccdff80] tracesys at ffffffff8005d28d (via system_call)
    RIP: 00000035f94c7180  RSP: 00007fff0a9f01f8  RFLAGS: 00000246
    RAX: ffffffffffffffda  RBX: ffffffff8005d28d  RCX: ffffffffffffffff
    RDX: 00007fff0a9f94dc  RSI: 00007fff0a9f1000  RDI: 0000000000000004
    RBP: 0000000000000000   R8: 00007fff0a9f94dc   R9: 00007fff0a9f9737
    R10: 30355f3930352e6d  R11: 0000000000000246  R12: 00007fff0a9f9400
    R13: 0000000000008000  R14: ffffffff8001e13b  R15: ffff816f5efbfdc0
    ORIG_RAX: 0000000000000003  CS: 0033  SS: 002b
crash> bt -f | grep -B 1 sys_close
    ffff812a7ccdff58: ffff816f5efbfdc0 ffffffff8001e13b 
 #8 [ffff812a7ccdff60] sys_close at ffffffff8001e13b
crash> file 0xffff816f5efbfdc0 | grep dentry
  f_dentry = 0xffff8138b981d3d8, 
crash> files -d 0xffff8138b981d3d8
     DENTRY           INODE           SUPERBLK     TYPE PATH
ffff8138b981d3d8 ffff817d1561a6e0 ffff8180786b0c00 REG  prod/SybaseIQ/PUBLISH/tec/mkt/wkr/tec.mkt.wkr.fact_data_item.509_509.dat.gz

Let's get the mount point for the superblock which shows up in the above 3 files:
crash> mount | grep ffff8180786b0c00
ffff81807fe50ac0 ffff8180786b0c00 nfs    hds2nas3:/mnt/aideliverables /aideliverables
Components
Category
Tags

This solution is part of Red Hat’s fast-track publication program, providing a huge library of solutions that Red Hat engineers have created while supporting our customers. To give you the knowledge you need the instant it becomes available, these articles may be presented in a raw and unedited form.