'cp' over an NFS share hangs and triggers hung_task_timeout on Red Hat Enterprise Linux 5.8
Environment
- Red Hat Enterprise Linux 5.8
- kernel 2.6.18-308.24.1.el5
- NFSv3 client
- Veritas cluster
Issue
- Periodically,
cpbecomes blocked and triggershung_task_timeout. - The backtrace seen in messages file looks like this
INFO: task cp:27324 blocked for more than 120 seconds.
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
cp D ffffffff80157271 0 27324 20865 (NOTLB)
ffff811457239da8 0000000000000082 0000000000000001 ffffffff8033f0e8
0000000000000000 0000000000000007 ffff81154e4f2860 ffff81407f8bf080
00005dd4b605f5d5 000000000000049f ffff81154e4f2a48 000000491dc5baf8
Call Trace:
[<ffffffff8006ece7>] do_gettimeofday+0x40/0x90
[<ffffffff88d5f4c9>] :nfs:nfs_wait_bit_uninterruptible+0x0/0xd
[<ffffffff800637de>] io_schedule+0x3f/0x67
[<ffffffff88d5f4d2>] :nfs:nfs_wait_bit_uninterruptible+0x9/0xd
[<ffffffff80063a0a>] __wait_on_bit+0x40/0x6e
[<ffffffff88d5f4c9>] :nfs:nfs_wait_bit_uninterruptible+0x0/0xd
[<ffffffff80063aa4>] out_of_line_wait_on_bit+0x6c/0x78
[<ffffffff800a34f9>] wake_bit_function+0x0/0x23
[<ffffffff88d64218>] :nfs:nfs_sync_inode_wait+0xeb/0x2df
[<ffffffff88d59ecd>] :nfs:nfs_do_fsync+0x22/0x42
[<ffffffff80023d0e>] filp_close+0x36/0x64
[<ffffffff8001e13b>] sys_close+0x88/0xbd
Resolution
Collect a vmcore as mentioned in the "Diagnostic Steps" section and engage Red Hat support to analyse it.
Diagnostic Steps
- Take a vmcore at the time of the hang, either triggered with
hung_task_panic = 1orsysrq-c - Analyze the vmcore
This vmcore is similar to before, but not identical. It is a hung task panic, with 'cp' causing the hung task timer to fire.
The files involved all look like:
/aideliverables/prod/SybaseIQ/PUBLISH/tec/mkt/wkr/tec.mkt.wkr.fact_data_item.*
These are on the following NFS mount:
nfs hds2nas3:/mnt/aideliverables /aideliverables
Here's the interesting processes so far and the files being operated on:
27324: cp process triggering the hung task panic; doing 'close' system call but stuck inside nfs_sync_inode_wait
crash> file 0xffff81806bb23580 | grep dentry
f_dentry = 0xffff816b5a9deb70,
crash> files -d 0xffff816b5a9deb70
DENTRY INODE SUPERBLK TYPE PATH
ffff816b5a9deb70 ffff8168a585d298 ffff8180786b0c00 REG prod/SybaseIQ/PUBLISH/tec/mkt/wkr/tec.mkt.wkr.fact_data_item.504_504.dat.gz
17326: cp process blocked almost as long as 27324, same backtrace as 27324; operating on a file in the same path but with a different name
crash> file 0xffff816f5efbfdc0 | grep dentry
f_dentry = 0xffff8138b981d3d8,
crash> files -d 0xffff8138b981d3d8
DENTRY INODE SUPERBLK TYPE PATH
ffff8138b981d3d8 ffff817d1561a6e0 ffff8180786b0c00 REG prod/SybaseIQ/PUBLISH/tec/mkt/wkr/tec.mkt.wkr.fact_data_item.509_509.dat.gz
22142: running 'cp' process; copying data from userspace to kernel for a write to a file of similar path, but different name from the above two.
crash> file 0xffff817dced353c0 | grep dentry
f_dentry = 0xffff817d380e1660,
crash> files -d 0xffff817d380e1660
DENTRY INODE SUPERBLK TYPE PATH
ffff817d380e1660 ffff8165ceb8d7e0 ffff8180786b0c00 REG prod/SybaseIQ/PUBLISH/tec/mkt/wkr/tec.mkt.wkr.fact_data_item.518_518.dat.gz
All of the above are on the same superblock, so here is the mount:
crash> mount | grep ffff8180786b0c00
ffff81807fe50ac0 ffff8180786b0c00 nfs hds2nas3:/mnt/aideliverables /aideliverables
Details
=======
crash> sys
KERNEL: /cores/20130603141348/work/vmlinux
DUMPFILE: /cores/20130603141348/work/04457217.prod8.vmcore [PARTIAL DUMP]
CPUS: 80
DATE: Sun Jun 2 12:53:32 2013
UPTIME: 1 days, 04:42:13
LOAD AVERAGE: 34.53, 33.04, 27.21
TASKS: 2734
NODENAME: www.example.com
RELEASE: 2.6.18-308.24.1.el5
VERSION: #1 SMP Wed Nov 21 11:42:14 EST 2012
MACHINE: x86_64 (2394 Mhz)
MEMORY: 505 GB
PANIC: "Kernel panic - not syncing: hung_task: blocked tasks"
[dwysocha@megatron work]$ grep UN ps_dtime.txt
[ 0 00:00:00.000] UN+[UN] PID: 5596 TASK: ffff81407f0ee0c0 CPU: 41 COMMAND: "pdflush"
[ 0 00:00:00.000] UN+[UN] PID: 14072 TASK: ffff8180788d37a0 CPU: 42 COMMAND: "syslogd"
[ 0 00:00:00.015] UN+[UN] PID: 29211 TASK: ffff8149d55dd100 CPU: 13 COMMAND: "updatedb"
[ 0 00:00:02.650] UN+[UN] PID: 17954 TASK: ffff816cc10e1820 CPU: 6 COMMAND: "cp"
[ 0 00:00:03.115] UN+[UN] PID: 2942 TASK: ffff8172376c27e0 CPU: 27 COMMAND: "cp"
[ 0 00:00:06.481] UN+[UN] PID: 30081 TASK: ffff811823cfc100 CPU: 16 COMMAND: "cp"
[ 0 00:00:09.627] UN+[UN] PID: 21108 TASK: ffff815ed44bb7e0 CPU: 7 COMMAND: "cp"
[ 0 00:00:13.934] UN+[UN] PID: 17304 TASK: ffff812e11b2d860 CPU: 2 COMMAND: "cp"
[ 0 00:01:39.992] UN+[UN] PID: 22878 TASK: ffff812e8bcb3080 CPU: 72 COMMAND: "cp"
[ 0 00:02:17.066] UN+[UN] PID: 17326 TASK: ffff813e50f0e7a0 CPU: 43 COMMAND: "cp"
[ 0 00:02:38.311] UN+[UN] PID: 27324 TASK: ffff81154e4f2860 CPU: 73 COMMAND: "cp" <------------ caused the hung task timer to fire
crash> p sysctl_hung_task_timeout_secs
sysctl_hung_task_timeout_secs = $2 = 120
crash> p sysctl_hung_task_panic
sysctl_hung_task_panic = $3 = 1
crash> bt 27324
PID: 27324 TASK: ffff81154e4f2860 CPU: 73 COMMAND: "cp"
#0 [ffff811457239cd8] schedule at ffffffff80062fa0
#1 [ffff811457239db0] io_schedule at ffffffff800637de
#2 [ffff811457239dd0] nfs_wait_bit_uninterruptible at ffffffff88d5f4d2 [nfs]
#3 [ffff811457239de0] __wait_on_bit at ffffffff80063a0a
#4 [ffff811457239e20] out_of_line_wait_on_bit at ffffffff80063aa4
#5 [ffff811457239e90] nfs_sync_inode_wait at ffffffff88d64218 [nfs]
#6 [ffff811457239f20] nfs_do_fsync at ffffffff88d59ecd [nfs]
#7 [ffff811457239f40] filp_close at ffffffff80023d0e
#8 [ffff811457239f60] sys_close at ffffffff8001e13b
#9 [ffff811457239f80] tracesys at ffffffff8005d28d (via system_call)
RIP: 00000035f94c7180 RSP: 00007fff3f9d8828 RFLAGS: 00000246
RAX: ffffffffffffffda RBX: ffffffff8005d28d RCX: ffffffffffffffff
RDX: 00007fff3f9e1b0c RSI: 00007fff3f9d9000 RDI: 0000000000000004
RBP: 0000000000000000 R8: 00007fff3f9e1b0c R9: 00007fff3f9e1d67
R10: 30355f3430352e6d R11: 0000000000000246 R12: 00007fff3f9e1a30
R13: 0000000000008000 R14: ffffffff8001e13b R15: ffff81806bb23580
ORIG_RAX: 0000000000000003 CS: 0033 SS: 002b
crash> bt 17326
PID: 17326 TASK: ffff813e50f0e7a0 CPU: 43 COMMAND: "cp"
#0 [ffff812a7ccdfcd8] schedule at ffffffff80062fa0
#1 [ffff812a7ccdfdb0] io_schedule at ffffffff800637de
#2 [ffff812a7ccdfdd0] nfs_wait_bit_uninterruptible at ffffffff88d5f4d2 [nfs]
#3 [ffff812a7ccdfde0] __wait_on_bit at ffffffff80063a0a
#4 [ffff812a7ccdfe20] out_of_line_wait_on_bit at ffffffff80063aa4
#5 [ffff812a7ccdfe90] nfs_sync_inode_wait at ffffffff88d64218 [nfs]
#6 [ffff812a7ccdff20] nfs_do_fsync at ffffffff88d59ecd [nfs]
#7 [ffff812a7ccdff40] filp_close at ffffffff80023d0e
#8 [ffff812a7ccdff60] sys_close at ffffffff8001e13b
#9 [ffff812a7ccdff80] tracesys at ffffffff8005d28d (via system_call)
RIP: 00000035f94c7180 RSP: 00007fff0a9f01f8 RFLAGS: 00000246
RAX: ffffffffffffffda RBX: ffffffff8005d28d RCX: ffffffffffffffff
RDX: 00007fff0a9f94dc RSI: 00007fff0a9f1000 RDI: 0000000000000004
RBP: 0000000000000000 R8: 00007fff0a9f94dc R9: 00007fff0a9f9737
R10: 30355f3930352e6d R11: 0000000000000246 R12: 00007fff0a9f9400
R13: 0000000000008000 R14: ffffffff8001e13b R15: ffff816f5efbfdc0
ORIG_RAX: 0000000000000003 CS: 0033 SS: 002b
Running processes - one 'cp' running on pid 22142
crash> ps | grep \> | grep -v swapper
> 1662 31998 20 ffff815fd4506100 RU 0.0 119400 27664 unitool
> 1672 31992 3 ffff81461c848040 RU 0.0 119524 51892 unitool
> 2416 32001 6 ffff81326f036080 RU 0.0 119400 27664 unitool
> 2710 1223 24 ffff81607f75e080 RU 0.0 88496 21700 unitool
> 2785 991 78 ffff817d9a5ff860 RU 0.0 129592 38320 db-unload.impl.
> 2792 1223 12 ffff814e68416100 RU 0.1 572608 502852 unitool
> 5594 4435 0 ffff81807e42b040 RU 0.0 0 0 [khungtaskd]
> 7207 1 33 ffff812006e44100 RU 0.0 35292 14276 perfd
> 10715 4435 2 ffff81207f2d87e0 RU 0.0 0 0 [kjournald]
> 22142 20723 35 ffff811321851080 RU 0.0 69700 788 cp
> 22682 22671 62 ffff812f8fa370c0 RU 0.0 148620 52524 Xab-run
> 23184 23169 74 ffff8139784c4040 RU 0.0 123932 33200 m_db.ora10.gcc4
> 23193 7187 64 ffff813ac61c5820 RU 0.0 66984 13084 Xm_db
> 23196 23195 51 ffff811cebf1b080 RU 0.0 66540 10588 Xm_db
> 23198 23188 44 ffff813e28e2b820 RU 0.0 111408 10628 m_db.ora10.gcc4
crash> bt 22142
PID: 22142 TASK: ffff811321851080 CPU: 35 COMMAND: "cp"
#0 [ffff8160f0743f20] crash_nmi_callback at ffffffff8007c228
#1 [ffff8160f0743f40] do_nmi at ffffffff800658e5
#2 [ffff8160f0743f50] nmi at ffffffff80064ecf
[exception RIP: copy_user_generic_unrolled+50]
RIP: ffffffff80061dca RSP: ffff811a168abbc0 RFLAGS: 00000216
RAX: ffff811a168abd78 RBX: 0000000000001000 RCX: 000000000000000f
RDX: 0000000000000000 RSI: 00007fff42400c40 RDI: ffff816539e40c40
RBP: 0000000000001000 R8: 0c0954d60b4d61a8 R9: 2ce1b743fb0ada62
R10: c3fb0b182edba4b1 R11: 66416a5b01f7839f R12: 0000000046faa000
R13: 0000000000001000 R14: ffff811a168abc58 R15: 0000000000002000
ORIG_RAX: ffffffffffffffff CS: 0010 SS: 0018
--- <NMI exception stack> ---
#3 [ffff811a168abbc0] copy_user_generic_unrolled at ffffffff80061dca
#4 [ffff811a168abbc0] iov_iter_copy_from_user_atomic at ffffffff800c907e
#5 [ffff811a168abbd0] generic_file_buffered_write at ffffffff8000fef3
#6 [ffff811a168abcc0] __generic_file_aio_write_nolock at ffffffff80016791
#7 [ffff811a168abd70] generic_file_aio_write at ffffffff80021944
#8 [ffff811a168abdc0] nfs_file_write at ffffffff88d5a691 [nfs]
#9 [ffff811a168abe00] do_sync_write at ffffffff8001841c
#10 [ffff811a168abf10] vfs_write at ffffffff80016b99
#11 [ffff811a168abf40] sys_write at ffffffff80017462
#12 [ffff811a168abf80] tracesys at ffffffff8005d28d (via system_call)
RIP: 00000035f94c7270 RSP: 00007fff423fd498 RFLAGS: 00000246
RAX: ffffffffffffffda RBX: ffffffff8005d28d RCX: ffffffffffffffff
RDX: 0000000000008000 RSI: 00007fff423fe000 RDI: 0000000000000004
RBP: 0000000000000004 R8: 00007fff424067dc R9: 00007fff42406a37
R10: 31355f3831352e6d R11: 0000000000000246 R12: 00007fff423fe000
R13: 00007fff423fe000 R14: 0000000000008000 R15: 0000000046fb0000
ORIG_RAX: 0000000000000001 CS: 0033 SS: 002b
Get the file * for the above:
crash> print vfs_write
$16 = {ssize_t (struct file *, const char *, size_t, loff_t *)} 0xffffffff80016acb <vfs_write>
crash> bt | grep -A 1 vfs_write
#10 [ffff811a168abf10] vfs_write at ffffffff80016b99
#11 [ffff811a168abf40] sys_write at ffffffff80017462
crash> dis -r ffffffff80017462 | tail -5
0xffffffff80017452 <sys_write+0x35>: mov %r13,%rsi
0xffffffff80017455 <sys_write+0x38>: mov %rbx,%rdi <--- 1st param
0xffffffff80017458 <sys_write+0x3b>: mov %rax,0x8(%rsp)
0xffffffff8001745d <sys_write+0x40>: callq 0xffffffff80016acb <vfs_write>
0xffffffff80017462 <sys_write+0x45>: mov %rax,%r12
crash> dis vfs_write | head -10
0xffffffff80016acb <vfs_write>: push %r13
0xffffffff80016acd <vfs_write+0x2>: mov %rcx,%r13
0xffffffff80016ad0 <vfs_write+0x5>: push %r12
0xffffffff80016ad2 <vfs_write+0x7>: mov %rsi,%r12
0xffffffff80016ad5 <vfs_write+0xa>: mov %rdx,%rsi
0xffffffff80016ad8 <vfs_write+0xd>: push %rbp
0xffffffff80016ad9 <vfs_write+0xe>: mov $0xfffffffffffffff7,%rbp
0xffffffff80016ae0 <vfs_write+0x15>: push %rbx <---------- 4th push
0xffffffff80016ae1 <vfs_write+0x16>: mov %rdi,%rbx
0xffffffff80016ae4 <vfs_write+0x19>: sub $0x8,%rsp
crash> bt -f | grep -B 3 sys_write
ffff811a168abf18: c000003e00000001 ffff817dced353c0
ffff811a168abf28: 0000000000008000 fffffffffffffff7
ffff811a168abf38: 00007fff423fe000 ffffffff80017462
#11 [ffff811a168abf40] sys_write at ffffffff80017462
crash> file 0xffff817dced353c0 | grep dentry
f_dentry = 0xffff817d380e1660,
crash> files -d 0xffff817d380e1660
DENTRY INODE SUPERBLK TYPE PATH
ffff817d380e1660 ffff8165ceb8d7e0 ffff8180786b0c00 REG prod/SybaseIQ/PUBLISH/tec/mkt/wkr/tec.mkt.wkr.fact_data_item.518_518.dat.gz
Obtain file * for 27324
crash> print filp_close
$12 = {int (struct file *, fl_owner_t)} 0xffffffff80023cd8 <filp_close>
crash> dis -r ffffffff8001e13b | tail -7
0xffffffff8001e11f <sys_close+0x6c>: mov %ebp,%esi
0xffffffff8001e121 <sys_close+0x6e>: callq 0xffffffff800304a3 <__put_unused_fd>
0xffffffff8001e126 <sys_close+0x73>: movl $0x1,0x80(%rbx)
0xffffffff8001e130 <sys_close+0x7d>: mov %rbx,%rsi
0xffffffff8001e133 <sys_close+0x80>: mov %r12,%rdi
0xffffffff8001e136 <sys_close+0x83>: callq 0xffffffff80023cd8 <filp_close>
0xffffffff8001e13b <sys_close+0x88>: mov %eax,%edx
crash> dis filp_close | head -10
0xffffffff80023cd8 <filp_close>: push %r12
crash> bt -f | grep -B 1 sys_close
ffff811457239f58: ffff81806bb23580 ffffffff8001e13b
#8 [ffff811457239f60] sys_close at ffffffff8001e13b
crash> file 0xffff81806bb23580 | grep dentry
f_dentry = 0xffff816b5a9deb70,
crash> files -d 0xffff816b5a9deb70
DENTRY INODE SUPERBLK TYPE PATH
ffff816b5a9deb70 ffff8168a585d298 ffff8180786b0c00 REG prod/SybaseIQ/PUBLISH/tec/mkt/wkr/tec.mkt.wkr.fact_data_item.504_504.dat.gz
Now the other task
crash> set 17326
PID: 17326
COMMAND: "cp"
TASK: ffff813e50f0e7a0 [THREAD_INFO: ffff812a7ccde000]
CPU: 43
STATE: TASK_UNINTERRUPTIBLE
crash> bt
PID: 17326 TASK: ffff813e50f0e7a0 CPU: 43 COMMAND: "cp"
#0 [ffff812a7ccdfcd8] schedule at ffffffff80062fa0
#1 [ffff812a7ccdfdb0] io_schedule at ffffffff800637de
#2 [ffff812a7ccdfdd0] nfs_wait_bit_uninterruptible at ffffffff88d5f4d2 [nfs]
#3 [ffff812a7ccdfde0] __wait_on_bit at ffffffff80063a0a
#4 [ffff812a7ccdfe20] out_of_line_wait_on_bit at ffffffff80063aa4
#5 [ffff812a7ccdfe90] nfs_sync_inode_wait at ffffffff88d64218 [nfs]
#6 [ffff812a7ccdff20] nfs_do_fsync at ffffffff88d59ecd [nfs]
#7 [ffff812a7ccdff40] filp_close at ffffffff80023d0e
#8 [ffff812a7ccdff60] sys_close at ffffffff8001e13b
#9 [ffff812a7ccdff80] tracesys at ffffffff8005d28d (via system_call)
RIP: 00000035f94c7180 RSP: 00007fff0a9f01f8 RFLAGS: 00000246
RAX: ffffffffffffffda RBX: ffffffff8005d28d RCX: ffffffffffffffff
RDX: 00007fff0a9f94dc RSI: 00007fff0a9f1000 RDI: 0000000000000004
RBP: 0000000000000000 R8: 00007fff0a9f94dc R9: 00007fff0a9f9737
R10: 30355f3930352e6d R11: 0000000000000246 R12: 00007fff0a9f9400
R13: 0000000000008000 R14: ffffffff8001e13b R15: ffff816f5efbfdc0
ORIG_RAX: 0000000000000003 CS: 0033 SS: 002b
crash> bt -f | grep -B 1 sys_close
ffff812a7ccdff58: ffff816f5efbfdc0 ffffffff8001e13b
#8 [ffff812a7ccdff60] sys_close at ffffffff8001e13b
crash> file 0xffff816f5efbfdc0 | grep dentry
f_dentry = 0xffff8138b981d3d8,
crash> files -d 0xffff8138b981d3d8
DENTRY INODE SUPERBLK TYPE PATH
ffff8138b981d3d8 ffff817d1561a6e0 ffff8180786b0c00 REG prod/SybaseIQ/PUBLISH/tec/mkt/wkr/tec.mkt.wkr.fact_data_item.509_509.dat.gz
Let's get the mount point for the superblock which shows up in the above 3 files:
crash> mount | grep ffff8180786b0c00
ffff81807fe50ac0 ffff8180786b0c00 nfs hds2nas3:/mnt/aideliverables /aideliverables
SBR
Product(s)
Components
Category
Tags
This solution is part of Red Hat’s fast-track publication program, providing a huge library of solutions that Red Hat engineers have created while supporting our customers. To give you the knowledge you need the instant it becomes available, these articles may be presented in a raw and unedited form.