Server crashed with a hung_task_panic trigger after "jbd2/dm-9-8:762" was blocked for more than 120 seconds.
Environment
- Red Hat Enterprise Linux 5.6
Issue
- Server crashed with a hung_task_panic trigger after "jbd2/dm-9-8:762" was blocked for more than 120 seconds.
crash> log
INFO: task jbd2/dm-9-8:762 blocked for more than 120 seconds.
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
jbd2/dm-9-8 D ffff81043e0a1420 0 762 585 763 748 (L-TLB)
ffff81041daa7d60 0000000000000046 ffff81041daa7d00 ffffffff8008cf5f
0000000000000001 000000000000000a ffff81102aa04080 ffff810c3e1197e0
000123b39f5946b9 0000000000000bab ffff81102aa04268 0000000300000000
Call Trace:
[<ffffffff8008cf5f>] find_busiest_group+0x20d/0x621
[<ffffffff88582f6d>] :jbd2:jbd2_journal_commit_transaction+0x191/0x1068
[<ffffffff800a28b4>] autoremove_wake_function+0x0/0x2e
[<ffffffff8003dbe6>] lock_timer_base+0x1b/0x3c
[<ffffffff8004b23a>] try_to_del_timer_sync+0x7f/0x88
[<ffffffff88586f7c>] :jbd2:kjournald2+0x9a/0x1ec
[<ffffffff800a28b4>] autoremove_wake_function+0x0/0x2e
[<ffffffff88586ee2>] :jbd2:kjournald2+0x0/0x1ec
[<ffffffff800a269c>] keventd_create_kthread+0x0/0xc4
[<ffffffff80032974>] kthread+0xfe/0x132
[<ffffffff8005dfb1>] child_rip+0xa/0x11
[<ffffffff800a269c>] keventd_create_kthread+0x0/0xc4
[<ffffffff80032876>] kthread+0x0/0x132
[<ffffffff8005dfa7>] child_rip+0x0/0x11
Kernel panic - not syncing: hung_task: blocked tasks
Resolution
Do not enable the hung_task_panic setting. Given the chance, the fsync() and journal commit operations will eventually complete, and any blocked tasks will be allowed to run again.
Optionally, consider calling fsync() more frequently to reduce the number of outstanding dirty pages, and therefore the time taken by any single fsync() operation. Also consider analysing the I/O access pattern of large files to ensure they are written serially or allocated physically contiguously on disk.
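A minimal sketch of the relevant sysctl settings (these are the standard kernel tunables, not values taken from this system):

```
# /etc/sysctl.conf -- report hung tasks but do not panic on them
kernel.hung_task_panic = 0

# Optional: lengthen the reporting threshold from the 120-second default
kernel.hung_task_timeout_secs = 600
```

Apply at runtime with "sysctl -p", or "sysctl -w kernel.hung_task_panic=0" for an immediate one-off change.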
Root Cause
A process performing an fsync() on a large, sparsely populated file is taking a long time to flush all of its dirty pages to disk. During this time the fsync() operation holds a reference to the currently running transaction, which prevents the journal task from committing it. Any other task needing to start a new transaction must wait until the journal task commits the current transaction, and therefore must also wait for the fsync() operation to complete.
Diagnostic Steps
- Server rebooted because of the hung_task_panic trigger:
kernel.hung_task_panic = 1
KERNEL: /cores/retrace/repos/kernel/x86_64/usr/lib/debug/lib/modules/2.6.18-238.el5/vmlinux
DUMPFILE: vmcore
CPUS: 16
DATE: Mon Nov 11 23:51:14 2013
UPTIME: 3 days, 17:12:04
LOAD AVERAGE: 4.83, 2.75, 1.82 <------ There was not much load on server
TASKS: 587
RELEASE: 2.6.18-238.el5
VERSION: #1 SMP Sun Dec 19 14:22:44 EST 2010
MACHINE: x86_64 (2400 Mhz)
MEMORY: 63.1 GB
PANIC: "Kernel panic - not syncing: hung_task: blocked tasks"
PID: 996
COMMAND: "khungtaskd"
TASK: ffff81042f9a9860 [THREAD_INFO: ffff81082f510000]
CPU: 0
STATE: TASK_RUNNING (PANIC)
- Ring Buffer
crash> log
INFO: task jbd2/dm-9-8:762 blocked for more than 120 seconds.
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
jbd2/dm-9-8 D ffff81043e0a1420 0 762 585 763 748 (L-TLB)
ffff81041daa7d60 0000000000000046 ffff81041daa7d00 ffffffff8008cf5f
0000000000000001 000000000000000a ffff81102aa04080 ffff810c3e1197e0
000123b39f5946b9 0000000000000bab ffff81102aa04268 0000000300000000
Call Trace:
[<ffffffff8008cf5f>] find_busiest_group+0x20d/0x621
[<ffffffff88582f6d>] :jbd2:jbd2_journal_commit_transaction+0x191/0x1068
[<ffffffff800a28b4>] autoremove_wake_function+0x0/0x2e
[<ffffffff8003dbe6>] lock_timer_base+0x1b/0x3c
[<ffffffff8004b23a>] try_to_del_timer_sync+0x7f/0x88
[<ffffffff88586f7c>] :jbd2:kjournald2+0x9a/0x1ec
[<ffffffff800a28b4>] autoremove_wake_function+0x0/0x2e
[<ffffffff88586ee2>] :jbd2:kjournald2+0x0/0x1ec
[<ffffffff800a269c>] keventd_create_kthread+0x0/0xc4
[<ffffffff80032974>] kthread+0xfe/0x132
[<ffffffff8005dfb1>] child_rip+0xa/0x11
[<ffffffff800a269c>] keventd_create_kthread+0x0/0xc4
[<ffffffff80032876>] kthread+0x0/0x132
[<ffffffff8005dfa7>] child_rip+0x0/0x11
Kernel panic - not syncing: hung_task: blocked tasks
- Backtraces
crash> !grep UN ps_udt.txt
[ 0 00:00:00.002] UN+[UN] PID: 2495 TASK: ffff81042fa117a0 CPU: 10 COMMAND: "klogd"
[ 0 00:00:00.002] UN+[UN] PID: 647 TASK: ffff81102d154040 CPU: 8 COMMAND: "kjournald"
[ 0 00:00:00.002] UN+[UN] PID: 2489 TASK: ffff81042f959080 CPU: 9 COMMAND: "syslogd"
[ 0 00:00:00.004] UN+[UN] PID: 23364 TASK: ffff8105c6a75040 CPU: 8 COMMAND: "ibbackup"
[ 0 00:02:03.805] UN+[UN] PID: 14715 TASK: ffff8110139f2040 CPU: 2 COMMAND: "mysqld"
[ 0 00:02:04.189] UN+[UN] PID: 28414 TASK: ffff810c7ed88080 CPU: 5 COMMAND: "pdflush"
[ 0 00:02:04.435] UN+[UN] PID: 762 TASK: ffff81102aa04080 CPU: 3 COMMAND: "jbd2/dm-9-8"
[ 0 00:02:04.435] UN+[UN] PID: 14705 TASK: ffff8106b0c370c0 CPU: 3 COMMAND: "mysqld"
Task states: RU ( 17), IN( 562), UN( 8), ZO( 0), ST( 0), DE( 0), SW( 0)
[RU] = Running [IN] = Interruptible [UN] = *Uninterruptible*
There are 4 processes that have been blocked for more than 120 seconds. All four have ext4/jbd2 traces:
crash> bt 14705
PID: 14705 TASK: ffff8106b0c370c0 CPU: 3 COMMAND: "mysqld"
#0 [ffff8106bfac7db8] schedule at ffffffff80062f8e
#1 [ffff8106bfac7e90] jbd2_log_wait_commit at ffffffff88586568 [jbd2]
#2 [ffff8106bfac7ef0] ext4_sync_file at ffffffff8859bc6c [ext4]
#3 [ffff8106bfac7f30] do_fsync at ffffffff80050310
#4 [ffff8106bfac7f60] __do_fsync at ffffffff800e33fe
#5 [ffff8106bfac7f80] system_call at ffffffff8005d116
RIP: 000000307240e047 RSP: 0000000048a2d088 RFLAGS: 00010206
RAX: 000000000000004a RBX: ffffffff8005d116 RCX: 0000000000000000
RDX: 0000000000000002 RSI: 0000000000000000 RDI: 0000000000000009
RBP: 0000000000000009 R8: 000000007ff00000 R9: 000000007feff800
R10: 000000006e818200 R11: 0000000000000206 R12: 0000000048a2cec0
R13: 0000000000000000 R14: ffffffff800e33fe R15: 0000000051eb851f
ORIG_RAX: 000000000000004a CS: 0033 SS: 002b
crash> bt 762
PID: 762 TASK: ffff81102aa04080 CPU: 3 COMMAND: "jbd2/dm-9-8"
#0 [ffff81041daa7c90] schedule at ffffffff80062f8e
#1 [ffff81041daa7d68] jbd2_journal_commit_transaction at ffffffff88582f6d [jbd2]
#2 [ffff81041daa7e78] kjournald2 at ffffffff88586f7c [jbd2]
#3 [ffff81041daa7ee8] kthread at ffffffff80032974
#4 [ffff81041daa7f48] kernel_thread at ffffffff8005dfb1
crash> bt 28414
PID: 28414 TASK: ffff810c7ed88080 CPU: 5 COMMAND: "pdflush"
#0 [ffff810b72183aa0] schedule at ffffffff80062f8e
#1 [ffff810b72183b78] start_this_handle at ffffffff88581c9e [jbd2]
#2 [ffff810b72183bf8] jbd2_journal_start at ffffffff88581e09 [jbd2]
#3 [ffff810b72183c28] ext4_da_writepages at ffffffff885a381c [ext4]
#4 [ffff810b72183d08] do_writepages at ffffffff8005ac1d
#5 [ffff810b72183d18] __writeback_single_inode at ffffffff8002fc9d
#6 [ffff810b72183da8] sync_sb_inodes at ffffffff80020fd6
#7 [ffff810b72183de8] writeback_inodes at ffffffff80051147
#8 [ffff810b72183e08] wb_kupdate at ffffffff800cbb98
#9 [ffff810b72183e78] pdflush at ffffffff80056783
#10 [ffff810b72183ee8] kthread at ffffffff80032974
#11 [ffff810b72183f48] kernel_thread at ffffffff8005dfb1
crash> bt 14715
PID: 14715 TASK: ffff8110139f2040 CPU: 2 COMMAND: "mysqld"
#0 [ffff810e0e715b08] schedule at ffffffff80062f8e
#1 [ffff810e0e715be0] start_this_handle at ffffffff88581c9e [jbd2]
#2 [ffff810e0e715c60] jbd2_journal_start at ffffffff88581e09 [jbd2]
#3 [ffff810e0e715c90] ext4_dirty_inode at ffffffff885a40d8 [ext4]
#4 [ffff810e0e715cb0] __mark_inode_dirty at ffffffff80013c94
#5 [ffff810e0e715ce0] __generic_file_aio_write_nolock at ffffffff8001659a
#6 [ffff810e0e715d80] generic_file_aio_write at ffffffff80021850
#7 [ffff810e0e715dd0] ext4_file_write at ffffffff8859ba1c [ext4]
#8 [ffff810e0e715e00] do_sync_write at ffffffff800182df
#9 [ffff810e0e715f10] vfs_write at ffffffff80016a81
#10 [ffff810e0e715f40] sys_write at ffffffff80017339
#11 [ffff810e0e715f80] system_call at ffffffff8005d116
The task reported as being blocked is:
crash> set 762
PID: 762
COMMAND: "jbd2/dm-9-8"
TASK: ffff81102aa04080 [THREAD_INFO: ffff81041daa6000]
CPU: 3
STATE: TASK_UNINTERRUPTIBLE
crash> bt
PID: 762 TASK: ffff81102aa04080 CPU: 3 COMMAND: "jbd2/dm-9-8"
#0 [ffff81041daa7c90] schedule at ffffffff80062f8e
#1 [ffff81041daa7d68] jbd2_journal_commit_transaction at ffffffff88582f6d [jbd2]
#2 [ffff81041daa7e78] kjournald2 at ffffffff88586f7c [jbd2]
#3 [ffff81041daa7ee8] kthread at ffffffff80032974
#4 [ffff81041daa7f48] kernel_thread at ffffffff8005dfb1
It's in this code in jbd2_journal_commit_transaction():
while (commit_transaction->t_updates) {
DEFINE_WAIT(wait);
prepare_to_wait(&journal->j_wait_updates, &wait,
TASK_UNINTERRUPTIBLE);
if (commit_transaction->t_updates) {
spin_unlock(&commit_transaction->t_handle_lock);
spin_unlock(&journal->j_state_lock);
schedule();
spin_lock(&journal->j_state_lock);
spin_lock(&commit_transaction->t_handle_lock);
}
finish_wait(&journal->j_wait_updates, &wait);
}
This is the journal task committing the currently running transaction, but it is held up because the t_updates reference counter on the transaction is non-zero, indicating that other tasks are still using the transaction.
So we need to find any tasks that could be using that transaction and see what's holding them up.
Looking at the current blocked tasks:
crash> ps | grep UN
647 585 8 ffff81102d154040 UN 0.0 0 0 [kjournald]
762 585 3 ffff81102aa04080 UN 0.0 0 0 [jbd2/dm-9-8]
2489 1 9 ffff81042f959080 UN 0.0 5932 708 syslogd
2495 1 10 ffff81042fa117a0 UN 0.0 3828 416 klogd
14705 8348 3 ffff8106b0c370c0 UN 75.8 51852436 51453640 mysqld
14715 8348 2 ffff8110139f2040 UN 75.8 51852436 51453640 mysqld
23364 27233 8 ffff8105c6a75040 UN 0.0 12104 5356 ibbackup
28414 585 5 ffff810c7ed88080 UN 0.0 0 0 [pdflush]
This one looks likely:
crash> set 23364
PID: 23364
COMMAND: "ibbackup"
TASK: ffff8105c6a75040 [THREAD_INFO: ffff810750ed0000]
CPU: 8
STATE: TASK_UNINTERRUPTIBLE
crash> bt
PID: 23364 TASK: ffff8105c6a75040 CPU: 8 COMMAND: "ibbackup"
#0 [ffff810750ed1548] schedule at ffffffff80062f8e
#1 [ffff810750ed1620] io_schedule at ffffffff800637ca
#2 [ffff810750ed1640] sync_buffer at ffffffff800154ed
#3 [ffff810750ed1650] __wait_on_bit at ffffffff800639f6
#4 [ffff810750ed1690] out_of_line_wait_on_bit at ffffffff80063a90
#5 [ffff810750ed1700] ext4_mb_init_cache at ffffffff885b89cb [ext4]
#6 [ffff810750ed17c0] ext4_mb_load_buddy at ffffffff885b9399 [ext4]
#7 [ffff810750ed1810] ext4_mb_regular_allocator at ffffffff885b9df2 [ext4]
#8 [ffff810750ed18c0] ext4_mb_new_blocks at ffffffff885bc964 [ext4]
#9 [ffff810750ed1930] ext4_ext_get_blocks at ffffffff885b4c19 [ext4]
#10 [ffff810750ed1ac0] ext4_get_blocks at ffffffff8859fe97 [ext4]
#11 [ffff810750ed1b10] mpage_da_map_blocks at ffffffff885a0057 [ext4]
#12 [ffff810750ed1c60] mpage_add_bh_to_extent at ffffffff885a06c5 [ext4]
#13 [ffff810750ed1c90] __mpage_da_writepage at ffffffff885a217e [ext4]
#14 [ffff810750ed1cc0] write_cache_pages at ffffffff800f61e7
#15 [ffff810750ed1de0] ext4_da_writepages at ffffffff885a389a [ext4]
#16 [ffff810750ed1ec0] do_writepages at ffffffff8005ac1d
#17 [ffff810750ed1ed0] __filemap_fdatawrite_range at ffffffff8004fbe4
#18 [ffff810750ed1f30] do_fsync at ffffffff800502ed
#19 [ffff810750ed1f60] __do_fsync at ffffffff800e33fe
#20 [ffff810750ed1f80] system_call at ffffffff8005d116
RIP: 00000030718cd620 RSP: 00007fff39d218e0 RFLAGS: 00000246
RAX: 000000000000004a RBX: ffffffff8005d116 RCX: 00000030718c46b3
RDX: 0000000000001900 RSI: 0000000007ce7d28 RDI: 0000000000000006
RBP: 0000000051eb851f R8: 0000000000000028 R9: 0000000000100000
R10: 000000288b700000 R11: 0000000000000246 R12: 0000000000000006
R13: 0000000000000000 R14: ffffffff800e33fe R15: 0000000000040000
ORIG_RAX: 000000000000004a CS: 0033 SS: 002b
Pulling the file pointer off the stack and comparing the block device with the journal's filesystem block device:
crash> struct file.f_vfsmnt ffff81082d6472c0
f_vfsmnt = 0xffff81010eb0b380
crash> vfsmount.mnt_sb 0xffff81010eb0b380
mnt_sb = 0xffff81042e510800
crash> super_block.s_bdev 0xffff81042e510800
s_bdev = 0xffff81042e3710c0
crash> journal_t.j_dev,j_fs_dev ffff81042d1cb000
j_dev = 0xffff81042e3710c0
j_fs_dev = 0xffff81042e3710c0
That's a match so this task is operating on the same filesystem.
From the stack above we can see that ext4_da_writepages() has called write_cache_pages() and this is after starting a transaction.
Pulling the transaction handle off the stack:
crash> handle_t.h_transaction ffff81053ff0efd8
h_transaction = 0xffff81082fbfcc80
crash> journal_t.j_running_transaction ffff81042d1cb000
j_running_transaction = 0xffff81082fbfcc80
That's a match with the transaction the journal task is trying to commit. So what's holding up the ibbackup task?
crash> task -R last_ran
PID: 23364 TASK: ffff8105c6a75040 CPU: 8 COMMAND: "ibbackup"
last_ran = 320853786401127,
crash> runq | grep "CPU 8"
CPU 8 RUNQUEUE: ffff810001015120
crash> rq.timestamp_last_tick ffff810001015120
timestamp_last_tick = 320853791294947
crash> pd (320853791294947-320853786401127)
$1 = 4893820
crash> pd (320853791294947-320853786401127)/1000000000
$2 = 0
The task has only been off the CPU for just under 5 milliseconds, so it is not stuck. It simply has a lot of work to do to fsync the file, and that is what caused the delay.
crash> struct file.f_mapping ffff81082d6472c0
f_mapping = 0xffff8108800f99a8
crash> address_space.host,nrpages ffff8108800f99a8
host = 0xffff8108800f9898
nrpages = 1072292
Potentially 1 million pages to write out.
crash> pd 1072292*4096
$4 = 4392108032
Around 4 GB worth of pages.
crash> inode.i_size 0xffff8108800f9898
i_size = 174139113472
File size is 162 GB, so there's a good chance the file is sparsely allocated and would require many single-page I/Os to flush to disk.
The task is currently blocked on the lock for a buffer:
PID: 23364 TASK: ffff8105c6a75040 CPU: 8 COMMAND: "ibbackup"
#0 [ffff810750ed1548] schedule at ffffffff80062f8e
#1 [ffff810750ed1620] io_schedule at ffffffff800637ca
#2 [ffff810750ed1640] sync_buffer at ffffffff800154ed
#3 [ffff810750ed1650] __wait_on_bit at ffffffff800639f6
#4 [ffff810750ed1690] out_of_line_wait_on_bit at ffffffff80063a90
#5 [ffff810750ed1700] ext4_mb_init_cache at ffffffff885b89cb [ext4]
...
The buffer is:
crash> buffer_head ffff81014ad4ea90
struct buffer_head {
b_state = 8388652,
b_this_page = 0xffff81014ad4ea90,
b_page = 0xffff810c311c1038,
b_blocknr = 3800452696,
b_size = 4096,
b_data = 0xffff810c7ec01000 "\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377",
b_bdev = 0xffff81042e3710c0,
b_end_io = 0xffffffff800321e6 <end_buffer_read_sync>,
b_private = 0x0 <__mod_license2529>,
b_assoc_buffers = {
next = 0xffff81014ad4ead8,
prev = 0xffff81014ad4ead8
},
b_count = {
counter = 3
}
}
Check the state flags:
crash> eval -b 8388652
hexadecimal: 80002c
decimal: 8388652
octal: 40000054
binary: 0000000000000000000000000000000000000000100000000000000000101100
bits set: 23 5 3 2
The flags include BH_Lock and BH_Req, so the buffer is currently under I/O and this task is waiting for that I/O to complete. Since it has only been waiting a few milliseconds, this doesn't indicate a problem or a deadlock of any kind, but rather congestion on the storage device. This problem would almost certainly have cleared had hung_task_panic not been set.
The large file that was being fsync()'d in this case was:
crash> files -d 0xffff810997319588
DENTRY INODE SUPERBLK TYPE PATH
ffff810997319588 ffff8108800f9898 ffff81042e510800 REG /var/lib/mysql/online_Backup/FullBkp/2013-11-11__1-09-24/uidv1_1/uid_master.ibd
This solution is part of Red Hat’s fast-track publication program, providing a huge library of solutions that Red Hat engineers have created while supporting our customers. To give you the knowledge you need the instant it becomes available, these articles may be presented in a raw and unedited form.