GFS2 filesystems intermittently hang and glock_workqueue processes use 100% CPU in RHEL 5
Environment
- Red Hat Enterprise Linux 5 (RHEL5) with the Resilient Storage Add On
- Observed in RHEL 5 Update 7 - Update 9
- GFS2 file system(s) mounted on multiple nodes
- Often triggered by a backup utility running against GFS2 file system(s), such as Symantec NetBackup
Issue
- When a large number of cached glocks has built up for GFS2 and memory pressure causes a flush of the cache, the CPU utilization of glock_workqueue becomes very high, possibly causing the system to become unresponsive.
- When page cache is flushed out, the CPU utilization of glock_workqueue spikes.
- We are seeing high load on our clusters. The high load is due to CPU utilization, and there is an associated large drop in page cache for the GFS2 filesystem during the event; when the large pagecache drop completes, the load returns to normal. The number of glock_workqueue processes increases during the high-load period, yet very little I/O occurs during the event.
- A hung process was halted, and during that time the system stopped functioning for all existing users. A glock service spiked to 100% CPU; we were not able to start any new ssh sessions and existing ssh users were disconnected. The symptom had gone away by the time we were able to regain access. We saw a huge spike in load and CPU usage, but I/O was at normal levels, with multiple instances of glock_workqueue using 100% CPU.
Resolution
- Update to kernel-2.6.18-348.19.1.el5 or later in RHEL 5 Update 9.
- Update to kernel-2.6.18-371.1.2.el5 or later in RHEL 5 Update 10.
- Update to kernel-2.6.18-398.el5 or later in RHEL 5 Update 11.
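To confirm whether a node is already running a fixed build, the running kernel release can be compared against the first fixed build for its update level. The sketch below is illustrative only (the function name and the use of GNU sort -V are assumptions, not part of any Red Hat tool); compare only within the same update stream, since each stream has its own fixed build.

```shell
# kernel_is_fixed [RUNNING] [FIXED]
# Succeeds if RUNNING (default: uname -r) sorts at or after FIXED.
# Relies on GNU "sort -V" version ordering.
kernel_is_fixed() {
    running=${1:-$(uname -r)}
    fixed=${2:-2.6.18-348.19.1.el5}    # first fixed build in RHEL 5.9
    [ "$(printf '%s\n' "$fixed" "$running" | sort -V | head -n1)" = "$fixed" ]
}
```

For example, kernel_is_fixed "2.6.18-371.1.2.el5" "2.6.18-348.19.1.el5" succeeds, while an older 5.8 kernel such as 2.6.18-308.13.1.el5 does not.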
Workarounds:
- Try to identify the source of the glock buildup (such as backup processes, or scripts scanning entire file systems recursively) and either eliminate it, or modify it so that it does not produce so many glocks.
- Alternatively, flush the cache or drop locks periodically or at strategic moments. This can be done by:
- Unmounting and remounting a specific file system after it is backed up
- Running # echo 1 > /proc/sys/vm/drop_caches; echo 2 > /proc/sys/vm/drop_caches; echo 3 > /proc/sys/vm/drop_caches periodically in a cron job to keep the number of glocks low
- Running the same sequence during or after backups to flush glocks that were cached in the process
- Notice: echoing 1, 2 and then 3 is safer than echoing 3 directly into /proc/sys/vm/drop_caches, since it has been reported that a plain echo 3 can lead to an unresponsive system.
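The staged flush above can be wrapped in a small script for cron. This is a minimal sketch; the function name and the DROP_CACHES_FILE override (included so the script can be exercised without touching the real sysctl) are illustrative assumptions, not part of any shipped tool.

```shell
# drop_gfs2_caches: flush pagecache, then dentries/inodes, in stages 1, 2, 3,
# as recommended above instead of a single "echo 3".
drop_gfs2_caches() {
    drop=${DROP_CACHES_FILE:-/proc/sys/vm/drop_caches}
    sync                       # write dirty pages first so more cache is droppable
    for level in 1 2 3; do
        echo "$level" > "$drop"
        sleep 1                # brief pause between stages
    done
}
```

A cron entry such as 0 2 * * * root /usr/local/sbin/gfs2_drop_caches.sh (path hypothetical) would run this nightly. Writing to drop_caches does not discard dirty data, but it does discard clean cache, so schedule it during quiet periods.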
Root Cause
When memory pressure causes the VM to flush pagecache and inactive slabs, glock_workqueue will begin scanning glocks for those that may be locked and demoted. Because there is a glock_workqueue process for each logical CPU, multiple processors may be occupied simultaneously carrying out this work. In certain situations where the number of glocks is very high, this may take an excessive amount of time and CPU cycles to carry out, potentially starving other processes of resources and causing slight to extreme unresponsiveness, and possibly fencing.
Red Hat Engineering released a fix for this issue via Bugzilla #928518. The description of the issue and fix from that bug is as follows:
A bug in the GFS2 code prevented glock work queues from freeing glock-related memory while the
glock memory shrinker repeatedly queued a large number of demote requests, for example when
performing a simultaneous backup of several live GFS2 volumes with a large file count. As a
consequence, the glock work queues became overloaded which resulted in a high CPU usage and the
GFS2 file systems being unresponsive for a significant amount of time. A patch has been applied
to alleviate this problem by calling the yield() function after scheduling a certain amount of
tasks on the glock work queues. The problem can now occur only with extremely high work loads.
Diagnostic Steps
If the following symptoms are present on your system, then this article may apply to your issue:
- Approximately 120 seconds after the issue begins, the following blocked/hung task messages begin repeating in /var/log/messages on all cluster nodes:
kernel: INFO: task pdflush:985 blocked for more than 120 seconds.
kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
kernel: pdflush D ffffffff80154dc2 0 985 595 986 984 (L-TLB)
kernel: ffff81087f7cfbc0 0000000000000046 0000000000000001 ffff810845bd55c0
kernel: 0000000000000001 000000000000000a ffff81107f6867a0 ffff8108800a3080
kernel: 00044dfca388ee4c 0000000000004dfc ffff81107f686988 00000001882cd474
kernel: Call Trace:
kernel: [<ffffffff8885e10f>] :gfs2:just_schedule+0x0/0xd
kernel: [<ffffffff8885e118>] :gfs2:just_schedule+0x9/0xd
kernel: [<ffffffff800639fa>] __wait_on_bit+0x40/0x6e
kernel: [<ffffffff8885e10f>] :gfs2:just_schedule+0x0/0xd
kernel: [<ffffffff80063a94>] out_of_line_wait_on_bit+0x6c/0x78
kernel: [<ffffffff800a2e80>] wake_bit_function+0x0/0x23
kernel: [<ffffffff8885e01e>] :gfs2:gfs2_glock_schedule_for_reclaim+0x20/0x99
kernel: [<ffffffff8885e0ce>] :gfs2:gfs2_glock_wait+0x37/0x78
kernel: [<ffffffff8886f09e>] :gfs2:gfs2_write_inode+0x5f/0x151
kernel: [<ffffffff8886f096>] :gfs2:gfs2_write_inode+0x57/0x151
kernel: [<ffffffff8002fa4f>] __writeback_single_inode+0x1dd/0x31c
kernel: [<ffffffff80021143>] sync_sb_inodes+0x1b7/0x271
kernel: [<ffffffff800a2c3a>] keventd_create_kthread+0x0/0xc4
kernel: [<ffffffff80050cff>] writeback_inodes+0x82/0xd8
kernel: [<ffffffff800cc3b5>] wb_kupdate+0xef/0x169
kernel: [<ffffffff8005628f>] pdflush+0x0/0x1fb
kernel: [<ffffffff800563e0>] pdflush+0x151/0x1fb
kernel: [<ffffffff800cc2c6>] wb_kupdate+0x0/0x169
kernel: [<ffffffff8003270f>] kthread+0xfe/0x132
kernel: [<ffffffff8005dfb1>] child_rip+0xa/0x11
kernel: [<ffffffff800a2c3a>] keventd_create_kthread+0x0/0xc4
kernel: [<ffffffff80032611>] kthread+0x0/0x132
kernel: [<ffffffff8005dfa7>] child_rip+0x0/0x11
kernel:
- If you run top on the system when the issue is occurring, you will see several glock_workqueue processes using 100% CPU, most of them in the (R)unning state:
top - 18:56:51 up 22 days, 6:49, 32 users, load average: 319.02, 278.33, 192.09
Tasks: 796 total, 15 running, 781 sleeping, 0 stopped, 0 zombie
Cpu(s): 0.5%us, 78.3%sy, 0.0%ni, 20.9%id, 0.3%wa, 0.0%hi, 0.0%si, 0.0%st
Mem: 65979728k total, 18427132k used, 47552596k free, 1959984k buffers
Swap: 4194296k total, 0k used, 4194296k free, 6681448k cached
PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
7629 root 20 -5 0 0 0 R 100.0 0.0 68:13.77 glock_workqueue
7631 root 20 -5 0 0 0 R 100.0 0.0 65:12.84 glock_workqueue
7632 root 20 -5 0 0 0 R 100.0 0.0 71:08.88 glock_workqueue
7639 root 20 -5 0 0 0 R 100.0 0.0 65:11.34 glock_workqueue
7640 root 20 -5 0 0 0 R 100.0 0.0 75:56.97 glock_workqueue
7641 root 20 -5 0 0 0 R 100.0 0.0 64:00.45 glock_workqueue
29339 efo 28 3 61228 784 664 R 99.7 0.0 18:30.06 grep
7637 root 20 -5 0 0 0 R 73.3 0.0 52:58.47 glock_workqueue
7633 root 10 -5 0 0 0 R 72.9 0.0 61:30.11 glock_workqueue
7636 root 10 -5 0 0 0 S 51.8 0.0 70:05.82 glock_workqueue
7628 root 10 -5 0 0 0 R 50.5 0.0 63:09.84 glock_workqueue
7643 root 10 -5 0 0 0 S 47.9 0.0 59:51.44 glock_workqueue
7635 root 10 -5 0 0 0 S 46.9 0.0 60:13.05 glock_workqueue
7634 root 10 -5 0 0 0 R 43.2 0.0 60:56.45 glock_workqueue
7638 root 10 -5 0 0 0 R 43.2 0.0 68:55.44 glock_workqueue
7630 root 10 -5 0 0 0 S 41.6 0.0 66:41.29 glock_workqueue
7642 root 10 -5 0 0 0 R 36.6 0.0 61:04.28 glock_workqueue
7761 root 20 -5 0 0 0 R 27.7 0.0 24:06.69 dlm_astd
3256 zangrill 30 15 6052 1924 832 D 11.6 0.0 0:00.54 efs_rnr
6524 root 16 0 106m 15m 3156 D 5.9 0.0 0:00.18 sshd
6519 root 16 0 105m 15m 2924 S 3.3 0.0 0:00.10 sshd
6415 kemnera 15 0 11412 1628 780 R 1.0 0.0 0:00.19 top
5091 root 10 -5 0 0 0 S 0.3 0.0 0:05.66 kjournald
5746 root 15 0 14576 536 396 S 0.3 0.0 111:18.24 mcstransd
6492 root 15 0 95640 3188 2480 S 0.3 0.0 0:00.01 sshd
6769 haldaemo 15 0 52572 6548 2392 S 0.3 0.0 0:02.31 hald
10742 root 15 0 60776 1208 656 S 0.3 0.0 0:01.18 sshd
1 root 15 0 10368 632 540 S 0.0 0.0 3:22.04 init
2 root RT -5 0 0 0 S 0.0 0.0 0:36.29 migration/0
- If you follow the instructions at What data do I gather if processes accessing GFS2 are hung?, you will find that there are no, or very few, processes waiting for glocks on the GFS2 filesystems. This indicates that there is no lock contention:
gfs2_fs1:
G: s:UN n:2/102a5 f:lqO t:EX d:EX/0 l:0 a:0 r:4 m:200
H: s:EX f:cW e:0 p:15005 [gfs2_quotad] gfs2_statfs_sync+0x37/0x165 [gfs2]
--------------
gfs2_fs2:
G: s:UN n:2/102a5 f:lqO t:EX d:EX/0 l:0 a:0 r:4 m:200
H: s:EX f:cW e:0 p:14919 [gfs2_quotad] gfs2_statfs_sync+0x37/0x165 [gfs2]
--------------
gfs2_fs3:
G: s:UN n:2/102ad f:lqO t:EX d:EX/0 l:0 a:0 r:4 m:200
H: s:EX f:cW e:0 p:15191 [gfs2_quotad] gfs2_statfs_sync+0x37/0x165 [gfs2]
--------------
gfs2_fs4:
G: s:UN n:2/102ab f:lqO t:EX d:EX/0 l:0 a:0 r:4 m:200
H: s:EX f:cW e:0 p:7803 [gfs2_quotad] gfs2_statfs_sync+0x37/0x165 [gfs2]
--------------
gfs2_fs5:
G: s:UN n:2/102aa f:lqO t:EX d:EX/0 l:0 a:0 r:4 m:200
H: s:EX f:cW e:0 p:14564 [gfs2_quotad] gfs2_statfs_sync+0x37/0x165 [gfs2]
--------------
gfs2_fs6:
G: s:UN n:2/102ab f:lqO t:EX d:EX/0 l:0 a:0 r:4 m:200
H: s:EX f:cW e:0 p:14480 [gfs2_quotad] gfs2_statfs_sync+0x37/0x165 [gfs2]
--------------
gfs2_fs7:
G: s:UN n:2/16 f:lqO t:EX d:EX/0 l:0 a:0 r:5 m:10
H: s:EX f:W e:0 p:19862 [touch] gfs2_createi+0x63/0xd4f [gfs2]
H: s:EX f:W e:0 p:985 [pdflush] gfs2_write_inode+0x57/0x151 [gfs2]
G: s:UN n:2/102ad f:lqO t:EX d:EX/0 l:0 a:0 r:4 m:200
H: s:EX f:cW e:0 p:14741 [gfs2_quotad] gfs2_statfs_sync+0x37/0x165 [gfs2]
--------------
gfs2_fs8:
G: s:UN n:2/102aa f:lqO t:EX d:EX/0 l:0 a:0 r:4 m:200
H: s:EX f:cW e:0 p:14829 [gfs2_quotad] gfs2_statfs_sync+0x37/0x165 [gfs2]
G: s:UN n:3/232fc4a f:lqO t:EX d:EX/0 l:0 a:0 r:4 m:10
H: s:EX f:W e:0 p:22653 [paperirs] gfs2_unlink+0xa6/0x1a8 [gfs2]
--------------
More information about reading GFS2 lockdumps can be found in the GFS2 manual.
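When saving lockdumps like the ones above, counting the waiting holders quickly confirms the "little or no contention" pattern. A minimal sketch (the function name is illustrative; it relies on the H:-line format shown above, where the f: field contains W for a holder still waiting to be granted):

```shell
# count_waiting_holders FILE
# Count holder ("H:") lines in a saved glock dump whose flag field (f:...)
# contains W, i.e. holders still waiting for the lock to be granted.
count_waiting_holders() {
    awk '$1 == "H:" && $3 ~ /W/ { n++ } END { print n + 0 }' "$1"
}
```

For example, count_waiting_holders /tmp/gfs2_fs7.lockdump (path hypothetical) prints the number of waiters in that dump; a value near zero while glock_workqueue burns CPU points at this issue rather than lock contention.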
- Observe /proc/slabinfo for the number of gfs2_glock objects, and check whether the increase in CPU utilization, load, or performance problems coincides with the moment at which a large number of glocks begins getting flushed.
- Observe /proc/meminfo, vmstat, or other memory monitoring utilities to determine whether the onset of problems coincides with cached or used memory approaching 100% of the available system memory and then being flushed suddenly.
- To diagnose the source of the glock buildup:
- Use glocktop on the RHEL 5.9 kernel or later (contact Red Hat Global Support Services for details)
- Watch gfs2_glock in /proc/slabinfo and correlate spikes in the count with activity on the node at that time.
- Use a SystemTap script to try to diagnose the source of the cache flushes.
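The slabinfo and meminfo observations above can be scripted so that readings are logged and correlated after the fact. A minimal sketch; the function names are illustrative, and both take an optional file argument so they can also be pointed at saved copies of the /proc files:

```shell
# glock_count [FILE]
# Print the number of active gfs2_glock slab objects (column 2 of a
# /proc/slabinfo line is <active_objs>).
glock_count() {
    awk '$1 == "gfs2_glock" { print $2 }' "${1:-/proc/slabinfo}"
}

# cached_kb [FILE]
# Print a timestamped reading of the page cache size from /proc/meminfo,
# suitable for appending to a log to spot sudden large drops.
cached_kb() {
    awk -v ts="$(date '+%F %T')" \
        '$1 == "Cached:" { print ts, $2, $3 }' "${1:-/proc/meminfo}"
}
```

For example, a loop such as while sleep 60; do glock_count; cached_kb; done >> /var/log/gfs2-glocks.log (path hypothetical) would show whether a sudden drop in Cached coincides with the glock count collapsing and the load spiking.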
This solution is part of Red Hat’s fast-track publication program, providing a huge library of solutions that Red Hat engineers have created while supporting our customers. To give you the knowledge you need the instant it becomes available, these articles may be presented in a raw and unedited form.