GFS2 filesystems intermittently hang and glock_workqueue processes use 100% CPU in RHEL 5

Solution Unverified - Updated

Environment

  • Red Hat Enterprise Linux 5 (RHEL5) with the Resilient Storage Add On
    • Observed in RHEL 5 Update 7 - Update 9
  • GFS2 file system(s) mounted on multiple nodes
  • Often triggered by a backup utility running against GFS2 file system(s), such as Symantec NetBackup

Issue

  • When a large number of cached glocks has built up for GFS2 and memory pressure causes a flush of the cache, the CPU utilization of the glock_workqueue processes becomes very high, possibly causing the system to become unresponsive.
  • When the page cache is flushed out, the CPU utilization of glock_workqueue spikes.
  • We are seeing high load on our clusters. This high load is due to CPU utilization. There is an associated large drop in page cache for the GFS2 filesystem during the high load. When the large pagecache drop completes, the load goes back down to normal. The number of busy glock_workqueue processes increases during this high-load period. Very little I/O occurs during the high-load event.
  • A hung process was halted, and during that time the system stopped functioning for all existing users. A glock service spiked to 100% CPU, new ssh sessions could not be started, and existing ssh users were disconnected. The symptom had gone away by the time we were able to regain access... We saw a huge spike in load and CPU usage, but I/O was at normal levels, with multiple instances of glock_workqueue using 100% CPU.

Resolution

Workarounds:

  • Try to identify the source of the glock buildups (such as backup processes, or scripts that recursively scan entire file systems) and either eliminate them, or modify them so that they do not produce so many glocks.
  • Alternatively, flush the cache or drop locks periodically or at strategic moments. This can be done by:
    • Unmounting and remounting a specific file system after it is backed up
    • Running # echo 1 > /proc/sys/vm/drop_caches; echo 2 > /proc/sys/vm/drop_caches; echo 3 > /proc/sys/vm/drop_caches periodically from a cron job to keep the number of glocks low
    • Running the same command sequence during or after backups to flush the glocks that were cached in the process
  • Notice: echoing 1, 2, and then 3 is safer than echoing 3 directly to /proc/sys/vm/drop_caches, since it has been reported that a plain echo 3 can lead to an unresponsive system.
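As a concrete sketch of the cron-based workaround above (the script name, install path, and schedule are hypothetical, not from this article):

```shell
#!/bin/sh
# Hypothetical example: flush caches in the 1 -> 2 -> 3 sequence described
# above.  The target path is a parameter only so the script can be
# exercised without root; in production, call the function with no argument.
drop_caches_stepwise() {
    target="${1:-/proc/sys/vm/drop_caches}"
    sync        # flush dirty pages first so more cache can be dropped
    # Echo 1, then 2, then 3 in turn; a plain "echo 3" has been reported
    # to leave some RHEL 5 systems unresponsive.
    for n in 1 2 3; do
        echo "$n" > "$target"
        sleep 1
    done
}
# Example root crontab entry (hypothetical path), nightly after backups:
#   30 3 * * *  /usr/local/sbin/drop-gfs2-caches.sh
```

Scheduling this shortly after the backup window ends keeps the glock count low without flushing cache while the backup is still scanning.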

Root Cause

When memory pressure causes the VM to flush pagecache and inactive slabs, glock_workqueue will begin scanning glocks for those that may be locked and demoted. Because there is a glock_workqueue process for each logical CPU, multiple processors may be occupied simultaneously carrying out this work. In certain situations where the number of glocks is very high, this may take an excessive amount of time and CPU cycles to carry out, potentially starving other processes of resources and causing slight to extreme unresponsiveness, and possibly fencing.

Red Hat Engineering released a fix for this issue via Bugzilla #928518. The description of the issue and fix from that bug is as follows:

A bug in the GFS2 code prevented glock work queues from freeing glock-related memory while the 
glock memory shrinker repeatedly queued a large number of demote requests, for example when 
performing a simultaneous backup of several live GFS2 volumes with a large file count. As a 
consequence, the glock work queues became overloaded which resulted in a high CPU usage and the 
GFS2 file systems being unresponsive for a significant amount of time. A patch has been applied 
to alleviate this problem by calling the yield() function after scheduling a certain amount of 
tasks on the glock work queues. The problem can now occur only with extremely high work loads.

Diagnostic Steps

If the following symptoms are present on your system, then this article may apply to your issue:

  • Approximately 120 seconds after the issue begins, the following blocked/hung task messages begin repeating in /var/log/messages on all cluster nodes:
kernel: INFO: task pdflush:985 blocked for more than 120 seconds.
kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
kernel: pdflush       D ffffffff80154dc2     0   985    595           986   984 (L-TLB)
kernel:  ffff81087f7cfbc0 0000000000000046 0000000000000001 ffff810845bd55c0
kernel:  0000000000000001 000000000000000a ffff81107f6867a0 ffff8108800a3080
kernel:  00044dfca388ee4c 0000000000004dfc ffff81107f686988 00000001882cd474
kernel: Call Trace:
kernel:  [<ffffffff8885e10f>] :gfs2:just_schedule+0x0/0xd
kernel:  [<ffffffff8885e118>] :gfs2:just_schedule+0x9/0xd
kernel:  [<ffffffff800639fa>] __wait_on_bit+0x40/0x6e
kernel:  [<ffffffff8885e10f>] :gfs2:just_schedule+0x0/0xd
kernel:  [<ffffffff80063a94>] out_of_line_wait_on_bit+0x6c/0x78
kernel:  [<ffffffff800a2e80>] wake_bit_function+0x0/0x23
kernel:  [<ffffffff8885e01e>] :gfs2:gfs2_glock_schedule_for_reclaim+0x20/0x99
kernel:  [<ffffffff8885e0ce>] :gfs2:gfs2_glock_wait+0x37/0x78
kernel:  [<ffffffff8886f09e>] :gfs2:gfs2_write_inode+0x5f/0x151
kernel:  [<ffffffff8886f096>] :gfs2:gfs2_write_inode+0x57/0x151
kernel:  [<ffffffff8002fa4f>] __writeback_single_inode+0x1dd/0x31c
kernel:  [<ffffffff80021143>] sync_sb_inodes+0x1b7/0x271
kernel:  [<ffffffff800a2c3a>] keventd_create_kthread+0x0/0xc4
kernel:  [<ffffffff80050cff>] writeback_inodes+0x82/0xd8
kernel:  [<ffffffff800cc3b5>] wb_kupdate+0xef/0x169
kernel:  [<ffffffff8005628f>] pdflush+0x0/0x1fb
kernel:  [<ffffffff800563e0>] pdflush+0x151/0x1fb
kernel:  [<ffffffff800cc2c6>] wb_kupdate+0x0/0x169
kernel:  [<ffffffff8003270f>] kthread+0xfe/0x132
kernel:  [<ffffffff8005dfb1>] child_rip+0xa/0x11
kernel:  [<ffffffff800a2c3a>] keventd_create_kthread+0x0/0xc4
kernel:  [<ffffffff80032611>] kthread+0x0/0x132
kernel:  [<ffffffff8005dfa7>] child_rip+0x0/0x11
kernel: 
  • If you run top on the system when the issue is occurring, you will see several glock_workqueue processes using 100% of CPU, most of them in the (R)unning state:
top - 18:56:51 up 22 days,  6:49, 32 users,  load average: 319.02, 278.33, 192.09
Tasks: 796 total,  15 running, 781 sleeping,  0 stopped,  0 zombie
Cpu(s):  0.5%us, 78.3%sy,  0.0%ni, 20.9%id,  0.3%wa,  0.0%hi,  0.0%si,  0.0%st
Mem:  65979728k total, 18427132k used, 47552596k free,  1959984k buffers
Swap:  4194296k total,        0k used,  4194296k free,  6681448k cached

 PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND
7629 root      20  -5    0    0    0 R 100.0  0.0  68:13.77 glock_workqueue
7631 root      20  -5    0    0    0 R 100.0  0.0  65:12.84 glock_workqueue
7632 root      20  -5    0    0    0 R 100.0  0.0  71:08.88 glock_workqueue
7639 root      20  -5    0    0    0 R 100.0  0.0  65:11.34 glock_workqueue
7640 root      20  -5    0    0    0 R 100.0  0.0  75:56.97 glock_workqueue
7641 root      20  -5    0    0    0 R 100.0  0.0  64:00.45 glock_workqueue
29339 efo      28  3 61228  784  664 R 99.7  0.0  18:30.06 grep
7637 root      20  -5    0    0    0 R 73.3  0.0  52:58.47 glock_workqueue
7633 root      10  -5    0    0    0 R 72.9  0.0  61:30.11 glock_workqueue
7636 root      10  -5    0    0    0 S 51.8  0.0  70:05.82 glock_workqueue
7628 root      10  -5    0    0    0 R 50.5  0.0  63:09.84 glock_workqueue
7643 root      10  -5    0    0    0 S 47.9  0.0  59:51.44 glock_workqueue
7635 root      10  -5    0    0    0 S 46.9  0.0  60:13.05 glock_workqueue
7634 root      10  -5    0    0    0 R 43.2  0.0  60:56.45 glock_workqueue
7638 root      10  -5    0    0    0 R 43.2  0.0  68:55.44 glock_workqueue
7630 root      10  -5    0    0    0 S 41.6  0.0  66:41.29 glock_workqueue
7642 root      10  -5    0    0    0 R 36.6  0.0  61:04.28 glock_workqueue
7761 root      20  -5    0    0    0 R 27.7  0.0  24:06.69 dlm_astd
3256 zangrill  30  15  6052 1924  832 D 11.6  0.0  0:00.54 efs_rnr
6524 root      16  0  106m  15m 3156 D  5.9  0.0  0:00.18 sshd
6519 root      16  0  105m  15m 2924 S  3.3  0.0  0:00.10 sshd
6415 kemnera  15  0 11412 1628  780 R  1.0  0.0  0:00.19 top
5091 root      10  -5    0    0    0 S  0.3  0.0  0:05.66 kjournald
5746 root      15  0 14576  536  396 S  0.3  0.0 111:18.24 mcstransd
6492 root      15  0 95640 3188 2480 S  0.3  0.0  0:00.01 sshd
6769 haldaemo  15  0 52572 6548 2392 S  0.3  0.0  0:02.31 hald
10742 root      15  0 60776 1208  656 S  0.3  0.0  0:01.18 sshd
    1 root      15  0 10368  632  540 S  0.0  0.0  3:22.04 init
    2 root      RT  -5    0    0    0 S  0.0  0.0  0:36.29 migration/0

  • A GFS2 lockdump captured during the event shows glocks stuck in the unlocked (UN) state with holders still waiting, for example:
gfs2_fs1:
G:  s:UN n:2/102a5 f:lqO t:EX d:EX/0 l:0 a:0 r:4 m:200
 H: s:EX f:cW e:0 p:15005 [gfs2_quotad] gfs2_statfs_sync+0x37/0x165 [gfs2]
--------------
gfs2_fs2:
G:  s:UN n:2/102a5 f:lqO t:EX d:EX/0 l:0 a:0 r:4 m:200
 H: s:EX f:cW e:0 p:14919 [gfs2_quotad] gfs2_statfs_sync+0x37/0x165 [gfs2]
--------------
gfs2_fs3:
G:  s:UN n:2/102ad f:lqO t:EX d:EX/0 l:0 a:0 r:4 m:200
 H: s:EX f:cW e:0 p:15191 [gfs2_quotad] gfs2_statfs_sync+0x37/0x165 [gfs2]
--------------
gfs2_fs4:
G:  s:UN n:2/102ab f:lqO t:EX d:EX/0 l:0 a:0 r:4 m:200
 H: s:EX f:cW e:0 p:7803 [gfs2_quotad] gfs2_statfs_sync+0x37/0x165 [gfs2]
--------------
gfs2_fs5:
G:  s:UN n:2/102aa f:lqO t:EX d:EX/0 l:0 a:0 r:4 m:200
 H: s:EX f:cW e:0 p:14564 [gfs2_quotad] gfs2_statfs_sync+0x37/0x165 [gfs2]
--------------
gfs2_fs6:
G:  s:UN n:2/102ab f:lqO t:EX d:EX/0 l:0 a:0 r:4 m:200
 H: s:EX f:cW e:0 p:14480 [gfs2_quotad] gfs2_statfs_sync+0x37/0x165 [gfs2]
--------------
gfs2_fs7:
G:  s:UN n:2/16 f:lqO t:EX d:EX/0 l:0 a:0 r:5 m:10
 H: s:EX f:W e:0 p:19862 [touch] gfs2_createi+0x63/0xd4f [gfs2]
 H: s:EX f:W e:0 p:985 [pdflush] gfs2_write_inode+0x57/0x151 [gfs2]
G:  s:UN n:2/102ad f:lqO t:EX d:EX/0 l:0 a:0 r:4 m:200
 H: s:EX f:cW e:0 p:14741 [gfs2_quotad] gfs2_statfs_sync+0x37/0x165 [gfs2]
--------------
gfs2_fs8:
G:  s:UN n:2/102aa f:lqO t:EX d:EX/0 l:0 a:0 r:4 m:200
 H: s:EX f:cW e:0 p:14829 [gfs2_quotad] gfs2_statfs_sync+0x37/0x165 [gfs2]
G:  s:UN n:3/232fc4a f:lqO t:EX d:EX/0 l:0 a:0 r:4 m:10
 H: s:EX f:W e:0 p:22653 [paperirs] gfs2_unlink+0xa6/0x1a8 [gfs2]
--------------

More information about reading GFS2 lockdumps can be found in the GFS2 manual.

  • Observe /proc/slabinfo for the number of gfs2_glock objects, and check whether the increase in CPU utilization, load, or performance problems coincides with the moment at which a large number of glocks begins to be flushed
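As a rough illustration of that check, the gfs2_glock count can be pulled out of /proc/slabinfo with a one-line awk filter (the helper name is mine; it assumes the standard slabinfo layout, with the cache name in column 1 and the active-object count in column 2):

```shell
# Hypothetical helper: print the active-object count for the gfs2_glock
# slab cache from a slabinfo-format file (defaults to /proc/slabinfo).
glock_count() {
    awk '$1 == "gfs2_glock" { print $2 }' "${1:-/proc/slabinfo}"
}
# e.g. sample it every few seconds and compare against load/CPU graphs:
#   while :; do date '+%T'; glock_count; sleep 5; done
```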

  • Observe /proc/meminfo, vmstat, or other memory-monitoring utilities to determine whether the onset of problems coincides with cached or used memory approaching 100% of the available system memory and then being flushed suddenly.
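One way to watch for that pattern is a small helper that reports cached memory as a percentage of total (the helper name is illustrative; MemTotal and Cached are the standard /proc/meminfo fields):

```shell
# Hypothetical helper: report Cached memory as a percentage of MemTotal,
# read from a /proc/meminfo-format file.  A value that climbs toward 100%
# and then collapses at the moment problems begin points at this issue.
cached_pct() {
    awk '/^MemTotal:/ { total = $2 }
         /^Cached:/   { cached = $2 }
         END          { if (total) printf "%d\n", (cached * 100) / total }' \
        "${1:-/proc/meminfo}"
}
# e.g.  while :; do date '+%T'; cached_pct; sleep 10; done
```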

  • To diagnose the source of the glock buildups:

    • Use glocktop on the RHEL 5.9 kernel or later (contact Red Hat Global Support Services for details)
    • Watch the gfs2_glock object count in /proc/slabinfo and correlate spikes in the numbers with activity on the node at that time
    • Use a SystemTap script to try to diagnose the source of the cache flushes
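The slabinfo-watching step above can be combined with a process listing so that glock spikes can be matched to node activity after the fact. This snapshot helper is a hypothetical sketch, not a Red Hat-provided tool:

```shell
# Hypothetical snapshot helper for correlating glock growth with node
# activity: prints a timestamp, the gfs2_glock active-object count from a
# slabinfo-format file, and the top CPU consumers at that moment.
glock_snapshot() {
    date '+%F %T'
    awk '$1 == "gfs2_glock" { print "gfs2_glock active objects:", $2 }' \
        "${1:-/proc/slabinfo}"
    ps -eo pid,pcpu,comm --sort=-pcpu 2>/dev/null | head -6
}
# Append periodic snapshots to a log for later review, e.g.:
#   while :; do glock_snapshot >> /var/log/gfs2-glock-watch.log; sleep 60; done
```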

