Why is my GFS2 filesystem performance slow when doing a `rm -rf *` operation?
Environment
- Red Hat Enterprise Linux Server 5 (with the High Availability and Resilient Storage Add Ons)
- Red Hat Enterprise Linux Server 6 (with the High Availability and Resilient Storage Add Ons)
- Red Hat Enterprise Linux Server 7 (with the High Availability and Resilient Storage Add Ons)
- Global File System 2 (GFS2)
Issue
- Why is my GFS2 filesystem performance slow when doing a `rm -rf *` operation?
- Why do I see memory allocation errors in `/var/log/messages` like the following?

Jul 24 14:42:24 node42 kernel: ------------[ cut here ]------------
Jul 24 14:42:24 node42 kernel: WARNING: at mm/page_alloc.c:1970 get_page_from_freelist+0x9a8/0x9f0()
Resolution
Performance on a GFS2 filesystem can degrade when a `rm -rf *` is performed. Avoid this type of operation whenever possible, or minimize the depth of the directory structure that is crawled.
The following errata includes patches that improve the performance of deallocating or deleting files and directories on a GFS2 filesystem. It should also reduce the frequency of the get_page_from_freelist log events, but does not completely eliminate them.
Red Hat Enterprise Linux 7
- The issue (bz1359239) has been resolved with errata RHSA-2018:1062 and the following package(s): `kernel-3.10.0-862.el7` or later.
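Where the updated kernel cannot be applied immediately, one workaround consistent with the advice above is to reduce how much of the tree a single recursive delete touches at once. The following is a minimal sketch only; the mount point `/mnt/gfs2/scratch` and the batch size of 500 are illustrative assumptions, not values from this article:

```shell
#!/bin/sh
# Sketch: delete a deep tree in small batches from a single node instead of
# one `rm -rf *`, limiting the glock and unlink overhead of each pass.
# TREE is an example path; substitute your own GFS2 directory.
TREE="${TREE:-/mnt/gfs2/scratch}"

# Remove regular files first, depth-first, 500 at a time.
find "$TREE" -depth -type f -print0 | xargs -0 -r -n 500 rm -f

# Then remove the now-empty directories, deepest first.
find "$TREE" -mindepth 1 -depth -type d -empty -delete
```

Running the delete from only one node also avoids cross-node cache invalidation for the subtree being removed.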
For more information on GFS2 performance, see:
- Is my GFS2 slowdown a file system problem or a storage problem?
- My GFS2 filesystem is slow. How can I diagnose and make it faster?
- How does GFS2 know when to deallocate a file?
- What is the lifetime of a glock or a DLM resource on a GFS2 filesystem?
Root Cause
When deleting files and directories with operations like `rm -rf *`, there are two main operations that cause GFS2 filesystem performance to degrade:
- Crawling a highly-branched directory structure. This involves calling `stat()` on every file on the storage under the path specified to `rm`. It causes glock overhead, since caches may need to be invalidated if other nodes are working in that directory structure at the time the workload runs.
- The unlinks (deletions) themselves. Operations such as `unlink()` can generate considerable metadata lock overhead, because the other nodes accessing the storage must not be allowed to access, or to cache, the data that is being modified or deleted.
The combination of the two workload characteristics can cause considerable performance problems including: processes accessing GFS2 on multiple nodes in D state for extended periods, hung task timeout call traces in the logs, or the operation in question taking an extremely long amount of time to complete.
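A quick way to check for the first symptom is to look for tasks stuck in uninterruptible sleep. This is a generic sketch using standard procps `ps` options, not a command taken from this article:

```shell
# List tasks in D state (uninterruptible sleep), a common symptom of glock
# contention while a large recursive delete runs on GFS2.
# Keeps the header row (NR==1) plus any process whose state starts with D.
ps -eo pid,stat,wchan:30,comm | awk 'NR==1 || $2 ~ /^D/'
```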
The get_page_from_freelist log events are triggered when GFS2 allocates a large structure covering multiple rgrps while deleting or truncating files that are either badly fragmented or very large (or both). These are files that span multiple resource groups (slices of the filesystem).
Diagnostic Steps
- Capture the `glocktop` output while the performance degradation is occurring: How can I view glock contention on a GFS2 filesystem in real-time in a RHEL 5, 6, or 7 Resilient Storage cluster?
- Search the glocktop output for the function `recursive_scan` in the call trace of each process. If you see this function in `rm` (but not limited to only that process), it is possible you are hitting this issue. For example:
$ grep recursive_scan /tmp/node42-glocktop.output -B 5 -A 3 | head -n 9
G: s:EX n:2/554462ff f:yfIqob t:EX d:EX/0 a:0 v:0 r:3 m:200 (inode)
H: s:EX f:H e:0 p:26668 [rm] gfs2_evict_inode+0x155/0x470 [gfs2]
I: n:16054/1430545151 t:8 f:0x00 d:0x00000000 s:31884
U: H inode 554462ff Held:Exclusive [Dirty, Flush, Queued, Blocking]
U: H ---> held by pid 26668 [rm]
C: recursive_scan+0x333/0x6f0 [gfs2] <-------------------!
C: trunc_dealloc+0x11a/0x140 [gfs2]
C: gfs2_file_dealloc+0x10/0x20 [gfs2]
C: gfs2_evict_inode+0x288/0x470 [gfs2]
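Across a larger capture, the holders whose call traces include recursive_scan can be summarized by extracting the "held by pid" lines from the same output. A small sketch (the file path is the example used above, and the line format matches the sample output shown):

```shell
# Extract unique "pid command" pairs for glock holders whose call trace
# includes recursive_scan, from a saved glocktop capture.
grep -B 4 'recursive_scan' /tmp/node42-glocktop.output \
  | sed -n 's/.*held by pid \([0-9]*\) \[\(.*\)\].*/\1 \2/p' \
  | sort -u
```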
This solution is part of Red Hat’s fast-track publication program, providing a huge library of solutions that Red Hat engineers have created while supporting our customers. To give you the knowledge you need the instant it becomes available, these articles may be presented in a raw and unedited form.