Why is my GFS2 filesystem performance slow when doing a `rm -rf *` operation?
Environment
- Red Hat Enterprise Linux Server 5 (with the High Availability and Resilient Storage Add Ons)
- Red Hat Enterprise Linux Server 6 (with the High Availability and Resilient Storage Add Ons)
- Red Hat Enterprise Linux Server 7 (with the High Availability and Resilient Storage Add Ons)
- Global File System 2 (GFS2)
Issue
- Why is my GFS2 filesystem performance slow when doing a `rm -rf *` operation?
- Why do I see memory allocation errors in `/var/log/messages` like the following?

Jul 24 14:42:24 node42 kernel: ------------[ cut here ]------------
Jul 24 14:42:24 node42 kernel: WARNING: at mm/page_alloc.c:1970 get_page_from_freelist+0x9a8/0x9f0()
Resolution
Performance on a GFS2 filesystem can degrade when a `rm -rf *` is performed. Avoid this type of operation whenever possible, or minimize the depth of the directory structure that is crawled.
The following errata includes patches that improve the performance of deallocating or deleting files and directories on a GFS2 filesystem. It should also reduce the frequency of the get_page_from_freelist log events, but does not completely eliminate them.
Red Hat Enterprise Linux 7
- The issue (bz1359239) has been resolved with errata RHSA-2018:1062 and the following package(s): `kernel-3.10.0-862.el7` or later.
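Where the updated kernel cannot be applied immediately, one workaround consistent with the advice above is to reduce how much of the tree a single recursive delete touches at once. The following is a minimal sketch only; the mount point `/mnt/gfs2/scratch` and the batch size of 500 are illustrative assumptions, not values from this article:

```shell
#!/bin/sh
# Sketch: delete a deep tree in small batches from a single node instead of
# one `rm -rf *`, limiting the glock and unlink overhead of each pass.
# TREE is an example path; substitute your own GFS2 directory.
TREE="${TREE:-/mnt/gfs2/scratch}"

# Remove regular files first, depth-first, 500 at a time.
find "$TREE" -depth -type f -print0 | xargs -0 -r -n 500 rm -f

# Then remove the now-empty directories, deepest first.
find "$TREE" -mindepth 1 -depth -type d -empty -delete
```

Running the delete from only one node also avoids cross-node cache invalidation for the subtree being removed.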
For more information on GFS2 performance, see:
- Is my GFS2 slowdown a file system problem or a storage problem?
- My GFS2 filesystem is slow. How can I diagnose and make it faster?
- How does GFS2 know when to deallocate a file?
- What is the lifetime of a glock or a DLM resource on a GFS2 filesystem?
Root Cause
When deleting files and directories with operations like `rm -rf *`, there are two main operations that cause GFS2 filesystem performance to degrade:
- Crawling a highly-branched directory structure. This involves calling `stat()` on every file on the storage under the path specified to `rm`. It causes glock overhead, since caches may need to be invalidated if other nodes are working in that directory structure at the time the workload runs.
- The unlinks (deletions) themselves. Operations such as `unlink()` can generate considerable metadata lock overhead, because the other nodes accessing the storage must not be allowed to access, or to cache, the data that is being modified or deleted.
The combination of the two workload characteristics can cause considerable performance problems including: processes accessing GFS2 on multiple nodes in D state for extended periods, hung task timeout call traces in the logs, or the operation in question taking an extremely long amount of time to complete.
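A quick way to check for the first symptom is to look for tasks stuck in uninterruptible sleep. This is a generic sketch using standard procps `ps` options, not a command taken from this article:

```shell
# List tasks in D state (uninterruptible sleep), a common symptom of glock
# contention while a large recursive delete runs on GFS2.
# Keeps the header row (NR==1) plus any process whose state starts with D.
ps -eo pid,stat,wchan:30,comm | awk 'NR==1 || $2 ~ /^D/'
```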
The get_page_from_freelist log events are triggered when GFS2 allocates a large structure covering multiple rgrps while deleting or truncating files that are either badly fragmented or very large (or both). These are files that span multiple resource groups (slices of the filesystem).
Diagnostic Steps
- Capture the `glocktop` output while the performance degradation is occurring: How can I view glock contention on a GFS2 filesystem in real-time in a RHEL 5, 6, or 7 Resilient Storage cluster?
- Search the glocktop output for the function `recursive_scan` in the call trace of each process. If you see this function in `rm` (but not limited to only that process), it is possible you are hitting this issue. For example:
$ grep recursive_scan /tmp/node42-glocktop.output -B 5 -A 3 | head -n 9
G: s:EX n:2/554462ff f:yfIqob t:EX d:EX/0 a:0 v:0 r:3 m:200 (inode)
H: s:EX f:H e:0 p:26668 [rm] gfs2_evict_inode+0x155/0x470 [gfs2]
I: n:16054/1430545151 t:8 f:0x00 d:0x00000000 s:31884
U: H inode 554462ff Held:Exclusive [Dirty, Flush, Queued, Blocking]
U: H ---> held by pid 26668 [rm]
C: recursive_scan+0x333/0x6f0 [gfs2] <-------------------!
C: trunc_dealloc+0x11a/0x140 [gfs2]
C: gfs2_file_dealloc+0x10/0x20 [gfs2]
C: gfs2_evict_inode+0x288/0x470 [gfs2]
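Across a larger capture, the holders whose call traces include recursive_scan can be summarized by extracting the "held by pid" lines from the same output. A small sketch (the file path is the example used above, and the line format matches the sample output shown):

```shell
# Extract unique "pid command" pairs for glock holders whose call trace
# includes recursive_scan, from a saved glocktop capture.
grep -B 4 'recursive_scan' /tmp/node42-glocktop.output \
  | sed -n 's/.*held by pid \([0-9]*\) \[\(.*\)\].*/\1 \2/p' \
  | sort -u
```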
This solution is part of Red Hat’s fast-track publication program, providing a huge library of solutions that Red Hat engineers have created while supporting our customers. To give you the knowledge you need the instant it becomes available, these articles may be presented in a raw and unedited form.