My GFS2 filesystem is slow. How can I diagnose and make it faster?
Introduction
Setting expectations on comparing GFS2 versus non-clustered file systems
How to tell a slowdown from a hang
Data Collection to Diagnose Slowdowns and Hangs
Root Causes of Slowdowns
Slowdown due to Hardware Issues
Slowdown due to System Configuration Issues
Slowdown due to Hardware Configuration Issues
Slowdown due to Software or Usage Problems
Slowdown due to Other Circumstances
Root Causes of Hangs
References
Introduction
This article describes some of the typical reasons that a `GFS2` file system may not be performing optimally.
Some of the typical questions that are asked are:
- Why is my `GFS2` filesystem performing so slowly?
- Why does my `GFS2` filesystem seem to hang, and commands do not complete?
- Why are backups so slow on `GFS2`?
- Why does `GFS2` performance degrade after I run a backup?
- Why is my `ext3` or `ext4` filesystem performing so much better than my `GFS2` filesystem?
- Why do benchmarks with `dd` show my `ext3` or `ext4` filesystem as being faster than my tests to my `GFS2`?
Setting expectations on comparing GFS2 versus non-clustered file systems
GFS2 is a clustered file system, which means many computer systems (“nodes”) share the same media (hard drives, etc.), unlike single-node file systems such as ext3, ext4, and xfs, which have the media all to themselves.
Since the media is shared between nodes, clustered file systems need to coordinate the changes they make with the other nodes in the cluster. Without that agreement between nodes, the file system metadata would quickly become corrupt and information would be lost. This coordination of the file system metadata is accomplished with an inter-node locking mechanism called a glock (“Gee-lock”). This gets very tricky if several nodes have cached information for the same objects: some nodes may need to throw out cached information and reread the media. For that reason, clustered file systems will never perform as well as single-node file systems.
How to tell a slowdown from a hang
There's a big difference between a GFS2 hang versus a GFS2 slowdown. With a hang, nothing is moving: data and metadata are not being read from or written to the media. With a slowdown, the data is moving; it's just moving slowly or pausing too much. It's important to know which situation you're in so you can react accordingly.
A GFS2 hang should never occur. If it does, it's probably a bug in the GFS2 kernel code, and the developers need to find it and fix it. You need to contact Red Hat's Global Support Services (GSS). If the problem is already fixed, you may need to get the fix onto your cluster. That may require you to install a newer kernel.
GFS2 slowdowns are not ideal, but they can almost always be solved or worked around.
So how do you tell if you're experiencing a hang or a slowdown? There are several things you can do:
- `vmstat 1` will tell you the amount of reading and writing happening each second. If you run it for a few minutes and nothing's moving, it's likely a hang.
- You can use `iostat` to be sure nothing's moving over a long period of time.
- In RHEL 5, you can use `sysrq t` to get a list of processes and their call stack information.
- In RHEL 6 and 7, you can repeatedly run `cat /proc/<pid>/stack` to see if a particular job is moving or stuck in one place. In some cases the stack may appear unchanged while the process is simply doing the same thing over and over, but this is a good first step toward identifying whether it's obviously moving or whether further diagnostics are needed.
- In all releases, attaching `strace -p <pid> -fttvo <output file>` to a process that seems blocked or slow, then reviewing the contents of that file, can give insight into whether the process is still working, or what it might be stuck on.
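The checks above can be combined into a small triage script. The following is a minimal sketch, not a supported tool: the PID, sample count, and sleep interval are placeholders to adjust, and reading `/proc/<pid>/stack` may require root.

```shell
#!/bin/sh
# Triage sketch: sample a process's kernel stack a few times.
# If the stack never changes AND vmstat/iostat show no I/O moving,
# suspect a hang; if the stack varies, the process is likely just slow.
PID=${1:-$$}   # placeholder: pass the PID of the suspect process

for i in 1 2 3; do
    echo "=== sample $i: $(date) ==="
    # May require root; fall back gracefully if the stack is unreadable.
    cat /proc/"$PID"/stack 2>/dev/null || echo "(stack not readable)"
    sleep 2
done
```

Compare the samples to each other: identical stacks across many samples, combined with no I/O in `vmstat 1`, point toward a hang rather than a slowdown.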
Data Collection to Diagnose Slowdowns and Hangs
It's important to collect the right data while the problem is ongoing, because if the symptoms disappear or nodes in the cluster end up being restarted in order to recover, it may be difficult or impossible to tell later what exactly was going wrong. These guidelines can help to capture the right information each time so that administrators or Red Hat Global Support Services can make an informed analysis of the problem:
- Try to isolate the specific operation that may be slow or blocked. If an application's operation is taking too long to complete, break that application down into its most basic components and try to identify what is taking longer than expected, if possible.
  - Was a command run that seems to be stuck or taking a long time?
  - Is the problem happening while creating, opening, or deleting a file?
  - Is it trying to lock a file or directory before accessing it? If so, which file or directory?
  - Are multiple operations working in the same directory at that time?
- Capture detailed resource utilization information at a frequent interval at the time of any observed slowdowns or hangs, and analyze the output to determine if there are any trends or patterns that consistently seem to lead to the problem. As you'll see in the Root Cause sections below, identifying a number of specific conditions may rely on having data about what the storage was doing, what processes were doing, or in general what the system and kernel were doing. Make sure that the data you collect is as comprehensive as possible, whether that be using scripts that Red Hat recommends or alternative monitoring tools running on the system. Running utilities like `iostat`, `vmstat`, `top`, `ps`, `netstat -s`, `ethtool -S`, `mpstat`, and others like them over a period of time covering the problem (and ideally starting before the problem does, if possible) is a good start.
- Capturing the output of `glocktop` from all nodes while a problem is ongoing (and ideally, starting before the problem shows up) can give very useful data for analyzing slowdown situations. If an application seems to run more slowly than expected, try running `glocktop` and evaluate whether any resources are being contended for.
- If it appears as if a process is stuck in the same place for long periods of time, consider capturing the state of all nodes' glocks. Note that this data may not be useful in all cases, such as those where an application or command is simply slow, so consider whether this collection will be useful before taking the time to gather it.
- Run `sosreport` on all nodes, ideally while the issue is still ongoing and prior to any reboots or recovery actions. In some situations where GFS2 file systems are unresponsive or the cluster stack is in a bad state, running `sosreport` with certain plugins disabled may be necessary.
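The interval collection described above can be as simple as a loop that snapshots several utilities into timestamped logs. This is a hedged sketch rather than a Red Hat-provided script; the log directory, interval, sample count, and tool list are assumptions to adapt to your environment.

```shell
#!/bin/sh
# Sketch: capture resource utilization at a fixed interval into a log dir.
LOGDIR=${LOGDIR:-/tmp/perfdata}   # placeholder location
INTERVAL=${INTERVAL:-10}          # seconds between samples
SAMPLES=${SAMPLES:-6}             # how many samples to take

mkdir -p "$LOGDIR"
i=0
while [ "$i" -lt "$SAMPLES" ]; do
    ts=$(date +%Y%m%d-%H%M%S)
    {
        echo "== $ts =="
        # Only run tools that are actually installed.
        command -v vmstat >/dev/null && vmstat 1 2
        command -v iostat >/dev/null && iostat -x 1 2
        command -v mpstat >/dev/null && mpstat 1 1
        ps aux | head -20
    } >> "$LOGDIR/perf-$ts.log" 2>&1
    i=$((i + 1))
    sleep "$INTERVAL"
done
```

Run one copy per node, starting before the problem window if you can, so the logs cover both normal and degraded behavior.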
Root Causes of Slowdowns
There can be many causes for GFS2 slowdowns, so you may need to check several things before you can determine the real cause, and there may be more than one cause for the slowness. Here are the primary causes and solutions, condensed into a few tables:
Slowdown due to Hardware Issues
| Root Cause | How to check | Solution | Notes |
|---|---|---|---|
| Slow media | Use `dd` to test the speed of each media device, without a file system. Compare that to a similar speed test of the file system on the same media. | Buy faster media. Use a SAN rather than iSCSI. | The raw `dd` test destroys the file system on the media; you will need to recreate it with `mkfs.gfs2`. Hard drives slow down when they start to fail: a failing drive may spend lots of time retrying read or write operations. Consider using SMART to detect drives about to fail. |
| Network problems | Use a tool like Wireshark to monitor DLM communications between nodes. Look for network problems. | Fix your network, replace defective switches, etc. | You should have a dedicated switch for cluster communications. |
| Network slowdowns | Install and run the `dlm_klock` tool to test the raw speed of DLM communications without GFS2 in the picture. | Make sure the cluster has a dedicated NIC and switch for inter-node communications. | These are mostly bandwidth problems. You might consider buying a newer network switch. |
| Out of memory or swapping | Run `top` and check whether `Swap:` used is 0k. If not, you may be swapping memory out to disk. | Add memory or kill processes to reduce memory in use. | - |
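To illustrate the kind of `dd` comparison the first row describes, the sketch below times a write through the file system versus a raw write to the device. The mount point and device are placeholders, and writing to the raw device destroys the file system on it, so only uncomment that step with data you can recreate via `mkfs.gfs2`.

```shell
#!/bin/sh
# Sketch: compare file-system write speed to raw-device write speed.
MOUNTPOINT=${MOUNTPOINT:-/mnt/gfs2}   # placeholder mount point
DEVICE=${DEVICE:-/dev/sdX}            # placeholder -- DESTRUCTIVE target

# 1. Write through the file system (safe; creates and removes a test file).
dd if=/dev/zero of="$MOUNTPOINT/ddtest" bs=1M count=256 oflag=direct 2>&1 | tail -1
rm -f "$MOUNTPOINT/ddtest"

# 2. Raw write to the device -- DESTROYS the file system on $DEVICE.
#    Uncomment only after unmounting on every node and accepting data loss:
# dd if=/dev/zero of="$DEVICE" bs=1M count=256 oflag=direct 2>&1 | tail -1
```

If the raw device is fast but the file-system test is slow, the problem is above the storage layer; if both are slow, suspect the media or path to it.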
Slowdown due to System Configuration Issues
| Root Cause | How to check | Solution | Notes |
|---|---|---|---|
| SELinux slowing GFS2 down | Run the `getenforce` command. If it says `Enforcing`, GFS2 may be doing more work than it needs to. | Disable SELinux for your GFS2 file system, or disable SELinux altogether. | It's a security risk to disable SELinux altogether. If SELinux is used on a GFS2 file system, every file system object (file, directory, etc.) requires extra blocks for extended attributes and extra time to manage them. GFS2 runs faster if it doesn't need to worry about these attributes. |
| Mount option `noatime` not used | Mount your GFS2 file systems and run `mount \| grep gfs2`. Each should list `noatime` as part of its mount options. | Change your `/etc/fstab` to specify `noatime` for your GFS2 mount points, or mount them manually with `-o noatime`. (References for RHEL 7 and RHEL 8.) | - |
| Block size too small | Run `gfs2_edit -p sb <device> \| grep sb_bsize`. If the size is not 4096, it could cause your file system to run slowly. | Reformat the file system with `mkfs.gfs2` using the default block size of 4K. | - |
| Past GFS2 slowdown bugs | Make sure you're on the latest kernel for your release; check with `uname -a`. | Upgrade to a newer kernel. | We're making it better all the time: RHEL 6 should be faster than RHEL 5, RHEL 6.5 faster than RHEL 6.4, and RHEL 6.5.z faster than RHEL 6.5.0, etc. Get the latest z-stream kernel. |
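The `noatime` and block-size checks in the table can be scripted together. A hedged sketch, assuming `gfs2_edit` from the gfs2-utils package is installed; the device path is a placeholder to substitute:

```shell
#!/bin/sh
# Sketch: verify noatime and 4K block size for GFS2 file systems.
DEVICE=${DEVICE:-/dev/vg/gfs2lv}   # placeholder device

# 1. Is every gfs2 mount using noatime?
mount | grep gfs2 | while read -r line; do
    case "$line" in
        *noatime*) echo "OK:   $line" ;;
        *)         echo "WARN (no noatime): $line" ;;
    esac
done

# 2. Is the superblock's block size 4096? (needs gfs2-utils)
if command -v gfs2_edit >/dev/null; then
    gfs2_edit -p sb "$DEVICE" | grep sb_bsize   # expect 4096
fi
```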
Slowdown due to Hardware Configuration Issues
| Root Cause | How to check | Solution | Notes |
|---|---|---|---|
| Using secondary SAN path | Use SAN configuration tools such as Navisphere to check which path you're exporting. | Reconfigure SAN to use primary path | Most SANs have a primary and secondary path. The primary path is fast, and the secondary path is slow. |
| Not using RAID striping | Depends on the situation | Configure the device to use striping | Some file systems like xfs automatically take advantage of RAID striping to optimize performance. GFS2 doesn't. By default, multi-LUN LVM volumes are not striped. Striped volumes will be significantly faster. |
| Using failover path with device-mapper-multipath | Use `multipath -ll` to determine the active path. Compare that to the primary vs. secondary paths exported by the SAN to the LUNs on that system. | The SAN or multipath may need to be reconfigured. | If device-mapper-multipath is using the secondary path exported by the SAN rather than the primary path, GFS2 will run much slower than it would on the primary path. |
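To see which path group is actually carrying I/O, you can filter the `multipath -ll` output. A small sketch; note that the exact output format can vary between device-mapper-multipath versions:

```shell
#!/bin/sh
# Sketch: show which multipath path group is active.
if command -v multipath >/dev/null; then
    # "status=active" marks the path group currently carrying I/O;
    # compare it against the SAN's primary/secondary export.
    multipath -ll | grep -E 'status=|policy='
else
    echo "device-mapper-multipath not installed"
fi
```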
Slowdown due to Software or Usage Problems
| Root Cause | How to check | Solution | Notes |
|---|---|---|---|
| Directory or file glock contention | Use `glocktop` and see if there are lots of processes waiting for type 2 (inode) glocks. | Reorganize your file system or change your application to lessen the contention. | These are mostly application or use-case problems that can be solved. |
| Backups, `ls -R`, or `du` have cached many millions of files and directories. | Run `grep "G:" /sys/kernel/debug/gfs2/<fsname>/glocks \| wc -l` to see if there are more than two million glocks cached. This can take a long time on a very old kernel; GFS2's glock dump function was sped up greatly in recent kernels. | Review the article: What are some best practices when running a backup of a GFS2 filesystem in a RHEL Resilient Storage cluster? | - |
| Too many files per directory. | Run `ls -lR` and see how many files are in each directory. If it's in the millions, it could slow you down. | Consider using different software. | GFS2 doesn't perform well when millions of small files are placed in the same directory. |
| Multiple nodes writing to the same directory. | Use the `glocktop` tool to check for directory contention. | Consider using different software. | As a rule, GFS2 will run faster if each node has its own directory to write data to. If all the nodes in the cluster are writing to the same directory at the same time, that contention slows things down. Reading from the same files is not a problem. |
| Multiple nodes writing to the same file. | Use `lsof` on all nodes to see if they're all writing to the same log file, etc. | Consider using different software. | Log files shared by multiple nodes may be useful, but can also slow things down. Use sparingly. |
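To check how many glocks are cached (the backup row above), you can count the `G:` lines in the debugfs glock dump. A minimal sketch, assuming debugfs is mounted at `/sys/kernel/debug`:

```shell
#!/bin/sh
# Sketch: count cached glocks per GFS2 file system via debugfs.
GLDIR=/sys/kernel/debug/gfs2
if [ -d "$GLDIR" ]; then
    for fs in "$GLDIR"/*; do
        [ -f "$fs/glocks" ] || continue
        # Each "G:" line is one cached glock; millions after a backup
        # can explain a post-backup slowdown.
        n=$(grep -c '^G:' "$fs/glocks")
        echo "$(basename "$fs"): $n glocks cached"
    done
else
    echo "debugfs not mounted or no GFS2 file systems present"
fi
```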
Slowdown due to Other Circumstances
| Root Cause | How to check | Solution | Notes |
|---|---|---|---|
| File fragmentation | Use a tool like filefrag to determine if some of your commonly used files are fragmented. | Copy the device to a new device using a newer kernel to “defrag” the file system. | Releases RHEL 6.4 and higher are better at reducing fragmentation. |
| File system fragmentation | Use a tool like gfs2_edit to examine the resource group (rgrp) bitmaps to determine if the bitmaps are severely fragmented. | Copy the device to a new device using a newer kernel to “defrag” the file system. | Releases RHEL 6.4 and higher are better at reducing fragmentation. |
| Resource group (rgrp) glock contention | Use `glocktop` and see if there are lots of processes waiting for type 3 (rgrp) glocks. This may happen from time to time under normal conditions, but if there are consistently a lot of waiting processes, that's a problem. | Upgrade to RHEL 6.5 or newer. | Kernels for RHEL 6.5 and newer have a built-in Orlov allocator and other improvements that may make this faster. |
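To gauge per-file fragmentation, `filefrag` reports extent counts; below is a sketch with a placeholder directory. A large extent count relative to file size suggests fragmentation.

```shell
#!/bin/sh
# Sketch: report extent counts for the larger files under a directory.
TARGET=${TARGET:-/mnt/gfs2/data}   # placeholder path to inspect

find "$TARGET" -type f -size +1M 2>/dev/null | head -20 | while read -r f; do
    # filefrag prints "<file>: N extents found"
    filefrag "$f" 2>/dev/null || echo "$f: filefrag not available"
done
```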
Root Causes of Hangs
| Root Cause | How to check | Solution | Notes |
|---|---|---|---|
| Loss of Quorum | Look at `cman_tool status` and determine whether "Total votes" is greater than or equal to "Quorum", or check /var/log/messages to see if a membership transition occurred leading up to the problem. | Resolve the issue that caused nodes to leave the cluster, or adjust the configuration or layout of the cluster to better withstand events like these. | |
| Waiting on Fencing | In RHEL 6, use `fence_tool ls` to determine whether "wait state" is "fencing", or check /var/log/messages to see if fencing was initiated but never completed. | Configure fence devices for all nodes if they don't already have one, consider backup fence devices in case the primary device is inaccessible, or adjust the configuration to avoid similar failures. | |
| Multipath devices blocking from path failures | Check /var/log/messages on all nodes for storage errors or reports of path failures from the multipath software in use, check iostat to see if I/O is happening on the active path, and check logs on the storage array for signs of problems | Evaluate if a SCSI-timeout change or software update may lessen the wait time in failures, or try to limit the path failover time, or correct whatever storage-side problem may be causing path failures, delays, or unresponsiveness. Also consider whether queuing when no paths are left (such as with no_path_retry with device-mapper-multipath) should be disabled to prevent one node's storage failure causing file system blocking throughout the cluster. | Temporary unresponsiveness from storage devices may not always be obvious when multipath software is in use. The SCSI layer and/or storage controller driver can take time to eventually reach a timeout, and if the storage is only unavailable for a short time, errors may never show up in logs. Deeper investigation may be warranted if data shows that I/O is not being transmitted. |
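On RHEL 6, the quorum and fencing checks from the first two rows can be run together. A hedged sketch (on RHEL 7 and later, `pcs status` reports the equivalent quorum and fencing state):

```shell
#!/bin/sh
# Sketch: quick quorum/fencing state check on a RHEL 6 cluster.
if command -v cman_tool >/dev/null; then
    cman_tool status | grep -Ei 'total votes|quorum'
fi
if command -v fence_tool >/dev/null; then
    fence_tool ls | grep -i 'wait state'
fi
# Either way, scan the log for recent membership or fencing events:
grep -iE 'fenc|membership|quorum' /var/log/messages 2>/dev/null | tail -20
```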
Testing the media
This article provides information on how to test your media in order to find out where the performance issue is occurring: [Is my GFS2 slowdown a file system problem or a storage problem?](/site/articles/627823)
References
- Red Hat Enterprise Linux Cluster, High Availability, and GFS Deployment Best Practices
- What data should I gather when access to a GFS2 filesystem appears to be hung, unresponsive, or slow in RHEL?