My GFS2 filesystem is slow. How can I diagnose and make it faster?
Introduction
Setting expectations on comparing GFS2 versus non-clustered file systems
How to tell a slowdown from a hang
Data Collection to Diagnose Slowdowns and Hangs
Root Causes of Slowdowns
Slowdown due to Hardware Issues
Slowdown due to System Configuration Issues
Slowdown due to Hardware Configuration Issues
Slowdown due to Software or Usage Problems
Slowdown due to Other Circumstances
Root Causes of Hangs
References
Introduction
This article describes some of the typical reasons that a `GFS2` file system may not be performing optimally.
Some of the typical questions that are asked are:
- Why is my `GFS2` filesystem performing so slowly?
- Why does my `GFS2` filesystem seem to hang, and commands do not complete?
- Why are backups so slow on `GFS2`?
- Why does `GFS2` performance degrade after I run a backup?
- Why is my `ext3` or `ext4` filesystem performing so much better than my `GFS2` filesystem?
- Why do benchmarks with `dd` show my `ext3` or `ext4` filesystem as being faster than my tests to my `GFS2`?
Setting expectations on comparing GFS2 versus non-clustered file systems
GFS2 is a clustered file system, which means many computer systems (“nodes”) share the same media (hard drives, etc.), unlike single-node file systems such as ext3, ext4, and xfs, which have the media all to themselves.
Since the media is shared between nodes, clustered file systems need to coordinate the changes they make with the other nodes in the cluster. Without that agreement between nodes, the file system metadata would quickly become corrupt and information would be lost. This coordination of the file system metadata is accomplished with an inter-node locking mechanism called a glock (“Gee-lock”). This gets very tricky if several nodes have cached information for the same objects: some nodes may need to throw out cached information and reread the media. For that reason, clustered file systems will never perform as well as single-node file systems.
How to tell a slowdown from a hang
There's a big difference between a GFS2 hang versus a GFS2 slowdown. With a hang, nothing is moving: data and metadata are not being read from or written to the media. With a slowdown, the data is moving; it's just moving slowly or pausing too much. It's important to know which situation you're in so you can react accordingly.
A GFS2 hang should never occur. If it does, it's probably a bug in the GFS2 kernel code, and the developers need to find it and fix it. You need to contact Red Hat's Global Support Services (GSS). If the problem is already fixed, you may need to get the fix onto your cluster. That may require you to install a newer kernel.
GFS2 slowdowns are not ideal, but they can almost always be solved or worked around.
So how do you tell if you're experiencing a hang or a slowdown? There are several things you can do:
- `vmstat 1` will tell you the amount of reading and writing happening each second. If you run it for a few minutes and nothing's moving, it's likely a hang.
- You can use `iostat` to be sure nothing's moving over a long period of time.
- In RHEL 5, you can use `sysrq t` to get a list of processes and their call stack information.
- In RHEL 6 and 7, you can repeatedly run `cat /proc/<pid>/stack` to see if a particular job is moving or stuck in one place. In some cases the stack may appear unchanged while the process is simply doing the same thing over and over, but this is a good first step toward identifying whether it's obviously moving or whether further diagnostics are needed.
- In all releases, attaching `strace -p <pid> -fttvo <output file>` to a process that seems blocked or slow, then reviewing the contents of that file, can give insight into whether the process is still working, or what it might be stuck on.
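The checks above can be combined into a small triage script. The following is a minimal sketch, not a supported tool: the PID, sample count, and sleep interval are placeholders to adjust, and reading `/proc/<pid>/stack` may require root.

```shell
#!/bin/sh
# Triage sketch: sample a process's kernel stack a few times.
# If the stack never changes AND vmstat/iostat show no I/O moving,
# suspect a hang; if the stack varies, the process is likely just slow.
PID=${1:-$$}   # placeholder: pass the PID of the suspect process

for i in 1 2 3; do
    echo "=== sample $i: $(date) ==="
    # May require root; fall back gracefully if the stack is unreadable.
    cat /proc/"$PID"/stack 2>/dev/null || echo "(stack not readable)"
    sleep 2
done
```

Compare the samples to each other: identical stacks across many samples, combined with no I/O in `vmstat 1`, point toward a hang rather than a slowdown.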
Data Collection to Diagnose Slowdowns and Hangs
It's important to collect the right data while the problem is ongoing, because if the symptoms disappear or nodes in the cluster end up being restarted in order to recover, it may be difficult or impossible to tell later what exactly was going wrong. These guidelines can help to capture the right information each time so that administrators or Red Hat Global Support Services can make an informed analysis of the problem:
- Try to isolate the specific operation that may be slow or blocked. If an application's operation is taking too long to complete, break that application down into its most basic components and try to identify what is taking longer than expected, if possible.
  - Was a command run that seems to be stuck or taking a long time?
  - Is the problem happening while creating, opening, or deleting a file?
  - Is it trying to lock a file or directory before accessing it? If so, which file or directory?
  - Are multiple operations working in the same directory at that time?
- Capture detailed resource utilization information at a frequent interval at the time of any observed slowdowns or hangs, and analyze the output to determine if there are any trends or patterns that consistently seem to lead to the problem. As you'll see in the Root Cause sections below, identifying a number of specific conditions may rely on having data about what the storage was doing, what processes were doing, or in general what the system and kernel were doing. Make sure that the data you collect is as comprehensive as possible, whether that be using scripts that Red Hat recommends or alternative monitoring tools running on the system. Running utilities like `iostat`, `vmstat`, `top`, `ps`, `netstat -s`, `ethtool -S`, `mpstat`, and others like them over a period of time covering the problem (and ideally starting before the problem does, if possible) is a good start.
- Capturing the output of `glocktop` from all nodes while a problem is ongoing (and ideally, starting before the problem shows up) can give very useful data for analyzing slowdown situations. If an application seems to run more slowly than expected, try running `glocktop` and evaluate whether any resources are being contended for.
- If it appears as if a process is stuck in the same place for long periods of time, consider capturing the state of all nodes' glocks. Note that this data may not be useful in all cases, such as those where an application or command is simply slow, so consider whether this collection will be useful before taking the time to gather it.
- Run `sosreport` on all nodes, ideally while the issue is still ongoing and prior to any reboots or recovery actions. In some situations where GFS2 file systems are unresponsive or the cluster stack is in a bad state, running `sosreport` with certain plugins disabled may be necessary.
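The interval collection described above can be as simple as a loop that snapshots several utilities into timestamped logs. This is a hedged sketch rather than a Red Hat-provided script; the log directory, interval, sample count, and tool list are assumptions to adapt to your environment.

```shell
#!/bin/sh
# Sketch: capture resource utilization at a fixed interval into a log dir.
LOGDIR=${LOGDIR:-/tmp/perfdata}   # placeholder location
INTERVAL=${INTERVAL:-10}          # seconds between samples
SAMPLES=${SAMPLES:-6}             # how many samples to take

mkdir -p "$LOGDIR"
i=0
while [ "$i" -lt "$SAMPLES" ]; do
    ts=$(date +%Y%m%d-%H%M%S)
    {
        echo "== $ts =="
        # Only run tools that are actually installed.
        command -v vmstat >/dev/null && vmstat 1 2
        command -v iostat >/dev/null && iostat -x 1 2
        command -v mpstat >/dev/null && mpstat 1 1
        ps aux | head -20
    } >> "$LOGDIR/perf-$ts.log" 2>&1
    i=$((i + 1))
    sleep "$INTERVAL"
done
```

Run one copy per node, starting before the problem window if you can, so the logs cover both normal and degraded behavior.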
Root Causes of Slowdowns
There can be many causes for GFS2 slowdowns, so you may need to check several things before you can determine the real cause, and there may be more than one cause for the slowness. Here are the primary causes and solutions, condensed into a few tables:
Slowdown due to Hardware Issues
| Root Cause | How to check | Solution | Notes |
|---|---|---|---|
| Slow media | Use `dd` to test the speed of each media device, without a file system. Compare that to a similar speed test of the file system on the same media. | Buy faster media. Use a SAN rather than iSCSI. | The raw `dd` test destroys the file system on the media; you will need to recreate it with `mkfs.gfs2`. Hard drives slow down when they start to fail: a failing drive may spend lots of time retrying read or write operations. Consider using SMART to detect drives about to fail. |
| Network problems | Use a tool like Wireshark to monitor DLM communications between nodes. Look for network problems. | Fix your network, replace defective switches, etc. | You should have a dedicated switch for cluster communications. |
| Network slowdowns | Install and run the `dlm_klock` tool to test the raw speed of DLM communications without GFS2 in the picture. | Make sure the cluster has a dedicated NIC and switch for inter-node communications. | These are mostly bandwidth problems. You might consider buying a newer network switch. |
| Out of memory or swapping | Run `top` and check whether `Swap:` used is 0k. If not, you may be swapping memory out to disk. | Add memory or kill processes to reduce memory in use. | - |
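To illustrate the kind of `dd` comparison the first row describes, the sketch below times a write through the file system versus a raw write to the device. The mount point and device are placeholders, and writing to the raw device destroys the file system on it, so only uncomment that step with data you can recreate via `mkfs.gfs2`.

```shell
#!/bin/sh
# Sketch: compare file-system write speed to raw-device write speed.
MOUNTPOINT=${MOUNTPOINT:-/mnt/gfs2}   # placeholder mount point
DEVICE=${DEVICE:-/dev/sdX}            # placeholder -- DESTRUCTIVE target

# 1. Write through the file system (safe; creates and removes a test file).
dd if=/dev/zero of="$MOUNTPOINT/ddtest" bs=1M count=256 oflag=direct 2>&1 | tail -1
rm -f "$MOUNTPOINT/ddtest"

# 2. Raw write to the device -- DESTROYS the file system on $DEVICE.
#    Uncomment only after unmounting on every node and accepting data loss:
# dd if=/dev/zero of="$DEVICE" bs=1M count=256 oflag=direct 2>&1 | tail -1
```

If the raw device is fast but the file-system test is slow, the problem is above the storage layer; if both are slow, suspect the media or path to it.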
Slowdown due to System Configuration Issues
| Root Cause | How to check | Solution | Notes |
|---|---|---|---|
| SELinux slowing GFS2 down | Run the `getenforce` command. If it says `Enforcing`, GFS2 may be doing more work than it needs to. | Disable SELinux for your GFS2 file system, or disable SELinux altogether. | It's a security risk to disable SELinux altogether. If SELinux is used on a GFS2 file system, every file system object (file, directory, etc.) requires extra blocks for extended attributes and extra time to manage them. GFS2 runs faster if it doesn't need to worry about these attributes. |
| Mount option `noatime` not used | Mount your GFS2 file systems and run `mount \| grep gfs2`. Each should list `noatime` as part of its mount options. | Change your `/etc/fstab` to specify `noatime` for your GFS2 mount points, or mount them manually with `-o noatime`. (References for RHEL 7 and RHEL 8.) | - |
| Block size too small | Run `gfs2_edit -p sb <device> \| grep sb_bsize`. If the size is not 4096, it could cause your file system to run slowly. | Reformat the file system with `mkfs.gfs2` using the default block size of 4K. | - |
| Past GFS2 slowdown bugs | Make sure you're on the latest kernel for your release; check with `uname -a`. | Upgrade to a newer kernel. | We're making it better all the time: RHEL 6 should be faster than RHEL 5, RHEL 6.5 faster than RHEL 6.4, and RHEL 6.5.z faster than RHEL 6.5.0, etc. Get the latest z-stream kernel. |
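The `noatime` and block-size checks in the table can be scripted together. A hedged sketch, assuming `gfs2_edit` from the gfs2-utils package is installed; the device path is a placeholder to substitute:

```shell
#!/bin/sh
# Sketch: verify noatime and 4K block size for GFS2 file systems.
DEVICE=${DEVICE:-/dev/vg/gfs2lv}   # placeholder device

# 1. Is every gfs2 mount using noatime?
mount | grep gfs2 | while read -r line; do
    case "$line" in
        *noatime*) echo "OK:   $line" ;;
        *)         echo "WARN (no noatime): $line" ;;
    esac
done

# 2. Is the superblock's block size 4096? (needs gfs2-utils)
if command -v gfs2_edit >/dev/null; then
    gfs2_edit -p sb "$DEVICE" | grep sb_bsize   # expect 4096
fi
```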
Slowdown due to Hardware Configuration Issues
| Root Cause | How to check | Solution | Notes |
|---|---|---|---|
| Using secondary SAN path | Use SAN configuration tools such as Navisphere to check which path you're exporting. | Reconfigure SAN to use primary path | Most SANs have a primary and secondary path. The primary path is fast, and the secondary path is slow. |
| Not using RAID striping | Depends on the situation | Configure the device to use striping | Some file systems like xfs automatically take advantage of RAID striping to optimize performance. GFS2 doesn't. By default, multi-LUN LVM volumes are not striped. Striped volumes will be significantly faster. |
| Using failover path with device-mapper-multipath | Use `multipath -ll` to determine the active path. Compare that to the primary vs. secondary paths exported by the SAN to the LUNs on that system. | The SAN or multipath may need to be reconfigured. | If device-mapper-multipath is using the secondary path exported by the SAN rather than the primary path, GFS2 will run much slower than it would on the primary path. |
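To see which path group is actually carrying I/O, you can filter the `multipath -ll` output. A small sketch; note that the exact output format can vary between device-mapper-multipath versions:

```shell
#!/bin/sh
# Sketch: show which multipath path group is active.
if command -v multipath >/dev/null; then
    # "status=active" marks the path group currently carrying I/O;
    # compare it against the SAN's primary/secondary export.
    multipath -ll | grep -E 'status=|policy='
else
    echo "device-mapper-multipath not installed"
fi
```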
Slowdown due to Software or Usage Problems
| Root Cause | How to check | Solution | Notes |
|---|---|---|---|
| Directory or file glock contention | Use `glocktop` and see if there are lots of processes waiting for type 2 (inode) glocks. | Reorganize your file system or change your application to lessen the contention. | These are mostly application or use-case problems that can be solved. |
| Backups, `ls -R`, or `du` have cached many millions of files and directories. | Run `grep "G:" /sys/kernel/debug/gfs2/<fsname>/glocks \| wc -l` to see if there are more than two million glocks cached. This can take a long time on a very old kernel; GFS2's glock dump function was sped up greatly in recent kernels. | Review the article: What are some best practices when running a backup of a GFS2 filesystem in a RHEL Resilient Storage cluster? | - |
| Too many files per directory. | Run `ls -lR` and see how many files are in each directory. If it's in the millions, it could slow you down. | Consider using different software. | GFS2 doesn't perform well when millions of small files are placed in the same directory. |
| Multiple nodes writing to the same directory. | Use the `glocktop` tool to check for directory contention. | Consider using different software. | As a rule, GFS2 will run faster if each node has its own directory to write data to. If all the nodes in the cluster are writing to the same directory at the same time, that contention slows things down. Reading from the same files is not a problem. |
| Multiple nodes writing to the same file. | Use `lsof` on all nodes to see if they're all writing to the same log file, etc. | Consider using different software. | Log files shared by multiple nodes may be useful, but can also slow things down. Use sparingly. |
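To check how many glocks are cached (the backup row above), you can count the `G:` lines in the debugfs glock dump. A minimal sketch, assuming debugfs is mounted at `/sys/kernel/debug`:

```shell
#!/bin/sh
# Sketch: count cached glocks per GFS2 file system via debugfs.
GLDIR=/sys/kernel/debug/gfs2
if [ -d "$GLDIR" ]; then
    for fs in "$GLDIR"/*; do
        [ -f "$fs/glocks" ] || continue
        # Each "G:" line is one cached glock; millions after a backup
        # can explain a post-backup slowdown.
        n=$(grep -c '^G:' "$fs/glocks")
        echo "$(basename "$fs"): $n glocks cached"
    done
else
    echo "debugfs not mounted or no GFS2 file systems present"
fi
```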
Slowdown due to Other Circumstances
| Root Cause | How to check | Solution | Notes |
|---|---|---|---|
| File fragmentation | Use a tool like filefrag to determine if some of your commonly used files are fragmented. | Copy the device to a new device using a newer kernel to “defrag” the file system. | Releases RHEL 6.4 and higher are better at reducing fragmentation. |
| File system fragmentation | Use a tool like gfs2_edit to examine the resource group (rgrp) bitmaps to determine if the bitmaps are severely fragmented. | Copy the device to a new device using a newer kernel to “defrag” the file system. | Releases RHEL 6.4 and higher are better at reducing fragmentation. |
| Resource group (rgrp) glock contention | Use `glocktop` and see if there are lots of processes waiting for type 3 (rgrp) glocks. This may happen from time to time under normal conditions, but if there are consistently a lot of waiting processes, that's a problem. | Upgrade to RHEL 6.5 or newer. | Kernels for RHEL 6.5 and newer have a built-in Orlov allocator and other improvements that may make this faster. |
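To gauge per-file fragmentation, `filefrag` reports extent counts; below is a sketch with a placeholder directory. A large extent count relative to file size suggests fragmentation.

```shell
#!/bin/sh
# Sketch: report extent counts for the larger files under a directory.
TARGET=${TARGET:-/mnt/gfs2/data}   # placeholder path to inspect

find "$TARGET" -type f -size +1M 2>/dev/null | head -20 | while read -r f; do
    # filefrag prints "<file>: N extents found"
    filefrag "$f" 2>/dev/null || echo "$f: filefrag not available"
done
```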
Root Causes of Hangs
| Root Cause | How to check | Solution | Notes |
|---|---|---|---|
| Loss of Quorum | Look at `cman_tool status` and determine whether "Total votes" is greater than or equal to "Quorum", or check /var/log/messages to see if a membership transition occurred leading up to the problem. | Resolve the issue that caused nodes to leave the cluster, or adjust the configuration or layout of the cluster to better withstand events like these. | |
| Waiting on Fencing | In RHEL 6, use `fence_tool ls` to determine whether "wait state" is "fencing", or check /var/log/messages to see if fencing was initiated but never completed. | Configure fence devices for all nodes if they don't already have one, consider backup fence devices in case the primary device is inaccessible, or adjust the configuration to avoid similar failures. | |
| Multipath devices blocking from path failures | Check /var/log/messages on all nodes for storage errors or reports of path failures from the multipath software in use, check iostat to see if I/O is happening on the active path, and check logs on the storage array for signs of problems | Evaluate if a SCSI-timeout change or software update may lessen the wait time in failures, or try to limit the path failover time, or correct whatever storage-side problem may be causing path failures, delays, or unresponsiveness. Also consider whether queuing when no paths are left (such as with no_path_retry with device-mapper-multipath) should be disabled to prevent one node's storage failure causing file system blocking throughout the cluster. | Temporary unresponsiveness from storage devices may not always be obvious when multipath software is in use. The SCSI layer and/or storage controller driver can take time to eventually reach a timeout, and if the storage is only unavailable for a short time, errors may never show up in logs. Deeper investigation may be warranted if data shows that I/O is not being transmitted. |
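On RHEL 6, the quorum and fencing checks from the first two rows can be run together. A hedged sketch (on RHEL 7 and later, `pcs status` reports the equivalent quorum and fencing state):

```shell
#!/bin/sh
# Sketch: quick quorum/fencing state check on a RHEL 6 cluster.
if command -v cman_tool >/dev/null; then
    cman_tool status | grep -Ei 'total votes|quorum'
fi
if command -v fence_tool >/dev/null; then
    fence_tool ls | grep -i 'wait state'
fi
# Either way, scan the log for recent membership or fencing events:
grep -iE 'fenc|membership|quorum' /var/log/messages 2>/dev/null | tail -20
```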
Testing the media
This article provides information on how to test your media in order to find out where the performance issue is occurring: [Is my GFS2 slowdown a file system problem or a storage problem?](/site/articles/627823)
References
- Red Hat Enterprise Linux Cluster, High Availability, and GFS Deployment Best Practices
- What data should I gather when access to a GFS2 filesystem appears to be hung, unresponsive, or slow in RHEL?