Is my GFS2 slowdown a file system problem or a storage problem?
- Introduction
- Test the raw write speed of the storage
- What can you do about slow media?
- The effect of the Linux page cache
- How can I tell if it's not my media?
- Test the raw write speed of an ext3 file system
- Test the raw write speed of a GFS2 file system
- What can I do about a GFS2 file system being significantly slower than an ext3 file system?
- Check for issues with the storage itself at the time of any observed problems
- Are there other tests that could or should be performed?
- References
Introduction
This article helps you determine whether a slowdown on a `GFS2` file system is caused by the file system itself or by the underlying storage. A commonly asked question is:
My GFS2 file system is running slowly. How can I tell whether the slowdown is a GFS2 problem or a storage problem?
If your GFS2 file system is slow, there may be several root causes, and it's important to identify which one applies. The first step is to determine whether the problem lies in the storage itself or in the file system. Because read operations are typically much faster than write operations, write speed is the better thing to test, and you can do that with some simple measurements of your media's write performance.
Warning:
These diagnostic steps will permanently destroy your file system, so back up any data currently saved to the media before you begin. After the tests are complete, you can recreate the file system and restore the data.
Diagnosing performance problems on GFS2 can be a lengthy process. If the problem turns out to be in the file system rather than the storage, other documents can help you narrow it down further.
GFS2, like all clustered file systems, needs to coordinate all metadata changes with the other nodes in the cluster. It does this with inter-node locking known as glocks (pronounced "gee-locks"). This coordination overhead means that GFS2 will never be as fast as single-node file systems such as XFS, ext3, or ext4.
A related article, How to benchmark a gfs2 filesystem?, describes another way to benchmark a GFS2 file system.
Test the raw write speed of the storage
The first step is to test the raw storage. For the purposes of illustration, I'm using the LVM logical volume `/dev/myvg/mylv` as my storage device to test.
Warning:
This diagnostic step will permanently destroy your file system, so back up any data currently saved to the media first.
First, make sure the storage is unmounted from all nodes in the cluster. Then from one node, do the following:
- Make sure your system is idle and there are no other processes performing reads or writes.
- Use the `dd` command with a large block size to write data to the raw device. For this initial test, we're using the option `oflag=direct`. This takes the Linux page cache out of the picture, so the result is just the speed of the device:
# dd if=/dev/zero of=/dev/myvg/mylv bs=4M count=1000 oflag=direct
1000+0 records in
1000+0 records out
4194304000 bytes (4.2 GB) copied, 30.0617 s, 140 MB/s
In this example, dd determined the raw write speed to be approximately 140 megabytes per second. This test was performed on a single generic SAS hard drive, and the performance wasn't very good. Now look at the same test done on an enterprise-quality SAN with RAID striping:
4194304000 bytes (4.2 GB) copied, 12.163 s, 345 MB/s
This SAN performed nearly two and a half times as fast as the SAS drive. A GFS2 file system on the slow hard drive will never perform well, because it is limited by the write speed of the device.
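The MB/s figure dd reports is simply bytes divided by elapsed seconds, in decimal megabytes; the two numbers quoted above can be verified by hand:

```shell
# Recompute dd's reported throughput: bytes / seconds / 1,000,000.
awk 'BEGIN { printf "SAS drive: %.0f MB/s\n", 4194304000 / 30.0617 / 1000000 }'
awk 'BEGIN { printf "SAN:       %.0f MB/s\n", 4194304000 / 12.163  / 1000000 }'
```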
What can you do about slow media?
If your media is slow, there can be several root causes as documented in a couple of [tables](/articles/628093#root_causes_of_slowdown). Each of these things should be checked and steps should be taken to resolve them.
The effect of the Linux page cache
In the previous test, we used `oflag=direct` to bypass the Linux page cache. If you run the same test on the device but go through the Linux page cache, performance drops.
First, flush the cache to eliminate any leftover data from previous experiments:
# sync; echo 3 > /proc/sys/vm/drop_caches
Now we do the same test as above, but with the Linux page cache:
# dd if=/dev/zero of=/dev/myvg/mylv bs=4M count=1000
1000+0 records in
1000+0 records out
4194304000 bytes (4.2 GB) copied, 41.6462 s, 101 MB/s
So by using the page cache, the performance of the raw device without a file system dropped about 28 percent, from 140 MB/s to 101 MB/s. Take measurements more than once to get a fair comparison, because results can fluctuate. A subsequent run on the same device yielded:
4194304000 bytes (4.2 GB) copied, 43.961 s, 95.4 MB/s
This run shows a 32 percent drop. For the purposes of this article, we can average the two and say that the Linux page cache costs us roughly 30 percent on this hardware. Unfortunately, file system reads and writes are never as straightforward in the real world, so file systems will always need to use the page cache.
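Repeating the measurement and averaging can be scripted. The sketch below is illustrative: `TARGET` is a placeholder that defaults to a throwaway file only so the commands can run anywhere; for a real test you would point it at your raw device (e.g. `/dev/myvg/mylv`), which destroys its contents.

```shell
# Run the dd test several times and average the reported throughput.
# TARGET is a placeholder; set it to your raw device for a real test.
TARGET=${TARGET:-/tmp/ddtest.img}
RUNS=3
for i in $(seq "$RUNS"); do
    # Flush the page cache between runs (needs root; failure is ignored).
    sync; echo 3 2>/dev/null > /proc/sys/vm/drop_caches || true
    # dd prints "... copied, 0.42 s, 160 MB/s" on stderr; keep the last two fields.
    dd if=/dev/zero of="$TARGET" bs=1M count=64 conv=fsync 2>&1 |
        awk '/copied/ { print $(NF-1), $NF }'
done |
# Average the numeric field (assumes every run reports the same unit).
awk '{ sum += $1; unit = $2 } END { printf "average of %d runs: %.1f %s\n", NR, sum / NR, unit }'
```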
How can I tell if it's not my media?
If your media is performing well, the problem might be the file system. The next step is to compare the write speeds of different file systems running on the same media.
Test the raw write speed of an ext3 file system
The next step is to test the speed of an `ext3` file system on the same device. We use an `ext3` file system because its behavior is similar to a `GFS2` file system.
- Create a test directory for mounting:
# mkdir /mnt/gfs2test
- Make a brand new `ext3` file system:
# mkfs.ext3 /dev/myvg/mylv
# sync; echo 3 > /proc/sys/vm/drop_caches
- Mount the new `ext3` file system and use the `dd` command to write a file at the mount point. Notice we're using the `conv=fsync` option to force all the data and metadata to be written to the media. This prevents false readings, once again due to the page cache. (We didn't need it for the raw device test: there was no metadata, and the data was written with `oflag=direct`, so the page cache wasn't used.) Without `conv=fsync`, the I/O is buffered in memory; after several consecutive runs of `dd`, or during periods of heavy writing, the command will appear to pause while the buffered data is written to disk.
  Buffered `dd` runs dirty pages, which don't hit the disk until a threshold is crossed and the writeback threads are triggered. Dirty memory is data that has been written to memory but not yet to disk; this is the default behavior for buffered I/O.
# mount -t ext3 /dev/myvg/mylv /mnt/gfs2test
# dd if=/dev/zero of=/mnt/gfs2test/bigfile bs=4M count=1000 conv=fsync
1000+0 records in
1000+0 records out
4194304000 bytes (4.2 GB) copied, 37.9255 s, 111 MB/s
More information on dirty-page writeback conditions and tunables is available in related articles.
- Unmount the `ext3` file system:
# umount /mnt/gfs2test
So using this hardware, an ext3 file system is roughly 21 percent slower than the raw device: (111 / 140) * 100 = 79% of the raw speed.
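The buffered-writeback behavior described above is governed by a handful of standard Linux kernel tunables, which can be inspected directly:

```shell
# Background writeback starts when dirty pages exceed this percent of memory:
cat /proc/sys/vm/dirty_background_ratio
# Writers are throttled and forced to flush synchronously at this percent:
cat /proc/sys/vm/dirty_ratio
# Dirty data older than this many hundredths of a second becomes eligible
# for writeback on the next flusher-thread wakeup:
cat /proc/sys/vm/dirty_expire_centisecs
# Current amount of dirty (written to memory, not yet to disk) data:
grep '^Dirty:' /proc/meminfo
```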
Why is ext3 faster than the raw device with page cache?
In one of the earlier tests, we did a simple dd to the raw device without bypassing the page cache, and the performance was between 95.4 and 101 MB/s. In this ext3 test, it was 111 MB/s. So you may be wondering: how can ext3 be faster than the raw device would allow?
There may be several reasons why the file system was faster than our original dd test. This test was performed on an ordinary hard drive, and hard drives have platters and heads. Our dd write probably filled an entire disk platter before moving on to the next, whereas ext3 may have placed data blocks on multiple platters simultaneously, giving better performance than a purely linear write. (It's the same reason a RAID device is faster than any of its component hard drives.) In other words, ext3 probably lost some performance to the Linux page cache but gained some back by spreading the data across multiple platters of the hard disk.
For this reason, you should compare the write speed of the file systems against the raw speed of the device excluding the Linux page cache (in our example, 140 MB/s).
Test the raw write speed of a GFS2 file system
The next step is to test the speed of a `GFS2` file system on the same device. In this case, we're using the [`lock_nolock`](/site/solutions/321503) locking protocol to avoid all inter-node locking, ensuring the test isn't skewed by networking or other cluster issues.
We suggest creating a new GFS2 file system because an existing one could suffer from file fragmentation, leading to poor performance. The goal of this test is performance data under ideal conditions, which a freshly created GFS2 file system provides.
- Make a brand new `GFS2` file system and set the locking protocol to `lock_nolock`:
# mkfs.gfs2 -O -p lock_nolock /dev/myvg/mylv
# sync; echo 3 > /proc/sys/vm/drop_caches
- Mount the new `GFS2` file system and use the `dd` command to write a file at the mount point:
# mount -t gfs2 /dev/myvg/mylv /mnt/gfs2test
# dd if=/dev/zero of=/mnt/gfs2test/bigfile bs=4M count=1000 conv=fsync
1000+0 records in
1000+0 records out
4194304000 bytes (4.2 GB) copied, 49.1006 s, 85.4 MB/s
- Unmount the file system:
# umount /mnt/gfs2test
Using the same hardware, a GFS2 file system is roughly 39 percent slower than the raw device: (85.4 / 140) * 100 = 61% of the raw speed. Compare this to ext3, which was 21 percent slower.
What can I do about a GFS2 file system being significantly slower than an ext3 file system?
In the above example, `ext3` did the `dd` test at about 111 MB/s and `GFS2` at about 85.4 MB/s, so `GFS2` was roughly 23 percent slower than `ext3`. If `GFS2` is significantly slower than other file systems running on the same media, there may be several root causes: network problems (in a cluster), file fragmentation, inter-node lock (glock) contention, and so on. Further analysis will be needed to determine the root causes, and steps will need to be taken to correct them.
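One way to look for glock contention on a live system is the per-file-system glocks file that GFS2 exposes in debugfs. The directory name under `gfs2/` depends on your cluster and file system name, so treat this as a sketch rather than an exact recipe:

```shell
# Make sure debugfs is mounted (it usually already is).
mountpoint -q /sys/kernel/debug || mount -t debugfs none /sys/kernel/debug 2>/dev/null
# Each mounted GFS2 file system has a glocks file; holder lines flagged
# with W are waiters, and many waiters suggest inter-node lock contention.
found=0
for d in /sys/kernel/debug/gfs2/*/; do
    if [ -r "${d}glocks" ]; then
        found=1
        echo "=== ${d} ==="
        grep -c 'f:.*W' "${d}glocks" || true   # count of lines carrying a W flag
    fi
done
[ "$found" -eq 1 ] || echo "no mounted GFS2 file systems found"
```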
Check for issues with the storage itself at the time of any observed problems
If there are observable periods of slowness or unresponsiveness of a GFS2 file system, try to determine if the storage may be contributing to this. Use tools like `iostat`, `sar`, `collectl`, or others that can profile how much I/O is being submitted to storage at any given time, and determine if perhaps the storage is unresponsive.
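If `iostat` (from the sysstat package) is not available, the raw counters it reads can be sampled straight from `/proc/diskstats`. A minimal sketch:

```shell
# Sample per-device write activity over a one-second window using the same
# counters iostat reads. In /proc/diskstats, field 3 is the device name and
# field 10 is the cumulative count of sectors written.
awk '{ print $3, $10 }' /proc/diskstats > /tmp/ds.before
sleep 1
awk '{ print $3, $10 }' /proc/diskstats > /tmp/ds.after
# The device list and order are identical in both samples, so paste lines
# them up; fields become: <dev> <before> <dev> <after>.
paste /tmp/ds.before /tmp/ds.after |
    awk '{ printf "%-12s %d sectors written/s\n", $1, $4 - $2 }'
```

A device whose write rate collapses to zero while applications are still issuing I/O is a hint that the storage, not the file system, is stalling.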
Also review the configuration for any multipath software and determine if it could be masking temporary unresponsiveness from the storage, or if the problems may coincide with a path to the storage failing or timing out. Often when unresponsiveness of storage occurs, the multipath layer must wait for a response from the lower levels of the kernel (SCSI layer, storage controller driver, etc) before I/O will be sent down alternate paths, which can manifest as the file system blocking or performing poorly. These conditions may not always be obvious, as temporary blockages may resolve themselves before an error can be reported, so nothing may be visible in the logs.
Are there other tests that could or should be performed?
There are other layers that can make up a `GFS2` file system, such as `device-mapper-multipath`, `lvm`, or `scsi` devices, to name a few. If the above tests do not yield an answer to the performance issue, test the other layers in the same manner. For example, test the performance of the logical volume device with no file system on it; in the example used above, that is the device `/dev/myvg/mylv`.
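A convenient way to see every layer beneath the file system, so that each one can be tested with the same `dd` method, is lsblk's inverse-dependency view (the volume path here matches the example above and stands in for your own device):

```shell
DEV=/dev/myvg/mylv   # the logical volume used in the examples above
# lsblk -s ("inverse") walks the stack top-down, e.g.
# logical volume -> device-mapper/multipath device -> sd* SCSI paths,
# showing each layer that can be tested in turn.
lsblk -s "$DEV" 2>/dev/null || lsblk   # fall back to the full listing if $DEV is absent
```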
References
For more information, see the following articles:
- What are some examples of `GFS` and `GFS2` workloads that should be avoided on RHEL5?
- How can I use the `GFS2` tracepoints and debugfs glocks file in RHEL6?
- Does `GFS` or `GFS2` lock all the directories in the file-path for a file when it is created, deleted, or modified on RHEL5 or RHEL6?
- My GFS2 filesystem is slow. How can I diagnose and make it faster?
- How to benchmark a gfs2 filesystem?