How to benchmark a gfs2 filesystem?


Introduction


This article explains how to benchmark a gfs2 filesystem on RHEL 6 and RHEL 7 with a utility called `create_unlink_perf` .

Environment

  • Red Hat Enterprise Linux Server 6 (with the High Availability and Resilient Storage Add-Ons)
  • Red Hat Enterprise Linux Server 7 (with the High Availability and Resilient Storage Add-Ons)
  • A Global File System 2 (gfs2) filesystem

A benchmark test program has been created to help simulate some of the operations that will occur on a gfs2 filesystem. The benchmark test program is called `create_unlink_perf` and pre-built binaries are available.

NOTE: The benchmark test program create_unlink_perf is not officially supported by Red Hat. It is provided and documented as a convenience for diagnostic purposes, but it carries no guarantees about its behavior and is to be used at the administrator's own risk.

The test is designed to separate multi-node behavior from single-node behavior. The program flushes caches on all nodes in the cluster between each test. The test is run from a single cluster node, which then uses ssh to connect to each of the specified cluster nodes and runs the same `create_unlink_perf` program to do the work. Existing gfs2 filesystems that are already mounted, including the one being benchmarked, do not have to be unmounted while the benchmark test program runs.

The benchmark test program `create_unlink_perf` will increase the load on the cluster nodes, which could cause a negative performance impact until the program finishes running.

What are we testing and why?

  • Serial tests: The serial tests are done to see if each node in the cluster performs the same as the others. If some nodes are significantly slower, it may indicate hardware problems, such as faulty ports on the Ethernet switch, faulty ports on a Fibre Channel switch, a faulty Ethernet cable, a faulty Fibre Channel cable, or a faulty host bus adapter (HBA). Even the motherboard or memory on the slower nodes are open to suspicion. If all nodes perform slowly, it could indicate a central point of contention: a slow or faulty Ethernet switch, a slow storage array, or a slow device configuration.
  • Parallel tests: The parallel tests are done to see if tests run in parallel are comparable to the serial tests. In many cases, the tests may run slower, due to lock contention issues and the fact that the storage array is doing more work simultaneously. If the tests run faster, it could indicate network routing issues, since more packets are sent through the network, thus allowing more throughput (for example, spending too much time waiting for network packets to fill up before sending them).
  • Unlink tests: These are intended to test whether file create operations are interfering with unlink operations. A lot of these variations go back to testing the contention of GFS2 resource group glocks. The creates will establish resource group locks on different nodes. Later, the unlinks will take advantage of the locks. In a way, this tests GFS2 lock caching and trading the locks through Distributed Lock Manager (DLM).

The benchmark test program `create_unlink_perf` usually takes around 15 to 30 minutes to complete, but this can vary based on hardware, software, and the load on the cluster nodes.

Requirements for create_unlink_perf

  • The benchmark test program requires that port 62001/tcp is open.
  • The benchmark test program should be located in the following directory: ~/bin on all cluster nodes that will be running the script. The cluster nodes that will run the script are passed as options to the script.
  • The benchmark test program requires that the gfs2 filesystem has at least 50GB of free space.
  • The benchmark test program requires that the root user can log in to each cluster node with passwordless ssh keys. The root user is required because the cache is dropped between each test, which can only be done by the root user.
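The root requirement comes from the cache drop between tests. A hypothetical sketch of that per-node flush is shown below; this is a reconstruction for illustration, not the tool's actual code, and the hostnames are placeholders. The helper only prints the ssh command so nothing is executed here.

```shell
# Hypothetical sketch of the per-node cache flush create_unlink_perf
# performs between tests (sync dirty data, then drop the page, dentry,
# and inode caches) -- this is why root access is required.
build_flush_cmd() {
    # Print, without running, the ssh command for one node.
    printf "ssh %s 'sync; echo 3 > /proc/sys/vm/drop_caches'\n" "$1"
}

# Show the command that would be run against each cluster node.
for hostname in rhel7-1.examplerh.com rhel7-2.examplerh.com rhel7-3.examplerh.com; do
    build_flush_cmd "$hostname"
done
```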

Recommendations when running create_unlink_perf

  • If possible, create a new gfs2 filesystem on the same storage array as the other gfs2 filesystems. If you run the program on a gfs2 filesystem from a different storage array, the results could be skewed (better or worse). One reason a fresh gfs2 filesystem should be used is that an existing gfs2 filesystem will contain some (or a lot of) fragmentation that will impact the results of the script.
  • The benchmark test program should be run when the load is low on all nodes in the cluster.
  • The cache does not need to be flushed before the benchmark test program is run, because the script does that before starting each test.
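If you do create a fresh gfs2 filesystem for the benchmark, the mkfs invocation might look like the following. The cluster name, filesystem name, journal count, and logical volume path are all hypothetical; adjust them to your environment, using one journal (`-j`) per node that will mount the filesystem.

```shell
# Hypothetical example: create a fresh gfs2 filesystem on a shared
# logical volume for a 3-node cluster named "mycluster".
mkfs.gfs2 -p lock_dlm -t mycluster:benchfs -j 3 /dev/vg_shared_2/lvol_bench

# Mount it on every node before running the benchmark.
mount /dev/vg_shared_2/lvol_bench /mnt/vg_shared_2-lvol_bench
```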
Example command of running the script

This command is run from a single cluster node and will perform the tests on the 3 nodes of the cluster that have the gfs2 filesystem mounted at `/mnt/vg_shared_2-lvol1`.
# ~/bin/create_unlink_perf -m /mnt/vg_shared_2-lvol1 rhel6-1.examplerh.com rhel6-2.examplerh.com rhel6-3.examplerh.com 

Shell brace expansion can also be used for the cluster node names.

# ~/bin/create_unlink_perf -m /mnt/vg_shared_2-lvol1 rhel6-{1..3}.examplerh.com 

Before running the benchmark test program `create_unlink_perf`, make sure that any previously running `create_unlink_perf` processes are killed before starting a new test.

# for hostname in rhel7-1.examplerh.com rhel7-2.examplerh.com rhel7-3.examplerh.com; do ssh $hostname "killall create_unlink_perf"; done
# for hostname in rhel7-{1..3}.examplerh.com; do ssh $hostname "killall create_unlink_perf"; done
Example usage and output

Here is an example of running the tool and the output it provides:
# ~/bin/create_unlink_perf -i2 -m /mnt/gfs2 gfs-02-lp0{1..5}
GFS2 mount point: /mnt/gfs2
Testing with 5 nodes, 1 workers per node: 
 gfs-02-lp01 10.16.145.43
 gfs-02-lp02 10.16.145.44
 gfs-02-lp03 10.16.145.45
 gfs-02-lp04 10.16.145.46
 gfs-02-lp05 10.16.145.47

Test It  Node name     Work TestType Microsecs  Files  us/fil files/sec
---- -- -------------- ---- -------- ---------- ------ ------ ---------
 se   1 gfs-p8-02-lp01    1 Create     11565202  50000    231      4329
 se   1 gfs-p8-02-lp02    1 Create     11234168  50000    224      4464
 se   1 gfs-p8-02-lp03    1 Create     12841272  50000    256      3906
 se   1 gfs-p8-02-lp04    1 Create     12726421  50000    254      3937
 se   1 gfs-p8-02-lp05    1 Create     13049616  50000    260      3846
 se   1 gfs-p8-02-lp01    1 Write      27094287   2500  10837        92
 se   1 gfs-p8-02-lp02    1 Write      28172852   2500  11269        88
 se   1 gfs-p8-02-lp03    1 Write      27351127   2500  10940        91
 se   1 gfs-p8-02-lp04    1 Write      27770857   2500  11108        90
 se   1 gfs-p8-02-lp05    1 Write      27263500   2500  10905        91
 se   1 gfs-p8-02-lp01    1 Read       29497625   2500  11799        84
 se   1 gfs-p8-02-lp02    1 Read       28024397   2500  11209        89
 se   1 gfs-p8-02-lp03    1 Read       21175343   2500   8470       118
 se   1 gfs-p8-02-lp04    1 Read       21256996   2500   8502       117
 se   1 gfs-p8-02-lp05    1 Read       21146432   2500   8458       118
 se   1 gfs-p8-02-lp01    1 Unlink     21151534  50000    423      2364
 se   1 gfs-p8-02-lp02    1 Unlink     29758143  50000    595      1680
 se   1 gfs-p8-02-lp03    1 Unlink     23810108  50000    476      2100
 se   1 gfs-p8-02-lp04    1 Unlink     23022839  50000    460      2173
 se   1 gfs-p8-02-lp05    1 Unlink     23175225  50000    463      2159
 se   2 gfs-p8-02-lp01    1 Create     11403309  50000    228      4385
 se   2 gfs-p8-02-lp02    1 Create     11777785  50000    235      4255
 se   2 gfs-p8-02-lp03    1 Create     12876572  50000    257      3891
 se   2 gfs-p8-02-lp04    1 Create     11355958  50000    227      4405
 se   2 gfs-p8-02-lp05    1 Create     12924520  50000    258      3875
 se   2 gfs-p8-02-lp01    1 Write      25268280   2500  10107        98
 se   2 gfs-p8-02-lp02    1 Write      25227021   2500  10090        99
 se   2 gfs-p8-02-lp03    1 Write      25633473   2500  10253        97
 se   2 gfs-p8-02-lp04    1 Write      25169425   2500  10067        99
 se   2 gfs-p8-02-lp05    1 Write      25548858   2500  10219        97
 se   2 gfs-p8-02-lp01    1 Read       36096831   2500  14438        69
 se   2 gfs-p8-02-lp02    1 Read       36865673   2500  14746        67
 se   2 gfs-p8-02-lp03    1 Read       25688770   2500  10275        97
 se   2 gfs-p8-02-lp04    1 Read       26492917   2500  10597        94
 se   2 gfs-p8-02-lp05    1 Read       28204852   2500  11281        88
 se   2 gfs-p8-02-lp01    1 Unlink     23923278  50000    478      2092
 se   2 gfs-p8-02-lp02    1 Unlink     24234492  50000    484      2066
 se   2 gfs-p8-02-lp03    1 Unlink     27992758  50000    559      1788
 se   2 gfs-p8-02-lp04    1 Unlink     24126760  50000    482      2074
 se   2 gfs-p8-02-lp05    1 Unlink     25289334  50000    505      1980
 pa   1 gfs-p8-02-lp05    1 Create      8293381  50000    165      6060
 pa   1 gfs-p8-02-lp01    1 Create      9284993  50000    185      5405
 pa   1 gfs-p8-02-lp04    1 Create      9343951  50000    186      5376
 pa   1 gfs-p8-02-lp02    1 Create      9489407  50000    189      5291
 pa   1 gfs-p8-02-lp03    1 Create     10442297  50000    208      4807
 pa   1 gfs-p8-02-lp03    1 Write      42931324   2500  17172        58
 pa   1 gfs-p8-02-lp01    1 Write      44259051   2500  17703        56
 pa   1 gfs-p8-02-lp02    1 Write      46674285   2500  18669        53
 pa   1 gfs-p8-02-lp04    1 Write      51370666   2500  20548        48
 pa   1 gfs-p8-02-lp05    1 Write      55239160   2500  22095        45
 pa   1 gfs-p8-02-lp01    1 Read       44368335   2500  17747        56
 pa   1 gfs-p8-02-lp03    1 Read       55518490   2500  22207        45
 pa   1 gfs-p8-02-lp04    1 Read       99670401   2500  39868        25
 pa   1 gfs-p8-02-lp05    1 Read      100495228   2500  40198        24
 pa   1 gfs-p8-02-lp02    1 Read      102270637   2500  40908        24
 pa   1 gfs-p8-02-lp03    1 Unlink     25632895  50000    512      1953
 pa   1 gfs-p8-02-lp01    1 Unlink     26784749  50000    535      1869
 pa   1 gfs-p8-02-lp05    1 Unlink     27361223  50000    547      1828
 pa   1 gfs-p8-02-lp04    1 Unlink     28307227  50000    566      1766
 pa   1 gfs-p8-02-lp02    1 Unlink     28455788  50000    569      1757
 pa   2 gfs-p8-02-lp04    1 Create      8373193  50000    167      5988
 pa   2 gfs-p8-02-lp05    1 Create      9188935  50000    183      5464
 pa   2 gfs-p8-02-lp02    1 Create      9291248  50000    185      5405
 pa   2 gfs-p8-02-lp03    1 Create      9519569  50000    190      5263
 pa   2 gfs-p8-02-lp01    1 Create     10273337  50000    205      4878
 pa   2 gfs-p8-02-lp03    1 Write      42797228   2500  17118        58
 pa   2 gfs-p8-02-lp01    1 Write      43268962   2500  17307        57
 pa   2 gfs-p8-02-lp05    1 Write      43749028   2500  17499        57
 pa   2 gfs-p8-02-lp04    1 Write      46942419   2500  18776        53
 pa   2 gfs-p8-02-lp02    1 Write      47275792   2500  18910        52
 pa   2 gfs-p8-02-lp01    1 Read       39041065   2500  15616        64
 pa   2 gfs-p8-02-lp03    1 Read       44389425   2500  17755        56
 pa   2 gfs-p8-02-lp04    1 Read       67968040   2500  27187        36
 pa   2 gfs-p8-02-lp05    1 Read       68962362   2500  27584        36
 pa   2 gfs-p8-02-lp02    1 Read       73508875   2500  29403        34
 pa   2 gfs-p8-02-lp04    1 Unlink     24611932  50000    492      2032
 pa   2 gfs-p8-02-lp03    1 Unlink     26818575  50000    536      1865
 pa   2 gfs-p8-02-lp05    1 Unlink     27159899  50000    543      1841
 pa   2 gfs-p8-02-lp01    1 Unlink     27888112  50000    557      1795
 pa   2 gfs-p8-02-lp02    1 Unlink     30131531  50000    602      1661
 ua   1 gfs-p8-02-lp01    1 Create      7169713  50000    143      6993
 ua   1 gfs-p8-02-lp01    1 Unlink      7967265  50000    159      6289
 ua   1 gfs-p8-02-lp02    1 Create      8587264  50000    171      5847
 ua   1 gfs-p8-02-lp02    1 Unlink      6588287  50000    131      7633
 ua   1 gfs-p8-02-lp03    1 Create      9240521  50000    184      5434
 ua   1 gfs-p8-02-lp03    1 Unlink      7163597  50000    143      6993
 ua   1 gfs-p8-02-lp04    1 Create     10027705  50000    200      5000
 ua   1 gfs-p8-02-lp05    1 Create      9880836  50000    197      5076
 ua   1 gfs-p8-02-lp04    1 Unlink     10486450  50000    209      4784
 ua   1 gfs-p8-02-lp05    1 Unlink      4892826  50000     97     10309
 ua   2 gfs-p8-02-lp01    1 Create      8630539  50000    172      5813
 ua   2 gfs-p8-02-lp01    1 Unlink      8831908  50000    176      5681
 ua   2 gfs-p8-02-lp02    1 Create     10256548  50000    205      4878
 ua   2 gfs-p8-02-lp02    1 Unlink     10888563  50000    217      4608
 ua   2 gfs-p8-02-lp03    1 Create     11937661  50000    238      4201
 ua   2 gfs-p8-02-lp03    1 Unlink      6911705  50000    138      7246
 ua   2 gfs-p8-02-lp04    1 Create     10022115  50000    200      5000
 ua   2 gfs-p8-02-lp04    1 Unlink      7631607  50000    152      6578
 ua   2 gfs-p8-02-lp05    1 Create     10863615  50000    217      4608
 ua   2 gfs-p8-02-lp05    1 Unlink      4361014  50000     87     11494
 ub   1 gfs-p8-02-lp01    1 Create      8520686  50000    170      5882
 ub   1 gfs-p8-02-lp02    1 Create      8639694  50000    172      5813
 ub   1 gfs-p8-02-lp01    1 Unlink      8305460  50000    166      6024
 ub   1 gfs-p8-02-lp03    1 Create     10962108  50000    219      4566
 ub   1 gfs-p8-02-lp02    1 Unlink      4140089  50000     82     12195
 ub   1 gfs-p8-02-lp03    1 Unlink      7177547  50000    143      6993
 ub   1 gfs-p8-02-lp04    1 Create      9129388  50000    182      5494
 ub   1 gfs-p8-02-lp05    1 Create     10012873  50000    200      5000
 ub   1 gfs-p8-02-lp04    1 Unlink     11063866  50000    221      4524
 ub   1 gfs-p8-02-lp05    1 Unlink      5131491  50000    102      9803
 ub   2 gfs-p8-02-lp01    1 Create      8058275  50000    161      6211
 ub   2 gfs-p8-02-lp02    1 Create      8372573  50000    167      5988
 ub   2 gfs-p8-02-lp01    1 Unlink     10874232  50000    217      4608
 ub   2 gfs-p8-02-lp03    1 Create     11012512  50000    220      4545
 ub   2 gfs-p8-02-lp03    1 Unlink      6883298  50000    137      7299
 ub   2 gfs-p8-02-lp04    1 Create      9706401  50000    194      5154
 ub   2 gfs-p8-02-lp02    1 Unlink     11311964  50000    226      4424
 ub   2 gfs-p8-02-lp05    1 Create     10024465  50000    200      5000
 ub   2 gfs-p8-02-lp04    1 Unlink     10997509  50000    219      4566
 ub   2 gfs-p8-02-lp05    1 Unlink      5953213  50000    119      8403
 uc   1 gfs-p8-02-lp01    1 Create      7511806  50000    150      6666
 uc   1 gfs-p8-02-lp01    1 Unlink     10492779  50000    209      4784
 uc   1 gfs-p8-02-lp02    1 Create      8351852  50000    167      5988
 uc   1 gfs-p8-02-lp03    1 Create      9135910  50000    182      5494
 uc   1 gfs-p8-02-lp03    1 Unlink      6935930  50000    138      7246
 uc   1 gfs-p8-02-lp02    1 Unlink      8630732  50000    172      5813
 uc   1 gfs-p8-02-lp04    1 Create      9593579  50000    191      5235
 uc   1 gfs-p8-02-lp04    1 Unlink      9115835  50000    182      5494
 uc   1 gfs-p8-02-lp05    1 Create      9863805  50000    197      5076
 uc   1 gfs-p8-02-lp05    1 Unlink      6088671  50000    121      8264
 uc   2 gfs-p8-02-lp01    1 Create      7743901  50000    154      6493
 uc   2 gfs-p8-02-lp01    1 Unlink      6723795  50000    134      7462
 uc   2 gfs-p8-02-lp02    1 Create      8437675  50000    168      5952
 uc   2 gfs-p8-02-lp03    1 Create      8975957  50000    179      5586
 uc   2 gfs-p8-02-lp03    1 Unlink      6954982  50000    139      7194
 uc   2 gfs-p8-02-lp02    1 Unlink      9213670  50000    184      5434
 uc   2 gfs-p8-02-lp04    1 Create      9770116  50000    195      5128
 uc   2 gfs-p8-02-lp04    1 Unlink      7973829  50000    159      6289
 uc   2 gfs-p8-02-lp05    1 Create      9488781  50000    189      5291
 uc   2 gfs-p8-02-lp05    1 Unlink      5811170  50000    116      8620

Done.
What does each column mean in the output?
  • Column 1 (Test): This designates how the test was run by the nodes.
    • se: This test was done serially, each node did it one at a time.
    • pa: This test was done in parallel on all five nodes simultaneously.
    • ua: Unlink test a where one node creates files while another node unlinks files.
    • ub: Unlink test b where each node creates files, then all nodes unlink files simultaneously.
    • uc: Unlink test c where each node creates a file, unlinks a file, creates a file, unlinks a file, serially.
  • Column 2 (It): The iteration number of the tests. Often the tests are run twice (or more) to reduce bias caused by first-run caching by the storage array.
  • Column 3 (node name): This indicates which node in the cluster produced which results.
  • Column 4 (Work): This indicates which worker thread reported the results. The test may run several worker threads at the same time in order to test concurrency of results.
  • Column 5 (Test Type): This is the type of test that was run:
    • Create: File creates were tested.
    • Write: File writes were tested.
    • Read: File reads were tested.
    • Unlink: File unlinks were tested.
  • Column 6 (Microsecs): The amount of time it took to run the test, in microseconds.
  • Column 7 (Files): The number of files created, written, read, or unlinked for the test.
  • Column 8 (us/fil): The average number of microseconds it took to perform the create, write, etc. per file.
  • Column 9 (files/sec): This is the average number of files per second that were created, written, read back, or unlinked during the test (calculated throughput).
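As a sanity check, the derived columns appear to follow from the first two numeric columns. Using the first Create row above (11565202 microseconds for 50000 files), and assuming us/fil = Microsecs / Files and files/sec = 1,000,000 / (us/fil), the arithmetic reproduces the table values:

```shell
# Recompute the derived columns from the first Create row of the
# sample output (assumed formulas; they match the sample row).
awk 'BEGIN {
    microsecs = 11565202; files = 50000          # first Create row
    us_per_file = int(microsecs / files)         # 11565202 / 50000 -> 231
    files_per_sec = int(1000000 / us_per_file)   # 1000000 / 231   -> 4329
    print us_per_file, files_per_sec
}'
```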

Collecting supplemental data


In some situations, more data is needed to figure out what was occurring while the benchmark test program `create_unlink_perf` was running. It is safe to collect supplemental data with the following tools: [`glocktop`](/articles/666533) and [`ha-resourcemon.sh`](/solutions/368393).

If you need Red Hat to review the results, copy and paste the output (including command) to the ticket.

Known Issues


If the command fails to run, make sure there are no running `create_unlink_perf` processes on any cluster node. In some instances the following message is printed:
Bind: Address already in use

If there are running processes then kill them before starting a new test.

# for hostname in rhel7-{1..3}.examplerh.com; do ssh $hostname "echo $hostname; ps aux | grep create_unlink_perf | grep -v grep"; echo; done
# for hostname in rhel7-{1..3}.examplerh.com; do ssh $hostname "killall create_unlink_perf"; done

If the following error message occurs, open port `62001/tcp` on the cluster node:
Unable to connect to node  rhel7-2.examplerh.com,worker 1
connect: No route to host
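A typical way to open the port is shown below; run the commands as root on each affected node. The firewalld commands assume the default RHEL 7 firewall, and the iptables commands assume the default RHEL 6 firewall; adjust to your environment if you manage the firewall differently.

```shell
# RHEL 7 (firewalld): open 62001/tcp permanently and reload the rules.
firewall-cmd --permanent --add-port=62001/tcp
firewall-cmd --reload

# RHEL 6 (iptables): insert an ACCEPT rule and persist it.
iptables -I INPUT -p tcp --dport 62001 -j ACCEPT
service iptables save
```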