Collecting supplemental system utilization statistics for fence events or performance problems in RHEL High Availability or Resilient Storage clusters

Solution Verified - Updated 17 Sept 2025

Environment

Red Hat Enterprise Linux 5, 6, 7, 8 or 9
High Availability or Resilient Storage Add On

Issue

My cluster is experiencing repeated fence events and I need a way to collect system utilization data during the event, similar to the data described in "What can I do to determine what caused the token to be lost in my RHEL 5 cluster?" or "What can I do to determine what caused "A processor failed, forming new configuration" and/or a node to be fenced in a High Availability cluster with Pacemaker?".
My Technical Support Engineer has requested that I set up the watcher-cron.bsh script on my cluster.
How do I set up the ha-resourcemon.sh script (formerly called fencemon-cron.bsh) to capture utilization data on cluster nodes?
What information is required for troubleshooting performance issues on cluster nodes or with a clustered filesystem?
How can I collect system utilization statistics on the nodes during fence events or periods of poor performance in a RHEL cluster?
My GFS2 filesystem is experiencing slowness or hangs.

Resolution

Note   The script attached to this article for download was formerly named fencemon-cron.bsh, but has since been renamed to ha-resourcemon.sh.

Caution!   The filename upon download is ha-resourcemon_1.sh. Executable permissions must be added to the file:
chmod +x ha-resourcemon_1.sh

Note   This script produces large amounts of data in the specified directory, which may require special attention to prevent it from filling up the file system. Ensure there is adequate free space available in that directory. With the timing values used below, the script often produces 30-50 MB per hour of collection on typical systems, and may produce much larger data sets on systems that run many processes, handle large workloads, have many network connections, or otherwise produce lengthy output from any of the monitored commands. Please contact Red Hat Global Support Services for help estimating the space requirements if this is a concern.
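Before starting collection, it may help to confirm that the log directory's filesystem can hold the expected data. The sketch below is a hypothetical helper (`check_logdir_space` is not part of ha-resourcemon.sh) that multiplies the number of hours you plan to collect by the upper estimate of 50 MB per hour from the note above and compares the result with the free space reported by `df`. Busy systems can exceed that estimate, so treat a pass as a minimum check only.

```shell
# Hypothetical helper: check_logdir_space DIR HOURS
# Assumes the 50 MB/hour upper estimate from the note above; real usage may be higher.
check_logdir_space() {
    local dir="$1" hours="$2"
    [ -d "$dir" ] || { echo "no such directory: $dir" >&2; return 1; }
    local per_hour_kb=$((50 * 1024))        # 50 MB expressed in KiB
    local need_kb=$((hours * per_hour_kb))
    local avail_kb
    # df -P prints one unwrapped line per filesystem; column 4 is available KiB
    avail_kb=$(df -Pk "$dir" | awk 'NR==2 {print $4}')
    echo "need ${need_kb} KiB, available ${avail_kb} KiB in ${dir}"
    [ "$avail_kb" -ge "$need_kb" ]          # success only if enough space
}
```

For example, `check_logdir_space /<logdir> 48 || echo "not enough space for 48 hours"` checks two days of collection.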

Install sysstat, ethtool, net-tools and procps on each host where ha-resourcemon.sh will be installed:

yum install -y sysstat ethtool net-tools procps

Note: On Red Hat Enterprise Linux (RHEL) 7, 8, and 9, the procps-ng package is installed in place of procps.

Determine which network interface the cluster uses for cluster communication, because the script is edited in a later step to collect data for that interface. This is the network over which each node resolves the other nodes' cluster node names (not their hostnames). Run the command below that applies to your cluster environment to display the IP address used for cluster communication, then locate the interface that holds that address in the output of ip addr. Depending on the command, the output shows either the IP address of the host running the command or an IP address for each cluster node.

For RHEL 6/7/8/9 Pacemaker clusters using corosync, use the command below:

# corosync-quorumtool -il

For RHEL 6 clusters using cman, use the commands below:

# cman_tool status | grep "Node addresses"
# cman_tool nodes -a -F "id,name,addr"
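Matching the cluster address to an interface by eye can be error-prone on hosts with many addresses. The sketch below is a hypothetical helper (`iface_for_ip` is not part of ha-resourcemon.sh) that reads the one-line-per-address output of `ip -o addr show` and prints the interface holding the given address. If that interface is a bond or VLAN device, the physical interfaces are the ones underneath it; for bonds they are listed in /proc/net/bonding/<bond name>.

```shell
# Hypothetical helper: iface_for_ip ADDRESS
# Reads "ip -o addr show" output on stdin, where field 2 is the interface
# name and field 4 is "address/prefix", and prints the matching interface.
iface_for_ip() {
    local want="$1"
    awk -v ip="$want" '{
        split($4, a, "/")               # drop the prefix length
        if (a[1] == ip) {
            sub(/:$/, "", $2)           # strip a trailing colon if present
            print $2
        }
    }'
}
```

For example, `ip -o addr show | iface_for_ip 192.168.122.11`, where the address is a placeholder for the one reported by corosync-quorumtool or cman_tool.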

Save the attached file as ha-resourcemon.sh. Then edit the ifaces variable in the script so that it refers to the physical interface(s) associated with the heartbeat network identified in the previous step.
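The ifaces edit can also be scripted. This is a minimal sketch that rests on an unverified assumption: that the script defines the variable on a single line of the form `ifaces="..."`. Check the real definition with `grep -n '^[[:space:]]*ifaces=' ha-resourcemon.sh` first, and edit by hand if it is written differently (for example as a bash array). `set_ifaces` is a hypothetical name.

```shell
# Hypothetical helper: set_ifaces SCRIPT IFACE...
# ASSUMPTION: SCRIPT defines the variable as a single line: ifaces="..."
# Every line starting with "ifaces=" is replaced, keeping its indentation.
set_ifaces() {
    local script="$1"; shift
    local list="$*"                           # e.g. "eth1 eth2"
    cp -p "$script" "$script.orig"            # keep a backup of the original
    sed -i "s/^\([[:space:]]*\)ifaces=.*/\1ifaces=\"${list}\"/" "$script"
    grep -n '^[[:space:]]*ifaces=' "$script"  # show the result for review
}
```

For example, `set_ifaces /path/to/ha-resourcemon.sh eth1 eth2` sets the variable to two interfaces and leaves the original in ha-resourcemon.sh.orig.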
Add the executable flag to the script:

chmod +x /path/to/ha-resourcemon.sh
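Because the download is named ha-resourcemon_1.sh, renaming it and setting the executable bit can be combined with the coreutils `install` command. The sketch below wraps that in a hypothetical `install_script` function and then checks the result; /path/to remains a placeholder for your own path.

```shell
# Hypothetical helper: install_script SOURCE DEST
# Copies SOURCE to DEST with mode 0755 (rwxr-xr-x), then confirms DEST is executable.
install_script() {
    local src="$1" dest="$2"
    install -m 0755 "$src" "$dest" && test -x "$dest"
}
```

For example, `install_script ha-resourcemon_1.sh /path/to/ha-resourcemon.sh`.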

Create a new crontab entry for the root user that calls the ha-resourcemon.sh script:

crontab -u root -e

1 * * * * /bin/bash /path/to/ha-resourcemon.sh 20 181 /<logdir> 2

Replace /path/to with the script's path in your environment.
Your support technician may ask you to adjust the timing values on this line.
Replace /<logdir> with the directory where you would like to store the log files.
If cron is unavailable, a systemd timer can also be used to collect this information:
Using a systemd timer to collect supplemental system utilization statistics in RHEL High Availability or Resilient Storage Clusters
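If you prefer not to use the interactive editor opened by `crontab -e`, the same entry can be appended to root's crontab from a script. This sketch uses hypothetical helper names (`cron_line`, `add_cron_entry`); the arguments `20 181 ... 2` are copied verbatim from the entry above, and the existing crontab is preserved by reading it with `crontab -l` before writing it back with `crontab -`. Run it as root.

```shell
# Hypothetical helper: build the hourly entry shown above.
# The values 20, 181 and 2 are copied verbatim from the documented entry.
cron_line() {
    local script="$1" logdir="$2"
    printf '1 * * * * /bin/bash %s 20 181 %s 2\n' "$script" "$logdir"
}

# Hypothetical helper: append the entry to root's crontab, keeping existing lines.
add_cron_entry() {
    local line
    line=$(cron_line "$1" "$2")
    # Skip if an identical line is already installed
    if crontab -u root -l 2>/dev/null | grep -qxF "$line"; then
        echo "entry already present"
        return 0
    fi
    # Re-emit the current crontab (if any) plus the new line, then install it from stdin
    { crontab -u root -l 2>/dev/null; echo "$line"; } | crontab -u root -
}
```

For example, `add_cron_entry /path/to/ha-resourcemon.sh /<logdir>`, followed by `crontab -u root -l` to review the result.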

Allow the script to run until a fence event occurs, or until a GFS2 performance issue has occurred (or has persisted long enough for data to be captured). Then archive the collected data by running the following command:

tar -cvjf /tmp/$(hostname)-ha-resourcemon.sh.tar.bz2 /<logdir>
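Before uploading, it can be worth confirming that the archive contains only the collection directory (not the whole root filesystem) and that it can be read back. The sketch below is a hypothetical `archive_logs` helper that creates the bzip2-compressed archive under the same /tmp naming as the command above and then lists it with `tar -t`, which fails if the archive is damaged. It falls back to `uname -n` if `$HOSTNAME` is unset.

```shell
# Hypothetical helper: archive_logs LOGDIR
# Archives only LOGDIR (bzip2, as above), verifies it can be listed, prints its path.
archive_logs() {
    local logdir="$1"
    local host="${HOSTNAME:-$(uname -n)}"
    local out="/tmp/${host}-ha-resourcemon.sh.tar.bz2"
    tar -cjf "$out" "$logdir" || return 1
    # Listing every member reads the whole archive; a non-zero exit means it is unreadable
    tar -tjf "$out" >/dev/null || return 1
    echo "$out"
}
```

For example, `archive_logs /<logdir>` prints the path of the archive to attach to the case.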

Attach the following from all nodes to the support case:

The compressed tar archives of the ha-resourcemon.sh (formerly called fencemon-cron.bsh) data from all cluster nodes.
An sos report from all of the cluster nodes.
A description of what was happening on the cluster node(s) when the issue occurred or what job was running when the issue occurred.
The name of the host or hosts that experienced an issue.
If this was a performance issue, how long the issue lasted.
The date and time when the issue occurred.

Root Cause

The ha-resourcemon.sh script (formerly called fencemon-cron.bsh) captures additional data for diagnosing recurring fence events, membership issues such as lost tokens, or GFS2 performance issues. The following articles go into more detail about troubleshooting those issues:

The script uses standard Linux utilities that are normally already installed on most systems, so in most cases no additional packages need to be installed on the host.

Diagnostic Steps

The following information will be captured by the script:

vmstat
mpstat
iostat -tkx
top -b
ps aux
netstat -s
ethtool -S
(optional) pidstack output
- Only available if the --pidstack option is included as the final argument:
  $ /path/to/ha-resourcemon.sh 20 181 /<logdir> 2 --pidstack
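Since the script depends on these utilities being present, a quick pre-flight check can save a collection run that would otherwise produce empty output files. The sketch below is a hypothetical `check_tools` function that reports any missing command; the comments name the RHEL packages that normally provide each one, matching the installation step earlier in this article.

```shell
# Hypothetical helper: check_tools CMD...
# Prints "missing: CMD" for each command not found in PATH; returns 1 if any are missing.
# Providing packages: vmstat, top, ps -> procps / procps-ng
#                     mpstat, iostat  -> sysstat
#                     netstat         -> net-tools
#                     ethtool         -> ethtool
check_tools() {
    local missing=0 cmd
    for cmd in "$@"; do
        if ! command -v "$cmd" >/dev/null 2>&1; then
            echo "missing: $cmd"
            missing=1
        fi
    done
    return "$missing"
}
```

For example, `check_tools vmstat mpstat iostat top ps netstat ethtool` prints nothing and succeeds when all the monitored commands are available.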

SBR

Clusterha

Product(s)

Red Hat Enterprise Linux

Components

cluster

Category

Troubleshoot


This solution is part of Red Hat’s fast-track publication program, providing a huge library of solutions that Red Hat engineers have created while supporting our customers. To give you the knowledge you need the instant it becomes available, these articles may be presented in a raw and unedited form.
