What data should I gather when access to a GFS2 filesystem appears to be hung or unresponsive on RHEL 5, 6, 7, 8, or 9?
Environment
- Red Hat Enterprise Linux Server 5, 6, 7, 8, 9 (with the High Availability and Resilient Storage Add-Ons)
- Global File System 2 (GFS2)
Issue
- Processes accessing GFS2 file-systems are hung in state D (uninterruptible sleep)
- GFS2 is hung
- Access to GFS2 hangs
- Where do I find gfs2_lockcapture?
Resolution
The attached Python scripts capture GFS2 and DLM lockdump data in order to troubleshoot GFS2 and DLM performance and hang issues. These scripts are provided as-is, so use at your own risk.
There are two attachments in this article: one for running in a Python 2.7 environment (RHEL 5, 6, 7) and one for running in a Python 3.X environment (RHEL 8, 9+).
- RHEL 5, 6, 7
  - Download and extract the file gfs2_lockcapture.tar.bz2, which contains a file called gfs2_lockcapture. This script will only run in Python 2.7 environments.
- RHEL 8, 9+
  - Download and extract the file gfs2_lockcapture-python3.tar.bz2, which contains a file called gfs2_lockcapture-python3. This script will only run in Python 3.X environments.
See the Diagnostic Steps below for information on how to collect the data required to troubleshoot these types of issues.
Related Articles
- How to Improve GFS/GFS2 File System Performance and Prevent Processes from Hanging
- For RHEL 4 and GFS1, review the article "What data do I gather if processes accessing GFS1 are hung on RHEL 4?"
- For RHEL 5 and GFS1, review the article "What data should I gather when access to a GFS1 filesystem appears to be hung or unresponsive on RHEL 5?"
Root Cause
When processes hang in uninterruptible sleep with GFS2 functions on the stack, it usually indicates the process in question is I/O starved, waiting for GFS2 to complete an operation. If this condition persists for more than a few seconds without correcting itself, it can result in hung_task_timeout messages in the system logs and may indicate a deadlock.
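A quick way to check for the symptom described above is to look for D-state processes and for hung_task warnings. The commands below are a minimal sketch; the `<pid>` placeholder is illustrative and must be replaced with a real process ID:

```shell
# List processes in uninterruptible sleep (state D), showing the
# kernel function each one is blocked in (wchan):
ps -eo pid,stat,wchan:32,comm | awk 'NR==1 || $2 ~ /^D/'

# For a suspect PID, the kernel stack shows whether GFS2/DLM
# functions are involved (replace <pid> with an actual PID):
# cat /proc/<pid>/stack

# hung_task warnings, if any, appear in the system log:
# grep -i 'hung_task\|blocked for more than' /var/log/messages
```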
Diagnostic Steps
To diagnose why a GFS2 file-system appears to be hung, unresponsive, or blocked, Red Hat requires specialized data. A script called gfs2_lockcapture, located in the gfs2-utils git repository on pagure.io, collects all the information required to analyze why a GFS2 file-system appears to be hung or why performance is slow.
If using the gfs2_lockcapture-python3.tar.bz2 version for RHEL 8 or RHEL 9+, substitute the filename gfs2_lockcapture-python3 in the examples below.
- Save the file gfs2_lockcapture.tar.bz2 that is attached to this article, then extract it:
# cd ~/
# tar jxvf ~/gfs2_lockcapture.tar.bz2
- While the system appears to be hung, unresponsive, or blocked, run the following command on all the cluster nodes simultaneously. This command assumes that the script is located in the /tmp directory. The command will gather 3 iterations of the lockdump data every 60 seconds. A .tar.bz2 file will be created in the /tmp directory containing the data captured for the cluster node the command was run on. Before running the command, review the help/usage output, which describes what each option does (see the bottom of this article):
# python /tmp/gfs2_lockcapture -r 3 -s 60 -o /tmp/ -y
The script gathers process information, which can sometimes cause performance problems on systems under high load. Gathering of process data can be disabled with the -P option.
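One way to start the capture on all nodes at roughly the same time is a loop over ssh. This is a sketch only: the node names are hypothetical, and it assumes passwordless ssh between cluster nodes (the actual ssh invocation is left commented out):

```shell
# Hypothetical node names; adjust to match your cluster.
NODES="node1 node2 node3"
# The capture command to run on every node:
CMD="python /tmp/gfs2_lockcapture -r 3 -s 60 -o /tmp/ -y"

for node in $NODES; do
    echo "starting capture on $node"
    # ssh "$node" "$CMD" &    # uncomment to run the capture remotely
done
# wait                        # wait for all background captures to finish
```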
- After the command has completed on all cluster nodes, scp the .tar.bz2 lockdump files that were created to one cluster node, then tar them together into a single archive file so that all the data collected for this incident is kept together.
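The collection step above might look like the following on the node chosen to gather the results. The node names, directory, and archive name are illustrative, and the scp lines are commented out because they depend on your cluster:

```shell
# Create a staging directory for the per-node capture archives:
mkdir -p /tmp/lockdumps
cd /tmp/lockdumps

# Copy each node's capture archive here, one scp per node, e.g.:
# scp node2:/tmp/*gfs2_lockcapture*.tar.bz2 .
# scp node3:/tmp/*gfs2_lockcapture*.tar.bz2 .

# Bundle everything into a single archive so the data for this
# incident stays together:
tar cjf /tmp/cluster-lockdumps.tar.bz2 -C /tmp lockdumps
```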
For more information about the gfs2_lockcapture options, run the following command or consult the man page. The man page describes the files that are captured and the commands that are run on the host:
# python ~/gfs2_lockcapture -h
# man gfs2_lockcapture
In addition to capturing this information, it is often useful to run resource utilization monitoring utilities over a longer period of time so that trends in system usage and load can be observed. This should usually include commands like top, vmstat, iostat, mpstat, ps, cat /proc/slabinfo, cat /proc/meminfo, and similar.
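A simple monitoring loop along those lines is sketched below. The log path, iteration count, and interval are illustrative; for real monitoring the sleep would be uncommented and the count raised:

```shell
# Append a lightweight resource snapshot to a log file on each pass.
OUT=/tmp/perf-monitor.log
for i in 1 2 3; do
    date              >> "$OUT"
    cat /proc/meminfo >> "$OUT"
    vmstat            >> "$OUT" 2>/dev/null || true  # skip if not installed
    # sleep 60        # uncomment to sample once per minute
done
```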
This solution is part of Red Hat’s fast-track publication program, providing a huge library of solutions that Red Hat engineers have created while supporting our customers. To give you the knowledge you need the instant it becomes available, these articles may be presented in a raw and unedited form.