Unmounting of a gfs2 filesystem takes an excessive amount of time

Solution Unverified - Updated

Environment

  • Red Hat Enterprise Linux 6 (with the Resilient Storage Add-on)

Issue

  • The umount command appears to hang or is slow when a gfs2 filesystem is unmounted.

Resolution

Red Hat Enterprise Linux 6
  • This issue was tracked in Bug 1427309 - Unmounting a gfs2 filesystem appears to take an excessive amount of time (RHEL 6 6.10.0). As of Wednesday, August 2, 2017, the status of Bug 1427309 is CLOSED. The Bugzilla was closed because the patches required to resolve the issue were too invasive for the current phase of the RHEL 6 lifecycle.

NOTE: There is another issue related to umount hanging or taking a long time to complete, which is fixed in a kernel update: Unmounting a GFS2 filesystem appears to hang on RHEL 6


Diagnostic Steps

If possible, verify that umount is not hung by letting the process run for 30 or more minutes. It is important to determine whether umount is hung or just extremely slow.

  • Verify that there is plenty of free memory.
  • Verify that swap space is not being used.
  • Verify that there are no multipath or storage errors in /var/log/messages.
  • Verify with lsof output that no process is still using any file on the filesystem.
  • Capture gfs2_lockcapture before and after the gfs2 filesystem is unmounted.
  • Use ha-resourcemon.sh to capture data about the cluster. This should be running before and during the umount operation.
  • While the umount operation is occurring, issue a sysrq-t to print thread information to /var/log/messages.
  • Capture the /var/log/dmesg file before the cluster node is rebooted to see whether the umount timeout was exceeded, causing the remaining glocks to be logged. A particular kernel is required for glocks to be logged, as noted in the article.
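
The checks above can be sketched as a short shell script. This is a minimal sketch, not the supported procedure: the mount point /mnt/gfs2 and the log-grep pattern are assumptions, and the sysrq-t step (which requires root) is shown commented out.

```shell
#!/bin/sh
# Sketch of the pre-unmount checks; /mnt/gfs2 is a hypothetical mount point.
MNT=${1:-/mnt/gfs2}

free -m                                  # is there plenty of free memory?
swapon -s || true                        # is swap space in use?
# Any multipath or storage errors logged recently? (pattern is an assumption)
grep -iE 'multipath|i/o error' /var/log/messages 2>/dev/null | tail
# Any process still holding files open on the filesystem?
lsof +D "$MNT" 2>/dev/null || true
# Thread dump to /var/log/messages while umount runs (requires root):
# echo t > /proc/sysrq-trigger
```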

Analyze the captured data and review the following to determine whether the umount process is slowly unmounting the gfs2 filesystem or is hung. If the umount process is hung, you should not see any decrease in dlm locks in the lock table, gfs2 objects in /proc/slabinfo, etc.

Review the data collected with ha-resourcemon.sh.
  • Check to see if gfs2 objects in /proc/slabinfo are decreasing.
  • Review the collected top output and check whether dlm_tool or glock_workqueue had high CPU usage while the umount was occurring.
  • Review the collected memory data to see whether swap space was used and whether there is plenty of free memory.
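
The slabinfo check can be sketched as a small helper. The function name gfs2_slab_counts and the optional file argument are assumptions for illustration; the layout assumed is the standard /proc/slabinfo format (cache name in the first column, active objects in the second).

```shell
# gfs2_slab_counts: print active-object counts for gfs2 slab caches.
# Assumes the standard slabinfo layout: <name> <active_objs> <num_objs> ...
gfs2_slab_counts() {
    awk '/^gfs2/ { print $1, $2 }' "${1:-/proc/slabinfo}"
}

# Run repeatedly during the umount; falling counts indicate progress, not a hang:
#   watch -n 10 'grep ^gfs2 /proc/slabinfo'
```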

Review the data collected with gfs2_lockcapture.

  • Review the dlm debug data and verify the dlm locks are decreasing.
  • Review the gfs2 debug data and verify that glocks are decreasing.
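
One hedged way to track whether dlm locks are decreasing is to count the entries in the dlm debugfs lock file for the filesystem's lockspace. The function name count_dlm_locks is hypothetical, the /sys/kernel/debug/dlm path assumes debugfs is mounted there, and the one-lock-per-line layout is an assumption.

```shell
# count_dlm_locks: count lock entries for a lockspace in the dlm debugfs
# directory (the /sys/kernel/debug/dlm path and one-lock-per-line layout
# are assumptions; debugfs must be mounted).
count_dlm_locks() {
    lockspace=$1
    dir=${2:-/sys/kernel/debug/dlm}
    wc -l < "${dir}/${lockspace}_locks"
}

# Snapshot the count repeatedly while umount runs; a falling count suggests
# the umount is progressing rather than hung:
#   count_dlm_locks myfs
```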

This solution is part of Red Hat’s fast-track publication program, providing a huge library of solutions that Red Hat engineers have created while supporting our customers. To give you the knowledge you need the instant it becomes available, these articles may be presented in a raw and unedited form.