What happens when a gfs2 filesystem is unmounted?
This article describes what occurs when a gfs2 filesystem is unmounted on RHEL 6 and RHEL 7. When a gfs2 filesystem is unmounted, three things need to occur:
- Remove glocks from the glock locktable.
- Remove DLM locks (lkbs) from the DLM locktable.
- Remove cached objects from slab memory (objects marked `gfs2_glock` in `/proc/slabinfo`).
A gfs2 glock caches the state of a cluster-wide lock that is managed internally by DLM (the distributed lock manager). The DLM locks are known internally as lkbs. The gfs2 filesystem keeps a reference count for each glock.
- When a glock is "referenced" (i.e. has a non-zero reference count) it stays in slab memory, and so does its associated DLM lock.
- When the reference count drops to zero, the glock is freed, along with its associated DLM `lkb`, and its slab memory is released (objects marked `gfs2_glock` in `/proc/slabinfo`).
- A glock is held in its current state until some node in the cluster requests a state change. For example, if node 2 requests the glock in `EX` (exclusive), the node that holds it must transition it to `UN` (unlocked), and the requesting node then locks it in `EX`. That means if you lock a glock `EX` and then unlock it, it stays locked in `EX` until some node requests it be changed.
- There is one `glock_workqueue` process for each CPU on the system. If there are 10 CPUs then there are 10 `glock_workqueue` processes.
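As a quick sanity check, the per-CPU thread count can be compared against the CPU count. This is a minimal sketch: the `glock_workqueue` thread names are as seen on RHEL 6; on later kernels glock work may run on generic `kworker` threads instead, so the count can legitimately be 0 there.

```shell
# Compare the CPU count with the number of glock_workqueue kernel threads.
# On RHEL 6 these appear as one per-CPU kernel thread; on later kernels the
# work may run on generic kworker threads, so this count can be 0.
cpus=$(nproc)
workers=$(ps -e -o comm= | grep -c '^glock_workqueue' || true)
echo "CPUs: ${cpus}"
echo "glock_workqueue threads: ${workers}"
```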
What happens when a gfs2 filesystem is unmounted?
- The utility `umount` causes the VFS to call `put_super()`, which in the case of gfs2 is `gfs2_put_super()`.
- The function `gfs2_put_super()` calls the function `gfs2_gl_hash_clear()`.
- The function `gfs2_gl_hash_clear()` calls the function `glock_hash_walk()`, specifying that the function `clear_glock()` should be run on every glock on the gfs2 filesystem.
- The basic goal of `gfs2_gl_hash_clear()` is to empty the glock hash table and then signal the `glock_workqueue` processes to flush all their queued work.
- The function `clear_glock()` calls `handle_callback()` with `LM_ST_UNLOCKED`, which requests that the glock be transitioned to the "UNLOCKED" state.
- The function `handle_callback()` handles the glock demote request and queues the glock state machine, which is serviced by the `glock_workqueue` processes.
- As the `glock_workqueue` processes run for each queued glock, they see that a request has been made to demote/transition the glock to the "UNLOCKED" state.
- The state machine (`glock_workqueue`) instructs DLM to change the DLM lock state to "NL" (technically NULL, but think of it as Not Locked). A DLM lock can sit in `NL`, which means the `lkb` still exists in DLM's slab memory.
- The glock reference count is decremented. When the glock reference count reaches zero, the glock is transitioned:
  - DLM is instructed to transition the lock from NL (unlocked) to IN (freed). This tells DLM the lock is no longer needed and to free its slab memory.
  - The glock is removed from the glock hash table.
  - The glock is freed from slab memory (objects marked `gfs2_glock` in `/proc/slabinfo`).
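The progress of this teardown can be observed from the `gfs2_glock` line in `/proc/slabinfo` (reading it normally requires root). The snippet below is a sketch; `count_glocks` is a hypothetical helper name, and the mount point in the comment is an example, not a real path from this article.

```shell
# Helper (hypothetical name) printing the number of active gfs2_glock slab
# objects, or 0 if the cache is absent or /proc/slabinfo is unreadable.
count_glocks() {
    awk '$1 == "gfs2_glock" { print $2; found = 1 }
         END { if (!found) print 0 }' /proc/slabinfo 2>/dev/null || echo 0
}

echo "gfs2_glock objects before unmount: $(count_glocks)"
# umount /mnt/gfs2   # example mount point; run as root on a real cluster
# echo "gfs2_glock objects after unmount: $(count_glocks)"
```

On a node with a mounted gfs2 filesystem, the count should drop toward zero as the `glock_workqueue` processes drain their queues during the unmount.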
Problems that can occur when unmounting a gfs2 filesystem
- When a gfs2 filesystem takes a long time to unmount, it can be the result of having to release a very large number of glocks (and corresponding DLM locks) and free them from slab memory.
- Hangs or very long unmounts of a gfs2 filesystem are almost always the result of an incorrect glock reference count.
- A slow unmount can also be the result of slow IO or insufficient memory.
- `/proc/slabinfo` might still show gfs2 objects (`gfs2_*`) in the cache after all gfs2 filesystems have been unmounted; this is normal. If the `gfs2` module is removed with `rmmod gfs2`, then all the gfs2 objects should be removed from `/proc/slabinfo`.
- The `glock_workqueue` processes can show a higher-than-normal CPU load as they perform all the work in their queues.
- Unmounting a file system that is mounted on a subdirectory of a gfs2 filesystem will fail in RHEL 6 Resilient Storage clusters
- Unmounting a GFS2 filesystem appears to hang on RHEL 6
- On shutdown a cluster node with mounted GFS2 file systems fails to stop cman and leave the cluster gracefully and is subsequently fenced
- After issuing `umount -a` the GFS2 fs hangs when mounting it
- Unmounting of a gfs2 takes excessive amount of time
- Why does a system hang on shutdown when a gfs2 filesystem has a withdrawal?
- GFS2 unmount blocks in `dlm_release_lockspace` when issuing a reboot of RHEL 6 cluster node
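When an unmount hangs, the glock dumps that gfs2 exposes under debugfs can show which glocks are still held. This sketch assumes the standard gfs2 debugfs layout (`/sys/kernel/debug/gfs2/<clustername:fsname>/glocks`), with debugfs mounted and root privileges; on a machine with no gfs2 mounts the loop simply finds nothing.

```shell
# Show the first lines of any GFS2 glock dump files under debugfs.
# Each dump lists glocks with their state and holder information, which
# helps identify glocks whose reference count never reached zero.
found=0
for f in /sys/kernel/debug/gfs2/*/glocks; do
    [ -e "$f" ] || continue
    found=$((found + 1))
    echo "== $f =="
    head -n 20 "$f"
done
echo "glock dump files found: ${found}"
```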
Reference
- How does gfs2 know when to deallocate a file?
- What is the lifetime of a glock or a DLM resource on a gfs2 filesystem?
- Is there more information on how gfs2 cache works on RHEL 6 and RHEL 7?
- A pacemaker gfs2 filesystem resource failed to stop and umount appear to fail