What happens when a gfs2 filesystem is unmounted?
This article describes what occurs when a gfs2 filesystem is unmounted on RHEL 6 and RHEL 7. When a gfs2 filesystem is unmounted, three things need to occur:
- Remove glocks from the glock locktable.
- Remove DLM locks (lkbs) from the DLM locktable.
- Remove cached objects from slab memory (objects marked `gfs2_glock` in `/proc/slabinfo`).
A gfs2 glock caches the state of a cluster-wide lock that is managed internally by DLM (the distributed lock manager). The DLM locks are known internally as lkbs. The gfs2 filesystem keeps a reference count for each glock.
- When a glock is "referenced" (i.e. has a non-zero reference count) it stays in slab memory, and so does its associated DLM lock.
- When the reference count drops to zero, the glock is freed, along with its associated DLM `lkb`, and its slab memory is released (objects marked `gfs2_glock` in `/proc/slabinfo`).
- A glock is held in its current state until some node in the cluster requests a state change. For example, if node 2 requests the glock in `EX` (exclusive), the node that holds it must transition it to `UN` (unlocked), and the requesting node then locks it in `EX`. That means if you lock a glock `EX` and then unlock it, it stays locked in `EX` until some node requests it be changed.
- There is one `glock_workqueue` process for each CPU on the system. If there are 10 CPUs then there are 10 `glock_workqueue` processes.
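As a quick sanity check, the per-CPU thread count can be compared against the CPU count. This is a minimal sketch: the `glock_workqueue` thread names are as seen on RHEL 6; on later kernels glock work may run on generic `kworker` threads instead, so the count can legitimately be 0 there.

```shell
# Compare the CPU count with the number of glock_workqueue kernel threads.
# On RHEL 6 these appear as one per-CPU kernel thread; on later kernels the
# work may run on generic kworker threads, so this count can be 0.
cpus=$(nproc)
workers=$(ps -e -o comm= | grep -c '^glock_workqueue' || true)
echo "CPUs: ${cpus}"
echo "glock_workqueue threads: ${workers}"
```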
What happens when a gfs2 filesystem is unmounted?
- The utility `umount` causes the VFS to call `put_super()`, which in the case of gfs2 is `gfs2_put_super()`.
- The function `gfs2_put_super()` calls the function `gfs2_gl_hash_clear()`.
- The function `gfs2_gl_hash_clear()` calls the function `glock_hash_walk()`, specifying that the function `clear_glock()` should be run on every glock on the gfs2 filesystem.
- The basic goal of `gfs2_gl_hash_clear()` is to empty the glock hash table and then signal the `glock_workqueue` processes to flush all their queued work.
- The function `clear_glock()` calls `handle_callback()` with `LM_ST_UNLOCKED`, which requests that the glock be transitioned to the "UNLOCKED" state.
- The function `handle_callback()` handles the glock demote request and queues the glock state machine, which is serviced by the `glock_workqueue` processes.
- As the `glock_workqueue` processes run for each queued glock, they see that a request has been made to demote/transition the glock to the "UNLOCKED" state.
- The state machine (`glock_workqueue`) instructs DLM to change the DLM lock state to "NL" (technically NULL, but think of it as Not Locked). A DLM lock can sit in `NL`, which means the `lkb` still exists in DLM's slab memory.
- The glock reference count is decremented. When the glock reference count reaches zero, the glock is transitioned:
  - DLM is instructed to transition the lock from NL (unlocked) to IN (freed). This tells DLM the lock is no longer needed and to free its slab memory.
  - The glock is removed from the glock hash table.
  - The glock is freed from slab memory (objects marked `gfs2_glock` in `/proc/slabinfo`).
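The progress of this teardown can be observed from the `gfs2_glock` line in `/proc/slabinfo` (reading it normally requires root). The snippet below is a sketch; `count_glocks` is a hypothetical helper name, and the mount point in the comment is an example, not a real path from this article.

```shell
# Helper (hypothetical name) printing the number of active gfs2_glock slab
# objects, or 0 if the cache is absent or /proc/slabinfo is unreadable.
count_glocks() {
    awk '$1 == "gfs2_glock" { print $2; found = 1 }
         END { if (!found) print 0 }' /proc/slabinfo 2>/dev/null || echo 0
}

echo "gfs2_glock objects before unmount: $(count_glocks)"
# umount /mnt/gfs2   # example mount point; run as root on a real cluster
# echo "gfs2_glock objects after unmount: $(count_glocks)"
```

On a node with a mounted gfs2 filesystem, the count should drop toward zero as the `glock_workqueue` processes drain their queues during the unmount.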
Problems that can occur when unmounting a gfs2 filesystem
- When a gfs2 filesystem takes a long time to unmount, it can be the result of having to release a very large number of glocks (and corresponding DLM locks) and free them from slab memory.
- Hangs or very long unmounts of a gfs2 filesystem are almost always the result of an incorrect glock reference count.
- A slow unmount can also be the result of slow IO or insufficient memory.
- `/proc/slabinfo` might still show gfs2 objects (`gfs2_*`) in the cache after all gfs2 filesystems have been unmounted; this is normal. If the `gfs2` module is removed with `rmmod gfs2`, then all the gfs2 objects should be removed from `/proc/slabinfo`.
- The `glock_workqueue` processes can show a higher-than-normal CPU load as they perform all the work in their queues.
- Unmounting a file system that is mounted on a subdirectory of a gfs2 filesystem will fail in RHEL 6 Resilient Storage clusters
- Unmounting a GFS2 filesystem appears to hang on RHEL 6
- On shutdown a cluster node with mounted GFS2 file systems fails to stop cman and leave the cluster gracefully and is subsequently fenced
- After issuing `umount -a` the GFS2 fs hangs when mounting it
- Unmounting of a gfs2 takes excessive amount of time
- Why does a system hang on shutdown when a gfs2 filesystem has a withdrawal?
- GFS2 unmount blocks in `dlm_release_lockspace` when issuing a reboot of RHEL 6 cluster node
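When an unmount hangs, the glock dumps that gfs2 exposes under debugfs can show which glocks are still held. This sketch assumes the standard gfs2 debugfs layout (`/sys/kernel/debug/gfs2/<clustername:fsname>/glocks`), with debugfs mounted and root privileges; on a machine with no gfs2 mounts the loop simply finds nothing.

```shell
# Show the first lines of any GFS2 glock dump files under debugfs.
# Each dump lists glocks with their state and holder information, which
# helps identify glocks whose reference count never reached zero.
found=0
for f in /sys/kernel/debug/gfs2/*/glocks; do
    [ -e "$f" ] || continue
    found=$((found + 1))
    echo "== $f =="
    head -n 20 "$f"
done
echo "glock dump files found: ${found}"
```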
Reference
- How does gfs2 know when to deallocate a file?
- What is the lifetime of a glock or a DLM resource on a gfs2 filesystem?
- Is there more information on how gfs2 cache works on RHEL 6 and RHEL 7?
- A pacemaker gfs2 filesystem resource failed to stop and umount appear to fail