A GFS2 filesystem hangs because a glock could not be demoted.


Environment

  • Red Hat Enterprise Linux 6, 7 with High Availability Add-on
  • Use of a GFS2 filesystem

Issue

  • A GFS2 filesystem hangs because a glock could not be demoted. The demote time reported for the glock keeps growing:
   G:  s:EX n:2/16 f:DIqob t:EX d:SH/63555930000 a:0 v:0 r:40 m:200
    H: s:EX f:H e:0 p:21695 [sas] gfs2_unlink+0x7e/0x250 [gfs2]       
    [...] almost 40 waiters behind holder and end                      
    H: s:SH f:AW e:0 p:549 [wscanhw] gfs2_getattr+0xcf/0x1d0 [gfs2]
    I: n:1/22 t:4 f:0x00 d:0x00000001 s:3864

Resolution

Root Cause

The GFS2 file system in some cases became unresponsive due to lock dependency problems between inodes and the cluster lock. This occurred most frequently on nearly full file systems where files and directories were being deleted and recreated at the same block location at the same time.

Diagnostic Steps

This issue is very hard to diagnose, and the only way to find out whether a GFS2 filesystem is hitting it is to capture lock dumps. To diagnose this issue, Red Hat will need lock dumps from each cluster node. Below are some of the items to look for:

  • Is there a process that appears to be hung?
  • Is there a glock whose demote time (the second value in `d:<state>/<time in milliseconds>`) keeps increasing?
  • Is there a glock that is in EX state with no holders, but contains an I: inode entry?
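
The last two checks can be scripted against a captured lock dump. The following Python sketch is not a Red Hat tool. It assumes the dump has already been saved as text in the format shown in this article, and the names `parse_glocks`, `find_suspects`, and the `min_demote_time` threshold are hypothetical. It reports glocks whose demote time is at or above the threshold and notes when such a glock has no `H:` lines but still has an `I:` line.

```python
import re
from dataclasses import dataclass, field

# Matches the start of a "G:" line, for example:
#   G:  s:EX n:2/16 f:DIqob t:EX d:SH/63555930000 a:0 v:0 r:40 m:200
G_LINE = re.compile(
    r"^G:\s+s:(?P<state>\S+)\s+n:(?P<name>\S+)\s+f:(?P<flags>\S*)\s+"
    r"t:(?P<target>\S+)\s+d:(?P<demote_state>[A-Z]+)/(?P<demote_time>\d+)"
)


@dataclass
class Glock:
    name: str           # glock number, e.g. "2/16"
    state: str          # current state, e.g. "EX" or "SH"
    demote_state: str   # state named in the d: field
    demote_time: int    # time value from the d: field
    holders: list = field(default_factory=list)  # raw "H:" lines
    has_inode: bool = False                      # True if an "I:" line follows


def parse_glocks(text):
    """Group the H: and I: lines of a dump under the G: line they follow."""
    glocks = []
    current = None
    for raw in text.splitlines():
        line = raw.strip()
        m = G_LINE.match(line)
        if m:
            current = Glock(m["name"], m["state"],
                            m["demote_state"], int(m["demote_time"]))
            glocks.append(current)
        elif current is not None and line.startswith("H:"):
            current.holders.append(line)
        elif current is not None and line.startswith("I:"):
            current.has_inode = True
    return glocks


def find_suspects(glocks, min_demote_time):
    """Return (name, demote_time, note) for glocks at or above the threshold.

    min_demote_time is a hypothetical cut-off in the same units as the d: field.
    """
    suspects = []
    for g in glocks:
        if g.demote_time < min_demote_time:
            continue
        if not g.holders and g.has_inode:
            note = "no holders, but an I: entry is present"
        else:
            note = "demote time above threshold"
        suspects.append((g.name, g.demote_time, note))
    return suspects


# Two glock entries copied from this article (first from Node42, second from Node43).
SAMPLE = """\
G:  s:SH n:2/4c10b44 f:lIqob t:EX d:EX/0 a:0 v:0 r:4 m:200
 H: s:EX f:W e:0 p:21695 [sas] gfs2_unlink+0x9f/0x250 [gfs2]
 I: n:1/79760196 t:4 f:0x00 d:0x00000001 s:3864
G:  s:SH n:2/4c10b44 f:DIqLob t:SH d:UN/62680272000 a:0 v:0 r:3 m:200
 I: n:1/79760196 t:4 f:0x00 d:0x00000001 s:3864
"""

for name, demote_time, note in find_suspects(parse_glocks(SAMPLE), min_demote_time=60000):
    print(name, demote_time, note)
```

A single dump cannot show that a demote time is still increasing. Running the same check on two dumps taken some time apart and comparing the reported values for the same glock number covers that case.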

Example


This example shows a typical `rm` operation on one or more files in the root directory of a GFS2 filesystem that is hung. The root dinode `(2/16)` is locked while the process tries to delete another file with inode `(2/4c10b44)`.

  1. Node42 has a glock `(2/16)` with a very large value for the time it is taking to demote the glock from [EX](/articles/35653#Glocks) to [SH](/articles/35653#Glocks) state: `63555930000ms`.
  2. The glock `(2/4c10b44)` is currently in SH mode on Node42, and process 21695 is waiting for DLM to convert it to EX. This process cannot proceed until Node43 demotes the glock.
  3. The glock `(2/4c10b44)` is in SH mode on Node43. Node43 has been told to demote the lock from SH so that Node42 can lock it in EX. In addition, the time to demote that lock is very high on Node43: `62680272000ms`.
  4. The glock `(2/4c10b44)` has no holders or waiters on Node43, yet it still has one `I:` inode entry. The waiting process needs that glock to reach the EX state before it can become the holder.

Node42:
   G:  s:EX n:2/16 f:DIqob t:EX d:SH/63555930000 a:0 v:0 r:40 m:200   [1]
    H: s:EX f:H e:0 p:21695 [sas] gfs2_unlink+0x7e/0x250 [gfs2]        
    [...] almost 40 waiters behind holder and end                      
    H: s:SH f:AW e:0 p:549 [wscanhw] gfs2_getattr+0xcf/0x1d0 [gfs2]
    I: n:1/22 t:4 f:0x00 d:0x00000001 s:3864

   G:  s:SH n:2/4c10b44 f:lIqob t:EX d:EX/0 a:0 v:0 r:4 m:200    [2]
    H: s:EX f:W e:0 p:21695 [sas] gfs2_unlink+0x9f/0x250 [gfs2]         
    I: n:1/79760196 t:4 f:0x00 d:0x00000001 s:3864

Node43:
   G:  s:SH n:2/4c10b44 f:DIqLob t:SH d:UN/62680272000 a:0 v:0 r:3 m:200   [3]
    I: n:1/79760196 t:4 f:0x00 d:0x00000001 s:3864    [4]
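
The cross-node reasoning in this example can be sketched in Python as well. This is an illustrative sketch under stated assumptions, not a supported diagnostic: the names `summarize` and `cross_node_blockers` and the `threshold` parameter are hypothetical, and the code assumes that a `W` in an `H:` line's `f:` field marks a waiter. It looks for a glock that has a waiter on one node while another node shows the same glock with a large demote time and no `H:` lines at all.

```python
import re

G_RE = re.compile(r"^G:\s+s:(\S+)\s+n:(\S+)\s+f:(\S*)\s+t:(\S+)\s+d:([A-Z]+)/(\d+)")
H_RE = re.compile(r"^H:\s+s:(\S+)\s+f:(\S*)\s+e:\S+\s+p:(\d+)")


def summarize(dump):
    """Map glock number -> demote time, holder PIDs and waiter PIDs for one node."""
    result = {}
    cur = None
    for raw in dump.splitlines():
        line = raw.strip()
        g = G_RE.match(line)
        h = H_RE.match(line)
        if g:
            cur = {"demote_time": int(g.group(6)), "holders": [], "waiters": []}
            result[g.group(2)] = cur
        elif h and cur is not None:
            pid = int(h.group(3))
            # Assumption: 'W' in the H: flags means the request is still waiting.
            (cur["waiters"] if "W" in h.group(2) else cur["holders"]).append(pid)
    return result


def cross_node_blockers(dumps, threshold):
    """dumps maps node name -> dump text.

    Returns (glock, waiting_node, waiter_pids, blocking_node, demote_time) tuples.
    threshold is a hypothetical cut-off in the same units as the d: field.
    """
    per_node = {node: summarize(text) for node, text in dumps.items()}
    found = []
    for wnode, wglocks in per_node.items():
        for name, w in wglocks.items():
            if not w["waiters"]:
                continue
            for bnode, bglocks in per_node.items():
                b = bglocks.get(name)
                if (bnode != wnode and b is not None
                        and b["demote_time"] >= threshold
                        and not b["holders"] and not b["waiters"]):
                    found.append((name, wnode, w["waiters"], bnode, b["demote_time"]))
    return found


# Dump excerpts copied from the example above, without the [1]..[4] annotations.
NODE42 = """\
G:  s:EX n:2/16 f:DIqob t:EX d:SH/63555930000 a:0 v:0 r:40 m:200
 H: s:EX f:H e:0 p:21695 [sas] gfs2_unlink+0x7e/0x250 [gfs2]
 H: s:SH f:AW e:0 p:549 [wscanhw] gfs2_getattr+0xcf/0x1d0 [gfs2]
 I: n:1/22 t:4 f:0x00 d:0x00000001 s:3864
G:  s:SH n:2/4c10b44 f:lIqob t:EX d:EX/0 a:0 v:0 r:4 m:200
 H: s:EX f:W e:0 p:21695 [sas] gfs2_unlink+0x9f/0x250 [gfs2]
 I: n:1/79760196 t:4 f:0x00 d:0x00000001 s:3864
"""
NODE43 = """\
G:  s:SH n:2/4c10b44 f:DIqLob t:SH d:UN/62680272000 a:0 v:0 r:3 m:200
 I: n:1/79760196 t:4 f:0x00 d:0x00000001 s:3864
"""

for row in cross_node_blockers({"Node42": NODE42, "Node43": NODE43}, threshold=60000):
    print(row)
```

With the two excerpts above, the only match is glock `2/4c10b44`: process 21695 waits on Node42 while Node43 shows a pending demote and no holders.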

This solution is part of Red Hat’s fast-track publication program, providing a huge library of solutions that Red Hat engineers have created while supporting our customers. To give you the knowledge you need the instant it becomes available, these articles may be presented in a raw and unedited form.