A POSIX lock on a gfs2 filesystem is acquired before the cluster node has been fenced, and the waiting node ignores process signals
Environment
- Red Hat Enterprise Linux Server 6, 7, 8 (with the High Availability and Resilient Storage Add-Ons)
- A Global File System 2 (gfs2) filesystem
Issue
- A POSIX lock on a gfs2 filesystem is acquired before the cluster node has been fenced. Shouldn't the POSIX lock be acquired ONLY after the cluster node is fenced?
- A process that is waiting to acquire a POSIX lock on a gfs2 filesystem appears to ignore the SIGTERM and SIGINT signals that are sent to it.
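The waiting scenario described above can be sketched as follows. This is a minimal illustration, not a reproducer supplied with this solution: the names `LockWaitInterrupted`, `install_signal_handlers`, `acquire_posix_lock`, and `release_posix_lock` are hypothetical, and the file passed in is assumed to live on the gfs2 mount being examined. Where the kernel lets a signal interrupt the lock wait, the handler raises and the wait is abandoned. Where the wait cannot be interrupted, the handler only runs after the lock is finally granted, which matches the symptom reported here.

```python
import fcntl
import os
import signal


class LockWaitInterrupted(Exception):
    """Raised from the signal handler to abandon a blocking lock wait."""


def _abort_wait(signum, frame):
    # Raising an exception stops Python from retrying the interrupted call.
    raise LockWaitInterrupted(signal.Signals(signum).name)


def install_signal_handlers():
    """Route SIGTERM and SIGINT to the handler above (main thread only)."""
    for sig in (signal.SIGTERM, signal.SIGINT):
        signal.signal(sig, _abort_wait)


def acquire_posix_lock(path):
    """Block until an exclusive POSIX (fcntl) lock on the whole file is held.

    Returns the open file descriptor that owns the lock.
    """
    fd = os.open(path, os.O_RDWR | os.O_CREAT, 0o644)
    try:
        # Without LOCK_NB this is a blocking request (F_SETLKW underneath).
        fcntl.lockf(fd, fcntl.LOCK_EX)
    except BaseException:
        os.close(fd)
        raise
    return fd


def release_posix_lock(fd):
    """Drop the lock and close the descriptor."""
    fcntl.lockf(fd, fcntl.LOCK_UN)
    os.close(fd)
```

To observe the behavior, one process would call `install_signal_handlers()` and then `acquire_posix_lock()` on a file that another node already has locked, and SIGTERM or SIGINT would be sent to it while it waits.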
Resolution
Red Hat Enterprise Linux 7
- This issue is tracked in Bugzilla 1826858: Bug 1826858 - A POSX lock is acquired before the cluster node has been fenced (RHEL 7 7.9.0). As of Wednesday, April 22, 2020, the status of 1826858 was NEW; no engineer had been assigned to the bug and it was likely in the early stages of investigation. It was later determined that POSIX locks work as intended when a cluster node is fenced. The bug now focuses on allowing processes that are accessing a gfs2 filesystem to be interrupted by signals.
Red Hat Enterprise Linux 8
- This issue was tracked in Bugzilla 1855278, which is now closed: Bug 1855278 - A POSX lock is acquired before the cluster node has been fenced (RHEL 8 8.4.0). It was later determined that POSIX locks work as intended when a cluster node is fenced. In addition, on RHEL 8 a signal can interrupt a process that is accessing a gfs2 filesystem.
For more information about POSIX locks on a gfs2 filesystem, see the following articles:
- How do POSIX fcntl locks work on GFS2?
- Are ACL's in ext3 and GFS/GFS2 POSIX-compliant?
- What does gfs_controld do?
Root Cause
After a membership change, a POSIX lock request does not wait for cluster recovery to complete (a successful fence of the evicted cluster node, or that node rebooting and rejoining the cluster). A POSIX lock that was held by the evicted cluster node is granted immediately to any cluster node that is waiting to acquire it.
POSIX locks should not be used to determine whether a cluster node is able to read from and write to a gfs2 filesystem.
In addition, signals (SIGTERM, SIGINT, etc.) cannot interrupt processes that are accessing gfs2 filesystems on RHEL 7.
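An application that has to stay responsive while contending for a lock could, as a general POSIX locking pattern, poll with non-blocking requests up to a deadline instead of issuing a single blocking request. The sketch below shows only that pattern. The name `try_lock_with_timeout` and its parameters are hypothetical, and this solution does not confirm that the approach avoids the uninterruptible wait on RHEL 7 gfs2, so treat that as an untested assumption.

```python
import errno
import fcntl
import time


def try_lock_with_timeout(fd, timeout_s, poll_interval_s=0.2):
    """Try to take an exclusive POSIX lock on fd, giving up after timeout_s.

    Returns True if the lock was acquired, False if the deadline passed.
    """
    deadline = time.monotonic() + timeout_s
    while True:
        try:
            # LOCK_NB makes the request fail at once when the lock is held.
            fcntl.lockf(fd, fcntl.LOCK_EX | fcntl.LOCK_NB)
            return True
        except OSError as exc:
            # EAGAIN or EACCES means another process holds the lock.
            if exc.errno not in (errno.EAGAIN, errno.EACCES):
                raise
        if time.monotonic() >= deadline:
            return False
        # Sleeping between attempts keeps the process able to handle signals.
        time.sleep(poll_interval_s)
```

The descriptor passed in must be open for writing, because an exclusive lock is requested. The trade-off is that a polling waiter can be overtaken by other waiters between attempts.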
This solution is part of Red Hat’s fast-track publication program, providing a huge library of solutions that Red Hat engineers have created while supporting our customers. To give you the knowledge you need the instant it becomes available, these articles may be presented in a raw and unedited form.