What is a GFS2 withdrawal in a RHEL Resilient Storage cluster?


Introduction


The [gfs2 withdraw](https://access.redhat.com/site/documentation/en-US/Red_Hat_Enterprise_Linux/6/html/Global_File_System_2/s1-manage-gfs2withdraw.html) function is a data integrity feature of gfs2 file systems in a cluster. If the gfs2 kernel module detects an inconsistency in a gfs2 file system following an I/O operation, the file system becomes unavailable to that cluster node (access to the gfs2 file system from the other cluster nodes is not affected). For the gfs2 file system to withdraw properly, it is [**required**](/site/solutions/46637) that the file system use a clustered LVM device managed by `clvmd` on RHEL 6 and RHEL 7. On RHEL 8 and later, the shared LVM device is managed by `lvmlockd` and `LVM-activate`.

The gfs2 file system withdrawal was implemented as a cleaner option for handling filesystem errors than causing a kernel panic, and it allows the cluster administrator to conduct a postmortem of the issue and reboot the cluster node at a convenient time. Any services on the cluster node that do not use the gfs2 filesystem can continue to run until the next reboot.

A 1 will be written to the /sys/fs/gfs2/<cluster name>\:<fsname>/withdraw file when a withdraw occurs. A 0 means that the filesystem has not withdrawn. A withdrawal can be triggered manually with the following command:

# echo "1" > /sys/fs/gfs2/<cluster name>\:<fsname>/withdraw
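The same sysfs file can be read to see whether a filesystem has withdrawn. The following is a minimal sketch, not a supported tool: the function name `gfs2_withdraw_status` and its optional base-directory argument are hypothetical and exist only so the loop can be tried against a scratch directory.

```shell
# Print the withdraw state of every gfs2 filesystem found under sysfs.
# $1 (optional, hypothetical): directory to scan, default /sys/fs/gfs2.
gfs2_withdraw_status() {
    base="${1:-/sys/fs/gfs2}"
    for f in "$base"/*/withdraw; do
        [ -r "$f" ] || continue            # nothing matched: no gfs2 mounted
        fs=$(basename "$(dirname "$f")")   # "<cluster name>:<fsname>"
        state=$(cat "$f")
        case "$state" in
            1) echo "$fs withdrawn" ;;
            0) echo "$fs ok" ;;
            *) echo "$fs unknown($state)" ;;
        esac
    done
}
```

Running `gfs2_withdraw_status` with no argument on a cluster node prints one line per mounted gfs2 filesystem.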

For a gfs2 filesystem to withdraw successfully, every node that will mount a gfs2 filesystem must have the gfs2-utils package installed. The gfs2-utils package provides a udev rule (/usr/lib/udev/rules.d/82-gfs2-withdraw.rules) and a script (/usr/sbin/gfs2_withdraw_helper) that udev runs when a withdrawal occurs. If gfs2-utils is not installed, the withdrawal will not be successful.
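A quick way to confirm that the two files named above are present on a node is sketched below. The function name `gfs2_withdraw_helper_present` and its optional root-prefix argument are hypothetical, added only so the check can be exercised outside a real node.

```shell
# Report whether the udev rule and helper script shipped by gfs2-utils exist.
# $1 (optional, hypothetical): prefix prepended to both paths, default empty.
# Returns 0 only if both files are found.
gfs2_withdraw_helper_present() {
    root="${1:-}"
    rc=0
    for p in /usr/lib/udev/rules.d/82-gfs2-withdraw.rules \
             /usr/sbin/gfs2_withdraw_helper; do
        if [ -e "$root$p" ]; then
            echo "found   $p"
        else
            echo "missing $p"
            rc=1
        fi
    done
    return $rc
}
```

On an RPM-based node, `rpm -q gfs2-utils` additionally shows whether the package itself is installed and at which version.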

Administrators have the option of bypassing the withdraw function by using the mount option errors=panic on the gfs2 filesystem. This option forces the affected cluster node to panic on a gfs2 filesystem error rather than withdraw. When a gfs2 filesystem is mounted with errors=panic, any gfs2 file system withdraw causes the node to panic and be rebooted immediately, which leaves the journal in a known recoverable state. The other nodes see the failure and replay the journal before proceeding, as per dlm's rules.

To make a specific gfs2 filesystem panic instead of withdraw, add the following to the mount options for the gfs2 filesystem.

errors=panic

To add this option in pacemaker, update the Filesystem resource for the gfs2 filesystem and add errors=panic to its options attribute.

# pcs resource update clusterfs options="noatime,errors=panic"
# mount | grep gfs2
/dev/mapper/shared_vg-lvol1 on /mnt/vg1-lvol1 type gfs2 (rw,noatime,seclabel,errors=panic)
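To audit a node for gfs2 mounts that would still withdraw rather than panic, the mount table can be filtered as in the sketch below. The function name `gfs2_mounts_without_panic` and the optional mounts-file argument are hypothetical; the default assumes the usual /proc/mounts layout (device, mount point, type, options).

```shell
# Print the mount point of every gfs2 filesystem whose options
# do not include errors=panic.
# $1 (optional, hypothetical): mounts file to read, default /proc/mounts.
gfs2_mounts_without_panic() {
    mounts="${1:-/proc/mounts}"
    # Wrap the option list in commas so errors=panic matches as a whole option.
    awk '$3 == "gfs2" && ("," $4 ",") !~ /,errors=panic,/ { print $2 }' "$mounts"
}
```

An empty result means every gfs2 filesystem mounted on the node has errors=panic set.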

How to Recover


To recover from a gfs2 withdraw (in all cases), a `reboot` of the cluster node that experienced the withdrawal is needed in order to be able to [perform IO on the clustered logical volume](/solutions/46637). After the cluster node is rebooted, it is **recommended** that a [filesystem check is performed on the filesystem with `fsck.gfs2`](/solutions/332223) from the latest version contained in the package `gfs2-utils`. *In some instances a gfs2 withdrawal will hang and require a [hard reboot in order to recover](/solutions/2935711).*

In most cases the reason for the withdraw can be determined from the withdraw message that was logged in /var/log/messages or by analyzing the metadata of the gfs2 filesystem. In some instances we will request that the metadata is captured for the gfs2 filesystem in order to investigate why the withdrawal occurred. In some instances the reason for the corruption that triggered the withdrawal cannot be found, because filesystem errors can lie dormant for some time before the affected section of the metadata is accessed and triggers the withdrawal.
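A simple starting point for locating the withdraw message is to filter the log for gfs2 lines that mention a withdraw. This is only a sketch: the function name `gfs2_withdraw_messages` is hypothetical, and the two case-insensitive patterns are an assumption about how the messages are worded rather than an exact match for any particular kernel version.

```shell
# Print log lines that mention both gfs2 and a withdraw.
# $1 (optional, hypothetical): log file to search, default /var/log/messages.
# Exit status is non-zero when no matching line is found.
gfs2_withdraw_messages() {
    log="${1:-/var/log/messages}"
    grep -i 'gfs2' "$log" | grep -i 'withdraw'
}
```

The lines immediately before the first match usually carry the detail of the error, so it is worth reading the surrounding log context rather than the matched lines alone.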

Different Types of gfs2 File System Withdrawals


The GFS2 withdraw functions may intervene for many different reasons in order to preserve the gfs2 filesystem without causing further corruption. Below is a list of the different types of withdrawals that can be encountered on a gfs2 filesystem.

Filesystem Consistency Withdrawals


A *filesystem consistency error withdrawal* is triggered when two pieces of gfs2 metadata disagree with one another. There are 3 types: inode, resource group, and journal consistency errors.

The lists below give some possible causes of each type of withdrawal and are not exhaustive.

Inode Consistency Error Withdrawal

  • Trying to delete a file that still has blocks allocated to it (that were not freed).
  • Trying to delete a file that has an invalid block address.
  • A directory whose inode size disagrees with the hash table size calculated by inode depth.
  • A directory that has directory entries in error (such as improper record length).
  • A directory entry that has inode 0 (and not marking the start of a directory leaf block).
  • A directory not marked as EXHASH (has a directory hash table) that is also not stuffed (containing the directory entries inside).
  • A file system object (file, directory, etc.) whose directory entry access mode disagrees with the inode access mode.
  • A corrupt directory entry.
  • Attempt to delete a directory entry from a leaf block that contains no directory entries.
  • Attempt to delete a file from a directory that has no entries.
  • Attempt to manipulate a directory with fewer than 2 directory entries (should at least have "." and "..").
  • A symbolic link with no size.
  • An inode whose block address disagrees with the directory entry pointing to it.
  • An inode with more than 10 levels of indirection.
  • A directory with a depth of more than 17.

Resource Group Consistency Error Withdrawal

  • Trying to change a block assignment to the same state it's already in.
  • Trying to free a block that's already free.
  • Trying to set a block to "data" that's already "data".
  • Trying to set a block to "dinode" that's already "dinode".
  • A resource group whose count of free blocks doesn't match the bitmap count of free blocks.
  • A resource group whose count of dinodes doesn't match the bitmap count of dinodes.
  • A resource group whose count of data blocks doesn't match the bitmap count of data blocks.
  • A resource group whose count of data blocks doesn't match the actual data blocks in the region.
  • Invalid resource group information was received from another node (in a dlm lvb).
  • Trying to free a region that isn't part of a resource group (like the superblock, or the resource groups themselves).
  • Trying to free a dinode from a resource group that has no dinodes.
  • Trying to change a block status from free to unlinked.

Journal Consistency Error Withdrawal

  • A journal in which the starting entry could not be found.
  • A journal that does not start with a log header indicating it was unmounted.
  • A journal whose contents cannot be mapped.
  • A journal with multiple wrap points.
  • A journal with duplicate sequence numbers.
  • A journal with a corrupt log header.
  • A journal with a block that cannot be mapped.
  • Writing past the end of a journal without wrapping around.

Invalid Metadata Withdrawals

Magic Number Metadata Consistency


A *magic number consistency error withdrawal* means something within the gfs2 metadata points to more metadata, but that metadata doesn't appear as such because the block doesn't contain the special tag that should exist in all metadata blocks. This type of error means something has corrupted the block itself on disk or in memory, or somehow the reference to it as a metadata block is incorrect. [`fsck.gfs2` should be executed on the file system](/solutions/332223) as soon as possible.

For example, a file might be big enough to have 2 levels of indirection. The dinode points to an indirect block, which points to the data. For some unknown reason, that indirect block isn't tagged as metadata, so it cannot possibly point to a data block.

Metadata Type Consistency


A *metadata type consistency error* is triggered when the expected metadata structure type is different from what is found on disk. To correct this, the utility [`fsck.gfs2` should be run on the gfs2 file system](/solutions/332223) as soon as possible, after all cluster nodes have unmounted the gfs2 filesystem. An example of how to analyze the metadata for these types of withdrawals is described in this [article](/solutions/716764).
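Before running `fsck.gfs2`, it can help to confirm that the device is no longer mounted. The sketch below checks the local node only; it cannot see mounts on other cluster nodes, which still have to be verified separately. The function name `gfs2_device_mounted` and the optional mounts-file argument are hypothetical, and the device must be given exactly as it appears in the mount table.

```shell
# Return 0 if the given device is listed as a gfs2 mount on this node.
# $1: block device path as shown in the mount table.
# $2 (optional, hypothetical): mounts file to read, default /proc/mounts.
gfs2_device_mounted() {
    dev="$1"
    mounts="${2:-/proc/mounts}"
    awk -v d="$dev" '$1 == d && $3 == "gfs2" { found = 1 }
                     END { exit !found }' "$mounts"
}

# Example use (device name taken from the mount output shown earlier):
#   if gfs2_device_mounted /dev/mapper/shared_vg-lvol1; then
#       echo "still mounted on this node, unmount before checking"
#   else
#       fsck.gfs2 -n /dev/mapper/shared_vg-lvol1   # -n: report only, change nothing
#   fi
```

Starting with `-n` gives a read-only report of what would be repaired; `-y` can then be used to apply the repairs once the report has been reviewed.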

I/O Error Withdrawal


This [io withdrawal](/solutions/550083) occurs when there is an I/O error (storage error) which prevents an I/O request from completing. This can occur at the buffer or somewhere else in the storage layer. This type of error indicates that a problem exists at the storage layer and not at the file system layer, and the file system could not complete its I/O to the storage device(s) successfully in some way.

A fatal error in the storage layer does not trigger the gfs2 file system to withdraw instantly. The withdrawal happens only if the storage has not recovered from that condition by the next time the gfs2 file system is accessed. If the file system is accessed infrequently, the fatal error may have happened minutes, hours, or days before the withdrawal was triggered.

Assertion Error Withdrawal


When an [assertion](https://en.wikipedia.org/wiki/Assertion_%28software_development%29) in gfs2 evaluates to `False`, the file system may be withdrawn. A failed assertion does not always cause the file system to be withdrawn, as shown in this [example](/solutions/55203).
