Why did the pacemaker `Filesystem` resource not detect that a gfs2 withdrawal occurred and fail the resource?

Environment

  • Red Hat Enterprise Linux Server 6, 7 (with the High Availability Add-On)
  • A Global File System 2 (GFS2) filesystem
  • pacemaker

Issue

  • Why did the pacemaker Filesystem resource not detect that a gfs2 withdrawal occurred and fail the resource?

Resolution

For a Filesystem resource monitor to fail when a gfs2 withdrawal occurs, an I/O operation has to be performed on the GFS2 filesystem. By default, the Filesystem resource agent only checks whether the filesystem is mounted.

The OCF_CHECK_LEVEL attribute allows the configuration of more in-depth checks at different levels. The default value is 0. Setting OCF_CHECK_LEVEL to 10 or 20 causes the resource monitor operation to test I/O against the underlying device or the filesystem itself.

  • OCF_CHECK_LEVEL=10 implements a read test of the device underlying the filesystem.
  • OCF_CHECK_LEVEL=20 implements both a read and a write test of the filesystem. It writes to a status file and then attempts to read from the status file.
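The two deeper checks can be sketched roughly as follows. This is only an illustration of the idea, not the agent's actual code: the device and mountpoint paths are stand-in placeholders, and the real agent uses the resource's configured device, directory, and its own status-file name.

```shell
#!/bin/sh
# Sketch of the depth-10 and depth-20 monitor checks (placeholder paths).
DEVICE=/tmp/fake-device        # stands in for /dev/mapper/vg1-lvol1
MOUNTPOINT=/tmp/fake-mount     # stands in for /mnt/vg1-lvol1
mkdir -p "$MOUNTPOINT"
dd if=/dev/zero of="$DEVICE" bs=512 count=16 2>/dev/null  # create the stand-in "device"

# OCF_CHECK_LEVEL=10: raw read of the first 16 blocks of the device
dd if="$DEVICE" of=/dev/null bs=512 count=16 2>/dev/null || exit 1

# OCF_CHECK_LEVEL=20: write a status file on the filesystem, then read it back
STATUSFILE="$MOUNTPOINT/.Filesystem_status"
date > "$STATUSFILE" || exit 1
cat "$STATUSFILE" >/dev/null || exit 1
rm -f "$STATUSFILE"
echo "monitor checks passed"
```

A GFS2 withdrawal makes further I/O to the filesystem fail, so the depth-20 write/read step is what turns a withdrawal into a monitor failure.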

For a more in-depth description of OCF_CHECK_LEVEL options, see the resource agent. Below is a snippet from the Filesystem resource agent metadata.

# pcs resource describe Filesystem
...
The standard monitor operation of depth 0 (also known as probe)
checks if the filesystem is mounted. If you want deeper tests,
set OCF_CHECK_LEVEL to one of the following values:

10: read first 16 blocks of the device (raw read)

This doesn't exercise the filesystem at all, but the device on
which the filesystem lives. This is noop for non-block devices
such as NFS, SMBFS, or bind mounts.

20: test if a status file can be written and read

The status file must be writable by root. This is not always the
case with an NFS mount, as NFS exports usually have the
"root_squash" option set. In such a setup, you must either use
read-only monitoring (depth=10), export with "no_root_squash" on
your NFS server, or grant world write permissions on the
directory where the status file is to be placed.

For a Filesystem resource's monitor to fail after a GFS2 withdrawal, create a second monitor operation with OCF_CHECK_LEVEL=20 so that the monitor performs a read/write test. Because the operation is configured with on-fail=fence, the cluster node is fenced when the monitor operation fails.
NOTE: Do not set the monitor interval to a very low value, since each monitor reads from and writes to the filesystem.

# pcs resource op add clusterfs monitor interval=60s on-fail=fence OCF_CHECK_LEVEL=20
# pcs resource show clusterfs
 Resource: clusterfs (class=ocf provider=heartbeat type=Filesystem)
  Attributes: device=/dev/mapper/vg1-lvol1 directory=/mnt/vg1-lvol1 fstype=gfs2 options=noatime
  Operations: monitor interval=10s on-fail=fence (clusterfs-monitor-interval-10s)
              notify interval=0s timeout=60 (clusterfs-notify-interval-0s)
              start interval=0s timeout=60 (clusterfs-start-interval-0s)
              stop interval=0s timeout=60 (clusterfs-stop-interval-0s)
              monitor interval=60s on-fail=fence OCF_CHECK_LEVEL=20 (clusterfs-monitor-interval-60s)

Alternatively, update the resource itself so that the existing monitor operation (and any new ones) uses OCF_CHECK_LEVEL=20 and performs a read/write test.

# pcs resource update clusterfs OCF_CHECK_LEVEL=20 op monitor interval=60s on-fail=fence --force
# pcs resource show clusterfs
 Resource: clusterfs (class=ocf provider=heartbeat type=Filesystem)
  Attributes: device=/dev/mapper/vg1-lvol1 directory=/mnt/vg1-lvol1 fstype=gfs2 options=noatime OCF_CHECK_LEVEL=20
  Operations: monitor interval=60s on-fail=fence (clusterfs-monitor-interval-10s)
              notify interval=0s timeout=60 (clusterfs-notify-interval-0s)
              start interval=0s timeout=60 (clusterfs-start-interval-0s)
              stop interval=0s timeout=60 (clusterfs-stop-interval-0s)

This can be tested by manually causing a withdrawal to occur.

# echo "1" > /sys/fs/gfs2/rh7nodesThree\:gfs2-1/withdraw

To cause a cluster node to hard fail when a GFS2 withdrawal occurs, set the mount option errors=panic, which causes the node to kernel panic instead of withdrawing. For more information, see the following article: Global File System 2: 3.12. The GFS2 Withdraw Function.
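One way to apply this mount option is through the resource's options attribute; the command below is a sketch that assumes the existing options are "noatime", as in the earlier output. Check your resource's current options first, since this replaces the whole options string.

```shell
# Sketch: append errors=panic to the clusterfs mount options
# (existing options assumed to be "noatime")
pcs resource update clusterfs options="noatime,errors=panic"
```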

Root Cause

By default, the Filesystem resource monitor operation only checks to see if the filesystem is still mounted. A GFS2 withdrawal does not cause the filesystem to be unmounted, as it is still present in the following locations that the monitor operation can check:

  • /proc/mounts
  • /etc/mtab
  • The output of /sbin/mount

If the filesystem is found to be mounted, the monitor operation considers that a success and does not report the resource as failed.
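The default check amounts to little more than the following sketch: success as long as the mountpoint appears in /proc/mounts. A withdrawn GFS2 filesystem is still listed there, so this check keeps passing. (The mountpoint below is a placeholder; /proc stands in for the GFS2 mountpoint.)

```shell
#!/bin/sh
# Minimal sketch of the depth-0 (default) monitor logic.
is_mounted() {
    # /proc/mounts lines look like: "device mountpoint fstype options ..."
    grep -q " $1 " /proc/mounts
}

# /proc is always mounted, so it stands in for /mnt/vg1-lvol1 here.
if is_mounted /proc; then
    echo "depth-0 monitor: success"
fi
```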

This solution is part of Red Hat’s fast-track publication program, providing a huge library of solutions that Red Hat engineers have created while supporting our customers. To give you the knowledge you need the instant it becomes available, these articles may be presented in a raw and unedited form.