'Mounting timed out' for an XFS local Filesystem in RHEL 8 (reflink feature enabled)
Environment
- Red Hat Enterprise Linux 8
- XFS Filesystem with Data Block Sharing (Reflink) enabled (enabled by default in RHEL 8)
Issue
- On RHEL 8 or higher, an XFS local Filesystem fails to mount at boot time with 'Mounting timed out'.
- Once the system is booted, the Filesystem can be mounted manually with no errors, but it takes a long time to mount (more than 90 seconds).
- The affected Filesystem has been resized (extended).
- xfs_info reports a big number in agcount for the affected Filesystem.
Resolution
Once enabled, the Data Block Sharing (Reflink) feature cannot be turned off.
To resolve the issue, the user needs to reformat (i.e. recreate) the Filesystem. When the Filesystem is recreated at its final size, mkfs.xfs chooses an optimal agsize, and the number of AGs will be small.
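The recreate procedure can be outlined roughly as follows. This is a sketch, not the definitive procedure: the device (/dev/vdb), mountpoint (/test), and dump file path are placeholders matching the example later in this article, and DRY_RUN defaults to 1 so the script only prints the commands until you explicitly set DRY_RUN=0.

```shell
#!/bin/bash
# Hypothetical outline of recreating the Filesystem at its final size.
# Adjust DEV, MNT, and DUMP to your environment. With DRY_RUN unset or
# set to 1, the commands are only printed, not executed.
DEV=${DEV:-/dev/vdb}
MNT=${MNT:-/test}
DUMP=${DUMP:-/backup/test.xfsdump}

run() { if [ "${DRY_RUN:-1}" = 1 ]; then echo "WOULD RUN: $*"; else "$@"; fi; }

recreate_xfs() {
    run xfsdump -l 0 -f "$DUMP" "$MNT"   # level-0 (full) backup of the data first
    run umount "$MNT"
    run mkfs.xfs -f "$DEV"               # mkfs.xfs picks an optimal agsize for the *current* size
    run mount "$DEV" "$MNT"
    run xfsrestore -f "$DUMP" "$MNT"     # restore the data
}
recreate_xfs
```

After the recreate, xfs_info should show a small agcount for the same capacity.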
Root Cause
When we create an XFS Filesystem, the Allocation Group size (agsize in the xfs_info output for it) is calculated based on its initial size to be optimal, keeping the number of Allocation Groups (AGs) low. When the Filesystem is extended, we allocate additional AGs with the original agsize. If the actual size is much bigger than the original one (when it was created), then the Filesystem will have a huge number of AGs (agcount in xfs_info).
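A worked example using the numbers from the xfs_info output in the Diagnostic Steps section below: agsize is fixed at mkfs time, so growing the Filesystem only multiplies agcount.

```shell
#!/bin/bash
# agsize is chosen at creation and never changes; growing the Filesystem
# just adds more AGs of the same size. Numbers match the xfs_info example
# in this article (bsize=4096, agsize=163840 blocks = 640 MiB per AG).
bsize=4096                               # bytes per block
agsize=163840                            # blocks per AG, fixed at mkfs time

initial_blocks=$((5 * 1024**3 / bsize))  # 5 GiB at creation
final_blocks=5368709120                  # 20 TiB after growing

echo "agcount at creation: $((initial_blocks / agsize))"   # 8
echo "agcount after grow:  $((final_blocks / agsize))"     # 32768
```

With the original 640 MiB agsize kept, a 20 TiB Filesystem ends up with 32768 AGs instead of a few dozen.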
In RHEL 8 we introduced the Data Block Sharing (Reflink) feature, which enables users to share data blocks between files. It is enabled by default. When we mount a Filesystem with the Data Block Sharing feature enabled, the XFS reflink code iterates through all of the AGs to reclaim COW (copy-on-write) blocks. When agcount is huge, this can cause a long delay, leading to the timeout at boot time.
Diagnostic Steps
We get these messages at boot time (logged in /var/log/messages):
Nov 18 07:06:06 testsystem.example.com systemd[1]: Mounting /test...
Nov 18 07:06:11 testsystem.example.com systemd[1]: Received SIGRTMIN+20 from PID 123 (plymouthd).
Nov 18 07:07:36 testsystem.example.com systemd[1]: test.mount: Mounting timed out. Terminating.
Nov 18 07:09:06 testsystem.example.com systemd[1]: test.mount: Mount process timed out. Killing.
Nov 18 07:09:06 testsystem.example.com systemd[1]: test.mount: Killing process 1759 (mount) with signal SIGKILL.
Nov 18 07:09:39 testsystem.example.com systemd[1]: test.mount: Mount process exited, code=killed status=15
Nov 18 07:09:39 testsystem.example.com systemd[1]: Mounted /test. <<<<
For this problem to occur, the Data Block Sharing (Reflink) feature must be enabled and the number of AGs must be big:
# xfs_info /test
meta-data=/dev/vdb isize=512 agcount=32768, agsize=163840 blks
^^^^^^^^^^^^^
= sectsz=512 attr=2, projid32bit=1
= crc=1 finobt=1, sparse=1, rmapbt=0
= reflink=1
^^^^^^^^
data = bsize=4096 blocks=5368709120, imaxpct=25
= sunit=64 swidth=768 blks
naming =version 2 bsize=4096 ascii-ci=0, ftype=1
log =internal log bsize=4096 blocks=2560, version=2
= sectsz=512 sunit=0 blks, lazy-count=1
realtime =none extsz=4096 blocks=0, rtextents=0
Note that reflink is set to '1', meaning that Data Block Sharing is enabled, and the agcount value is huge!
In this example, the Filesystem was grown from 5 GiB to 20 TiB.
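Both conditions can be checked at once by parsing the xfs_info output. The helper name and the 1000-AG threshold below are illustrative, not official values:

```shell
#!/bin/bash
# Hypothetical helper: warn when reflink is enabled and agcount is big.
# The 1000-AG threshold is an illustrative value, not an official limit.
check_agcount() {
    local info="$1" agcount reflink
    agcount=$(grep -o 'agcount=[0-9]*' <<<"$info" | cut -d= -f2)
    reflink=$(grep -o 'reflink=[0-9]*' <<<"$info" | cut -d= -f2)
    if [ "$reflink" = 1 ] && [ "$agcount" -gt 1000 ]; then
        echo "WARNING: reflink=1 and agcount=$agcount - mount may be slow"
    else
        echo "OK: agcount=$agcount reflink=$reflink"
    fi
}
# Usage with a live Filesystem:
#   check_agcount "$(xfs_info /test)"
```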
To verify the issue, we should check the kernel stacks of the mount process while it is running.
We can collect the info with the attached stacks.sh script.
- terminal window 1 :
# mount /dev/vdb /test
NOTE: In these tests the XFS Filesystem is on /dev/vdb and the mountpoint is /test. Change the device and mountpoint to the ones you're using.
- terminal window 2 :
# ./stacks.sh &> /tmp/stacks.out
The script will run for two minutes, collecting kernel stacks for the mount process.
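The attached stacks.sh is not reproduced here; a minimal sketch of what such a script might look like is below. The pgrep-based process lookup, the default duration, and the sampling interval are assumptions; reading /proc/<pid>/stack requires root.

```shell
#!/bin/bash
# Hypothetical sketch of a stack-collection script like stacks.sh.
# Samples the kernel stack of the named process (default: mount) for
# the given duration in seconds. Run as root.
collect_stacks() {
    local comm="${1:-mount}" duration="${2:-120}" interval="${3:-1}"
    local pid end
    pid=$(pgrep -xo "$comm") || { echo "no $comm process found"; return 1; }
    echo "$comm PID: $pid"
    end=$((SECONDS + duration))
    while [ "$SECONDS" -lt "$end" ] && [ -d "/proc/$pid" ]; do
        echo "PID: $pid, COMM: $comm"
        cat "/proc/$pid/stack" 2>/dev/null
        sleep "$interval"
    done
}
# Usage: collect_stacks mount 120 1 > /tmp/stacks.out
```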
When we are hitting this problem, /tmp/stacks.out contains stacks like this:
mount PID: 1234
PID: 1234, COMM: mount
...
[<0>] xfs_read_agf+0x8e/0x120 [xfs]
[<0>] xfs_alloc_read_agf+0x3e/0x1b0 [xfs]
[<0>] xfs_refcount_recover_cow_leftovers+0x9f/0x360 [xfs]
[<0>] xfs_reflink_recover_cow+0x36/0x50 [xfs] <<<<<<
[<0>] xfs_mountfs+0x5ae/0x910 [xfs]
[<0>] xfs_fs_fill_super+0x41f/0x690 [xfs]
[<0>] mount_bdev+0x177/0x1b0
[<0>] mount_fs+0x3b/0x167
[<0>] vfs_kern_mount.part.35+0x54/0x120
[<0>] do_mount+0x1fc/0xc80
[<0>] ksys_mount+0xb6/0xd0
[<0>] __x64_sys_mount+0x21/0x30
[<0>] do_syscall_64+0x5b/0x1a0
[<0>] entry_SYSCALL_64_after_hwframe+0x65/0xca
And like this:
mount PID: 1234
PID: 1234, COMM: mount
...
[<0>] xfs_read_agi+0x95/0x140 [xfs]
[<0>] xfs_ialloc_read_agi+0x2f/0xd0 [xfs]
[<0>] xfs_finobt_calc_reserves+0x81/0x140 [xfs]
[<0>] xfs_ag_resv_init+0xe0/0x160 [xfs]
[<0>] xfs_fs_reserve_ag_blocks+0x3e/0xb0 [xfs] <<<<<<
[<0>] xfs_mountfs+0x5c1/0x910 [xfs]
[<0>] xfs_fs_fill_super+0x41f/0x690 [xfs]
[<0>] mount_bdev+0x177/0x1b0
[<0>] mount_fs+0x3b/0x167
[<0>] vfs_kern_mount.part.35+0x54/0x120
[<0>] do_mount+0x1fc/0xc80
[<0>] ksys_mount+0xb6/0xd0
[<0>] __x64_sys_mount+0x21/0x30
[<0>] do_syscall_64+0x5b/0x1a0
[<0>] entry_SYSCALL_64_after_hwframe+0x65/0xca
We can see that both xfs_reflink_recover_cow and xfs_fs_reserve_ag_blocks iterate through all AGs.
This solution is part of Red Hat’s fast-track publication program, providing a huge library of solutions that Red Hat engineers have created while supporting our customers. To give you the knowledge you need the instant it becomes available, these articles may be presented in a raw and unedited form.