'Mounting timed out' for an XFS local Filesystem in RHEL 8 (reflink feature enabled)
Environment
- Red Hat Enterprise Linux 8
- XFS Filesystem with Data Block Sharing (Reflink) enabled (enabled by default in RHEL 8)
Issue
- On RHEL 8 or higher, an XFS local Filesystem fails to mount at boot time with 'Mounting timed out'.
- Once the system is booted, the Filesystem can be mounted manually with no errors, but it takes a long time to mount (more than 90 seconds).
- The affected Filesystem has been resized (extended).
- xfs_info reports a big number in agcount for the affected Filesystem.
Resolution
Once enabled, the Data Block Sharing (Reflink) feature cannot be turned off.
To resolve the issue, the user needs to reformat (i.e. recreate) the Filesystem. When the Filesystem is recreated at its final size, mkfs.xfs chooses an optimal agsize, and the number of AGs will be small.
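The recreate procedure can be outlined roughly as follows. This is a sketch, not the definitive procedure: the device (/dev/vdb), mountpoint (/test), and dump file path are placeholders matching the example later in this article, and DRY_RUN defaults to 1 so the script only prints the commands until you explicitly set DRY_RUN=0.

```shell
#!/bin/bash
# Hypothetical outline of recreating the Filesystem at its final size.
# Adjust DEV, MNT, and DUMP to your environment. With DRY_RUN unset or
# set to 1, the commands are only printed, not executed.
DEV=${DEV:-/dev/vdb}
MNT=${MNT:-/test}
DUMP=${DUMP:-/backup/test.xfsdump}

run() { if [ "${DRY_RUN:-1}" = 1 ]; then echo "WOULD RUN: $*"; else "$@"; fi; }

recreate_xfs() {
    run xfsdump -l 0 -f "$DUMP" "$MNT"   # level-0 (full) backup of the data first
    run umount "$MNT"
    run mkfs.xfs -f "$DEV"               # mkfs.xfs picks an optimal agsize for the *current* size
    run mount "$DEV" "$MNT"
    run xfsrestore -f "$DUMP" "$MNT"     # restore the data
}
recreate_xfs
```

After the recreate, xfs_info should show a small agcount for the same capacity.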
Root Cause
When we create an XFS Filesystem, the Allocation Group size (agsize in the xfs_info output for it) is calculated based on its initial size to be optimal, keeping the number of Allocation Groups (AGs) low. When the Filesystem is extended, we allocate additional AGs with the original agsize. If the actual size is much bigger than the original one (when it was created), then the Filesystem will have a huge number of AGs (agcount in xfs_info).
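A worked example using the numbers from the xfs_info output in the Diagnostic Steps section below: agsize is fixed at mkfs time, so growing the Filesystem only multiplies agcount.

```shell
#!/bin/bash
# agsize is chosen at creation and never changes; growing the Filesystem
# just adds more AGs of the same size. Numbers match the xfs_info example
# in this article (bsize=4096, agsize=163840 blocks = 640 MiB per AG).
bsize=4096                               # bytes per block
agsize=163840                            # blocks per AG, fixed at mkfs time

initial_blocks=$((5 * 1024**3 / bsize))  # 5 GiB at creation
final_blocks=5368709120                  # 20 TiB after growing

echo "agcount at creation: $((initial_blocks / agsize))"   # 8
echo "agcount after grow:  $((final_blocks / agsize))"     # 32768
```

With the original 640 MiB agsize kept, a 20 TiB Filesystem ends up with 32768 AGs instead of a few dozen.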
In RHEL 8 we introduced the Data Block Sharing (Reflink) feature, which enables users to share data blocks between files. It is enabled by default. When we mount a Filesystem with the Data Block Sharing feature enabled, the XFS reflink code iterates through all of the AGs to reclaim COW (copy-on-write) blocks. When agcount is huge, this can cause a long delay, leading to the timeout at boot time.
Diagnostic Steps
We get these messages at boot time (logged in /var/log/messages):
Nov 18 07:06:06 testsystem.example.com systemd[1]: Mounting /test...
Nov 18 07:06:11 testsystem.example.com systemd[1]: Received SIGRTMIN+20 from PID 123 (plymouthd).
Nov 18 07:07:36 testsystem.example.com systemd[1]: test.mount: Mounting timed out. Terminating.
Nov 18 07:09:06 testsystem.example.com systemd[1]: test.mount: Mount process timed out. Killing.
Nov 18 07:09:06 testsystem.example.com systemd[1]: test.mount: Killing process 1759 (mount) with signal SIGKILL.
Nov 18 07:09:39 testsystem.example.com systemd[1]: test.mount: Mount process exited, code=killed status=15
Nov 18 07:09:39 testsystem.example.com systemd[1]: Mounted /test. <<<<
For this problem to occur, the Data Block Sharing (Reflink) feature must be enabled and the number of AGs must be big:
# xfs_info /test
meta-data=/dev/vdb isize=512 agcount=32768, agsize=163840 blks
^^^^^^^^^^^^^
= sectsz=512 attr=2, projid32bit=1
= crc=1 finobt=1, sparse=1, rmapbt=0
= reflink=1
^^^^^^^^
data = bsize=4096 blocks=5368709120, imaxpct=25
= sunit=64 swidth=768 blks
naming =version 2 bsize=4096 ascii-ci=0, ftype=1
log =internal log bsize=4096 blocks=2560, version=2
= sectsz=512 sunit=0 blks, lazy-count=1
realtime =none extsz=4096 blocks=0, rtextents=0
Note that reflink is set to '1', meaning that Data Block Sharing is enabled, and the agcount value is huge!
In this example, the Filesystem was grown from 5 GiB to 20 TiB.
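Both conditions can be checked at once by parsing the xfs_info output. The helper name and the 1000-AG threshold below are illustrative, not official values:

```shell
#!/bin/bash
# Hypothetical helper: warn when reflink is enabled and agcount is big.
# The 1000-AG threshold is an illustrative value, not an official limit.
check_agcount() {
    local info="$1" agcount reflink
    agcount=$(grep -o 'agcount=[0-9]*' <<<"$info" | cut -d= -f2)
    reflink=$(grep -o 'reflink=[0-9]*' <<<"$info" | cut -d= -f2)
    if [ "$reflink" = 1 ] && [ "$agcount" -gt 1000 ]; then
        echo "WARNING: reflink=1 and agcount=$agcount - mount may be slow"
    else
        echo "OK: agcount=$agcount reflink=$reflink"
    fi
}
# Usage with a live Filesystem:
#   check_agcount "$(xfs_info /test)"
```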
To verify the issue, we should check the kernel stacks of the mount process while it is running.
We can collect the info with the attached stacks.sh script.
- terminal window 1 :
# mount /dev/vdb /test
NOTE: In these tests the XFS Filesystem is on /dev/vdb and the mountpoint is /test. Change the device and mountpoint to the ones you're using.
- terminal window 2 :
# ./stacks.sh &> /tmp/stacks.out
The script will run for two minutes, collecting kernel stacks for the mount process.
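The attached stacks.sh is not reproduced here; a minimal sketch of what such a script might look like is below. The pgrep-based process lookup, the default duration, and the sampling interval are assumptions; reading /proc/<pid>/stack requires root.

```shell
#!/bin/bash
# Hypothetical sketch of a stack-collection script like stacks.sh.
# Samples the kernel stack of the named process (default: mount) for
# the given duration in seconds. Run as root.
collect_stacks() {
    local comm="${1:-mount}" duration="${2:-120}" interval="${3:-1}"
    local pid end
    pid=$(pgrep -xo "$comm") || { echo "no $comm process found"; return 1; }
    echo "$comm PID: $pid"
    end=$((SECONDS + duration))
    while [ "$SECONDS" -lt "$end" ] && [ -d "/proc/$pid" ]; do
        echo "PID: $pid, COMM: $comm"
        cat "/proc/$pid/stack" 2>/dev/null
        sleep "$interval"
    done
}
# Usage: collect_stacks mount 120 1 > /tmp/stacks.out
```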
When we are hitting this problem, /tmp/stacks.out contains stacks like this:
mount PID: 1234
PID: 1234, COMM: mount
...
[<0>] xfs_read_agf+0x8e/0x120 [xfs]
[<0>] xfs_alloc_read_agf+0x3e/0x1b0 [xfs]
[<0>] xfs_refcount_recover_cow_leftovers+0x9f/0x360 [xfs]
[<0>] xfs_reflink_recover_cow+0x36/0x50 [xfs] <<<<<<
[<0>] xfs_mountfs+0x5ae/0x910 [xfs]
[<0>] xfs_fs_fill_super+0x41f/0x690 [xfs]
[<0>] mount_bdev+0x177/0x1b0
[<0>] mount_fs+0x3b/0x167
[<0>] vfs_kern_mount.part.35+0x54/0x120
[<0>] do_mount+0x1fc/0xc80
[<0>] ksys_mount+0xb6/0xd0
[<0>] __x64_sys_mount+0x21/0x30
[<0>] do_syscall_64+0x5b/0x1a0
[<0>] entry_SYSCALL_64_after_hwframe+0x65/0xca
And like this:
mount PID: 1234
PID: 1234, COMM: mount
...
[<0>] xfs_read_agi+0x95/0x140 [xfs]
[<0>] xfs_ialloc_read_agi+0x2f/0xd0 [xfs]
[<0>] xfs_finobt_calc_reserves+0x81/0x140 [xfs]
[<0>] xfs_ag_resv_init+0xe0/0x160 [xfs]
[<0>] xfs_fs_reserve_ag_blocks+0x3e/0xb0 [xfs] <<<<<<
[<0>] xfs_mountfs+0x5c1/0x910 [xfs]
[<0>] xfs_fs_fill_super+0x41f/0x690 [xfs]
[<0>] mount_bdev+0x177/0x1b0
[<0>] mount_fs+0x3b/0x167
[<0>] vfs_kern_mount.part.35+0x54/0x120
[<0>] do_mount+0x1fc/0xc80
[<0>] ksys_mount+0xb6/0xd0
[<0>] __x64_sys_mount+0x21/0x30
[<0>] do_syscall_64+0x5b/0x1a0
[<0>] entry_SYSCALL_64_after_hwframe+0x65/0xca
We can see that both xfs_reflink_recover_cow and xfs_fs_reserve_ag_blocks iterate through all AGs.
This solution is part of Red Hat’s fast-track publication program, providing a huge library of solutions that Red Hat engineers have created while supporting our customers. To give you the knowledge you need the instant it becomes available, these articles may be presented in a raw and unedited form.