Growing a gfs2 filesystem with `gfs2_grow` caused a kernel panic

Solution Verified - Updated

Environment

Issue

  • Growing a gfs2 filesystem with gfs2_grow caused a kernel panic.
GFS2: fsid=: Trying to join cluster "lock_dlm", "STSRHTS22576:grow2"
GFS2: fsid=STSRHTS22576:grow2.0: Joined cluster. Now mounting FS...
GFS2: fsid=STSRHTS22576:grow2.0: jid=0, already locked for use
GFS2: fsid=STSRHTS22576:grow2.0: jid=0: Looking at journal...
GFS2: fsid=STSRHTS22576:grow2.0: jid=0: Done
GFS2: fsid=STSRHTS22576:grow2.0: jid=1: Trying to acquire journal lock...
GFS2: fsid=STSRHTS22576:grow2.0: jid=1: Looking at journal...
GFS2: fsid=STSRHTS22576:grow2.0: jid=1: Done
GFS2: fsid=STSRHTS22576:grow2.0: jid=2: Trying to acquire journal lock...
GFS2: fsid=STSRHTS22576:grow2.0: jid=2: Looking at journal...
GFS2: fsid=STSRHTS22576:grow2.0: jid=2: Done
GFS2: fsid=STSRHTS22576:grow1.0: File system extended by 65088 blocks.
BUG: unable to handle kernel NULL pointer dereference at 0000000000000034
IP: [<ffffffffa0549556>] gfs2_alloc_blocks+0x4c6/0x860 [gfs2]
Kernel PGD 800000002df6e067 PUD 27423067 PMD 0 
User   PGD 2df6e067 PUD 27423067 PMD 0 
Oops: 0000 [#1] SMP 
last sysfs file: /sys/devices/virtual/block/dm-2/range
CPU 0 
Modules linked in: gfs2 dlm configfs sg sd_mod crc_t10dif be2iscsi iscsi_boot_sysfs bnx2i cnic uio cxgb4i iw_cxgb4 cxgb4 cxgb3i libcxgbi iw_cxgb3 cxgb3 mdio ib_iser iscsi_tcp libiscsi_tcp libiscsi scsi_transport_iscsi iptable_filter ip_tables sctp libcrc32c autofs4 ib_ipoib rdma_ucm ib_ucm ib_uverbs ib_umad rdma_cm ib_cm iw_cm ib_sa ib_mad ib_core ib_addr ipv6 dm_multipath microcode i6300esb virtio_balloon virtio_net i2c_piix4 i2c_core ext4 jbd2 mbcache virtio_blk virtio_pci virtio_ring virtio pata_acpi ata_generic ata_piix dm_mirror dm_region_hash dm_log dm_mod [last unloaded: speedstep_lib]

Pid: 12773, comm: gfs2_grow Not tainted 2.6.32-754.el6.x86_64 #1 Red Hat KVM
RIP: 0010:[<ffffffffa0549556>]  [<ffffffffa0549556>] gfs2_alloc_blocks+0x4c6/0x860 [gfs2]
RSP: 0000:ffff88002eb97718  EFLAGS: 00010246
RAX: 0000000000000000 RBX: ffff88002eb97934 RCX: 0000000000000000
RDX: ffff8800233f7378 RSI: ffff88002eb97928 RDI: ffff8800233f7040
RBP: ffff88002eb977c8 R08: 0000000000000000 R09: ff9ab59472975802
R10: 00000000ffffff80 R11: ffffffffa0521ad0 R12: ffff8800233f7040
R13: ffff8800233f7040 R14: ffff880032581c28 R15: 0000000000000001
FS:  00007f92c670e700(0000) GS:ffff880002200000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 0000000000000034 CR3: 00000000405e0000 CR4: 00000000000006f0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
Process gfs2_grow (pid: 12773, threadinfo ffff88002eb94000, task ffff880012fecab0)
Stack:
 ffff88002eb97738 ffffffff810a8e5f ffff88002eb97748 ffff88002eb97928
<d> ffff8800233f7338 0000000000000000 ffff880043a49000 ffffff00810a8e5f
<d> ffff8800474a7d60 ffff880038728dc0 ffff88002eb97778 ffffffff811d50e7
Call Trace:
 [<ffffffff810a8e5f>] ? wake_up_bit+0x2f/0x40
 [<ffffffff811d50e7>] ? unlock_buffer+0x17/0x20
 [<ffffffffa0521fb4>] gfs2_block_map+0x4e4/0xf00 [gfs2]
 [<ffffffff81076c42>] ? enqueue_entity+0x112/0x450
 [<ffffffffa052d850>] ? gfs2_glock_holder_wait+0x0/0x20 [gfs2]
 [<ffffffff810a8cf7>] ? bit_waitqueue+0x17/0xd0
 [<ffffffff811d6d4e>] __block_prepare_write+0x1de/0x4e0
 [<ffffffffa0521ad0>] ? gfs2_block_map+0x0/0xf00 [gfs2]
 [<ffffffff811d7077>] block_prepare_write+0x27/0x40
 [<ffffffffa053a0ca>] gfs2_write_begin+0x3ea/0x480 [gfs2]
 [<ffffffffa052e2d3>] ? gfs2_holder_uninit+0x23/0x40 [gfs2]
 [<ffffffff81131ecd>] generic_file_buffered_write+0x12d/0x2f0
 [<ffffffff81133950>] __generic_file_aio_write+0x260/0x490
 [<ffffffff81133c08>] generic_file_aio_write+0x88/0x100
 [<ffffffffa053c7a3>] gfs2_file_aio_write+0xf3/0x130 [gfs2]
 [<ffffffffa052e525>] ? gfs2_glock_wait+0x25/0x90 [gfs2]
 [<ffffffff810a8cf7>] ? bit_waitqueue+0x17/0xd0
 [<ffffffff8119d270>] do_sync_write+0x100/0x140
 [<ffffffffa0530e9d>] ? do_promote+0x21d/0x350 [gfs2]
 [<ffffffffa052e525>] ? gfs2_glock_wait+0x25/0x90 [gfs2]
 [<ffffffff810a8e70>] ? autoremove_wake_function+0x0/0x40
 [<ffffffffa052e2d3>] ? gfs2_holder_uninit+0x23/0x40 [gfs2]
 [<ffffffff8124c6ab>] ? selinux_file_permission+0xfb/0x150
 [<ffffffff8123f27c>] ? security_file_permission+0x1c/0x20
 [<ffffffff8119d56a>] vfs_write+0xba/0x1a0
 [<ffffffff8119ea66>] ? fget_light_pos+0x16/0x50
 [<ffffffff8119e0a1>] sys_write+0x51/0xb0
 [<ffffffff810f0cae>] ? __audit_syscall_exit+0x25e/0x290
 [<ffffffff8155f351>] system_call_fastpath+0x2f/0x34
Code: 41 5f c9 c3 90 48 8b 45 b0 48 8b 50 50 48 89 11 48 83 c2 01 48 89 50 50 e9 4c ff ff ff 0f 1f 84 00 00 00 00 00 80 7d 8c 00 75 7a <8b> 48 34 48 8b 50 28 4d 8b b5 80 03 00 00 48 01 d1 49 39 ce 0f 
RIP  [<ffffffffa0549556>] gfs2_alloc_blocks+0x4c6/0x860 [gfs2]
 RSP <ffff88002eb97718>
CR2: 0000000000000034
---[ end trace 731b9bfba57e4d1e ]---
Kernel panic - not syncing: Fatal exception
Pid: 12773, comm: gfs2_grow Tainted: G      D    -- ------------    2.6.32-754.el6.x86_64 #1
Call Trace:
 [<ffffffff8155341b>] ? panic+0xa7/0x18b
 [<ffffffff81559304>] ? oops_end+0xe4/0x100
 [<ffffffff81052b4b>] ? no_context+0xfb/0x260
 [<ffffffff81052dd5>] ? __bad_area_nosemaphore+0x125/0x1e0
 [<ffffffff81052efe>] ? bad_area+0x4e/0x60
 [<ffffffff81053723>] ? __do_page_fault+0x473/0x500
 [<ffffffff810471e8>] ? pvclock_clocksource_read+0x58/0xd0
 [<ffffffff81066b5e>] ? account_entity_enqueue+0x7e/0x90
 [<ffffffff81076c42>] ? enqueue_entity+0x112/0x450
 [<ffffffff8155b29e>] ? do_page_fault+0x3e/0xa0
 [<ffffffff81558285>] ? page_fault+0x25/0x30
 [<ffffffffa0521ad0>] ? gfs2_block_map+0x0/0xf00 [gfs2]
 [<ffffffffa0549556>] ? gfs2_alloc_blocks+0x4c6/0x860 [gfs2]
 [<ffffffff810a8e5f>] ? wake_up_bit+0x2f/0x40
 [<ffffffff811d50e7>] ? unlock_buffer+0x17/0x20
 [<ffffffffa0521fb4>] ? gfs2_block_map+0x4e4/0xf00 [gfs2]
 [<ffffffff81076c42>] ? enqueue_entity+0x112/0x450
 [<ffffffffa052d850>] ? gfs2_glock_holder_wait+0x0/0x20 [gfs2]
 [<ffffffff810a8cf7>] ? bit_waitqueue+0x17/0xd0
 [<ffffffff811d6d4e>] ? __block_prepare_write+0x1de/0x4e0
 [<ffffffffa0521ad0>] ? gfs2_block_map+0x0/0xf00 [gfs2]
 [<ffffffff811d7077>] ? block_prepare_write+0x27/0x40
 [<ffffffffa053a0ca>] ? gfs2_write_begin+0x3ea/0x480 [gfs2]
 [<ffffffffa052e2d3>] ? gfs2_holder_uninit+0x23/0x40 [gfs2]
 [<ffffffff81131ecd>] ? generic_file_buffered_write+0x12d/0x2f0
 [<ffffffff81133950>] ? __generic_file_aio_write+0x260/0x490
 [<ffffffff81133c08>] ? generic_file_aio_write+0x88/0x100
 [<ffffffffa053c7a3>] ? gfs2_file_aio_write+0xf3/0x130 [gfs2]
 [<ffffffffa052e525>] ? gfs2_glock_wait+0x25/0x90 [gfs2]
 [<ffffffff810a8cf7>] ? bit_waitqueue+0x17/0xd0
 [<ffffffff8119d270>] ? do_sync_write+0x100/0x140
 [<ffffffffa0530e9d>] ? do_promote+0x21d/0x350 [gfs2]
 [<ffffffffa052e525>] ? gfs2_glock_wait+0x25/0x90 [gfs2]
 [<ffffffff810a8e70>] ? autoremove_wake_function+0x0/0x40
 [<ffffffffa052e2d3>] ? gfs2_holder_uninit+0x23/0x40 [gfs2]
 [<ffffffff8124c6ab>] ? selinux_file_permission+0xfb/0x150
 [<ffffffff8123f27c>] ? security_file_permission+0x1c/0x20
 [<ffffffff8119d56a>] ? vfs_write+0xba/0x1a0
 [<ffffffff8119ea66>] ? fget_light_pos+0x16/0x50
 [<ffffffff8119e0a1>] ? sys_write+0x51/0xb0
 [<ffffffff810f0cae>] ? __audit_syscall_exit+0x25e/0x290
 [<ffffffff8155f351>] ? system_call_fastpath+0x2f/0x34

Resolution

Red Hat Enterprise Linux 6
  • The issue (bz1384184) has been resolved with errata RHSA-2018:2846 with the following package(s): kernel-2.6.32-754.6.3.el6 or later.

Red Hat Enterprise Linux 7
  • The issue (bz1608687) has been resolved with errata RHSA-2018:3083 with the following package(s): kernel-3.10.0-957.el7 or later.
Workaround
  • On RHEL 6, boot into an earlier kernel (any version before kernel-2.6.32-754.el6), then grow the gfs2 filesystem.
  • Free some space on the gfs2 filesystem before running gfs2_grow. Optionally, run fsck.gfs2 after creating the free space; make sure that no cluster node has the filesystem mounted when running fsck.gfs2, or corruption will occur. Note that freeing space only makes the issue less likely to occur; it does not prevent it.
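A minimal sketch of the second workaround, assuming the gfs2 filesystem is mounted at `$GFS2_MNT` (a hypothetical path; it defaults to `/` below only so the commands are runnable anywhere). Since the bug is most likely when no blocks are free, confirm that some blocks are actually free before re-running gfs2_grow:

```shell
# Hypothetical mount point of the gfs2 filesystem to be grown
GFS2_MNT=${GFS2_MNT:-/}

free_blocks() {
    # -P keeps each filesystem on one line; -B4096 reports 4K blocks,
    # column 4 of the second line is "Available"
    df -PB4096 "$1" | awk 'NR==2 {print $4}'
}

avail=$(free_blocks "$GFS2_MNT")
if [ "$avail" -eq 0 ]; then
    echo "no free blocks on $GFS2_MNT: free some space before gfs2_grow"
else
    echo "$avail free 4K blocks on $GFS2_MNT"
fi
```

If you choose to run fsck.gfs2 afterwards, the filesystem must be unmounted on every cluster node first, as the bullet above warns.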

If you have hit this issue and the filesystem is corrupted, please contact Red Hat Support, as a manual fix of the filesystem will be required. The errata above only prevent the problem from occurring in the future; they do not repair existing corruption.

This issue is covered in the following documentation: Global File System 2: 4.6. Growing a File System

Root Cause

The gfs2_grow utility did not expand the resource group index (rindex) system file properly. As a consequence, it was not possible to add new space to a gfs2 filesystem with gfs2_grow.

This bug is usually triggered when there is not a single free block left on the filesystem. It can also occur on filesystems that are not completely full: depending on the resource groups and their degree of fragmentation, the filesystem can still report a large number of free blocks (but no large enough span of contiguous free blocks) and trigger the issue.

In some instances the failed gfs2_grow can trigger a kernel panic.

Diagnostic Steps

In most cases, if the filesystem has hit this issue, fsck.gfs2 will throw an error when trying to repair it and will leave the filesystem in a state where the locking protocol is still enabled. For more information, see the following article: Why is my GFS/GFS2 filesystem failing to mount after an interrupted or failed fsck attempt?

Before running fsck.gfs2, check whether a panic occurred or whether the cluster node was fenced in the middle of a gfs2_grow. If either is the case, running fsck.gfs2 is not recommended.

In addition, saving the metadata of the corrupted filesystem with gfs2_edit savemeta will either fail outright, or appear to succeed while producing a tiny file that does not actually contain the metadata.
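A hypothetical helper illustrating that failure mode: after running `gfs2_edit savemeta <device> <file>`, a file of only a few bytes almost certainly does not contain the metadata. The 1MB threshold below is an illustrative assumption, not a documented limit:

```shell
# Sanity-check the size of a gfs2_edit savemeta output file.
# check_savemeta FILE [MIN_BYTES]  -> returns 1 if the file looks too small
check_savemeta() {
    file=$1
    min_bytes=${2:-1048576}              # illustrative 1MB floor
    size=$(stat -c %s "$file") || return 2
    if [ "$size" -lt "$min_bytes" ]; then
        echo "WARNING: $file is only $size bytes; savemeta likely failed"
        return 1
    fi
    echo "$file looks plausible ($size bytes)"
}
```

For example, `check_savemeta /tmp/grow2.meta` would flag a near-empty capture before it is uploaded to a support case.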


  • Verify that a kernel panic similar to the one in the Issue section occurred, and that the system is running one of the affected kernels described in the Environment section.

  1. Attempt to grow a GFS2 filesystem and observe that the command hangs and the node is eventually fenced.

     # gfs2_grow /sas/data03
     FS: Mount Point: /sas/data03
     FS: Device:      /dev/dm-17
     FS: Size:        52428798 (0x31ffffe)
     FS: RG size:     65533 (0xfffd)
     DEV: Size:       104857600 (0x6400000)
     The file system grew by 204800MB.
     [No further output and no prompt]
    
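The sizes gfs2_grow prints are in filesystem blocks; assuming 4K blocks (inferred from the figures above, not stated in the output), its "grew by 204800MB" message can be reproduced from the FS and DEV sizes:

```shell
fs_blocks=52428798      # FS:  Size from the gfs2_grow output above
dev_blocks=104857600    # DEV: Size from the gfs2_grow output above
block_size=4096         # assumed 4K filesystem block size

grown_mb=$(( (dev_blocks - fs_blocks) * block_size / 1024 / 1024 ))
echo "The file system grew by ${grown_mb}MB."
# prints: The file system grew by 204800MB.
```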
  2. Check whether the running kernel version is kernel-2.6.32-754.el6 or later.

     # uname -r
     2.6.32-754.el6.x86_64
    
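A sketch for this step: on RHEL 6 the affected kernels are the 2.6.32-754 series before the fixed kernel-2.6.32-754.6.3.el6, so a quick first check is whether the running kernel is in that series at all (the function name is illustrative):

```shell
# Classify a kernel release string from `uname -r`
check_kernel_series() {
    case "$1" in
        2.6.32-754.*) echo "2.6.32-754 series: compare against the fixed 2.6.32-754.6.3.el6" ;;
        *)            echo "not in the 2.6.32-754 series" ;;
    esac
}

check_kernel_series "$(uname -r)"
```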
  3. Configure kdump so that a vmcore can be captured in the event of a kernel panic.

  4. Configure fence_kdump so that fencing does not interrupt vmcore collection.

  5. Attempt the gfs2_grow again and upload the resulting vmcore to your Red Hat support case.
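On RHEL 6, the fence_kdump configuration from step 4 might look like the following cluster.conf excerpt (a sketch only; node and device names are hypothetical, and the exact layout depends on your cluster). Listing fence_kdump as the first fence method gives the panicking node time to write the vmcore before it is power-fenced:

```
<!-- /etc/cluster/cluster.conf excerpt (hypothetical names) -->
<clusternode name="node1" nodeid="1">
  <fence>
    <method name="kdump">
      <device name="kdump-fence"/>
    </method>
    <method name="power">
      <device name="ipmi-node1"/>
    </method>
  </fence>
</clusternode>

<fencedevices>
  <fencedevice agent="fence_kdump" name="kdump-fence"/>
</fencedevices>
```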


This solution is part of Red Hat’s fast-track publication program, providing a huge library of solutions that Red Hat engineers have created while supporting our customers. To give you the knowledge you need the instant it becomes available, these articles may be presented in a raw and unedited form.