kdump fails with large ext4 file system because fsck.ext4 gets OOM-killed
Environment
- Red Hat Enterprise Linux (RHEL) 6
- kdump/kexec
- ext4
Issue
- kdump fails with a large ext4 file system because fsck.ext4 gets OOM-killed.
- We see the following problem when attempting kdump on a 2.1 TB ext4 file system:
/dev/sda4: recovering journal
Pass 1: Checking inodes, blocks, and sizes
Out of memory: Kill process 1014 (fsck.ext4) score 275 or sacrifice child
Killed process 1014, UID 0, (fsck.ext4) total-vm:135812kB, anon-rss:34860kB, file-rss:948kB
KILL
EXT4-fs error (device sda4): ext4_mb_generate_buddy: EXT4-fs: group 24352: 23917 blocks in bitmap, 24544 in gd
- The following output appears when kdump runs:
Saving to the local filesystem /dev/vdb
e2fsck 1.41.12 (17-May-2010)
Pass 1: Checking inodes, blocks, and sizes
Error allocating block bitmap (1): Memory allocation failed
e2fsck: aborted
EXT4-fs (vdb): mounted filesystem with ordered data mode. Opts:
Resolution
- Explicitly specifying a sufficiently large size in the crashkernel parameter allows fsck.ext4 to finish. For the 2.1 TB file system, crashkernel=512M allocates enough memory.
- The exact amount of memory required is not easy to compute and depends on many factors. As an estimate, a 16 TB file system with a 4 KB block size would use about 1.88 GB of memory.
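To give a feel for the scale involved, the sketch below computes the size of a single in-memory block bitmap (one bit per file system block), which is the allocation that fails in the e2fsck output above. It is a rough lower bound only, not the formula e2fsck uses: the assumption is that e2fsck holds several such bitmaps plus other per-inode data, so real usage is several times higher, as the 1.88 GB estimate suggests. The function name bitmap_mib is illustrative.

```shell
#!/bin/sh
# Rough lower bound: MiB needed for ONE block bitmap (one bit per block).
# Assumption: e2fsck keeps several bitmaps and other structures in memory,
# so actual memory use is a multiple of this figure.

# bitmap_mib FS_BYTES BLOCK_BYTES -> prints the bitmap size in MiB
bitmap_mib() {
    blocks=$(( $1 / $2 ))                  # number of file system blocks
    bitmap_bytes=$(( (blocks + 7) / 8 ))   # one bit per block, rounded up to bytes
    echo $(( bitmap_bytes / 1024 / 1024 ))
}

# 16 TiB file system (17592186044416 bytes) with 4096-byte blocks
bitmap_mib 17592186044416 4096
```

For the 16 TB case this prints 512, that is, 512 MiB for one bitmap alone. This is more than many crashkernel reservations provide in total.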
Root Cause
- The kernel option crashkernel=auto does not account for large file systems: it does not allocate enough RAM for the kexec kernel to later run fsck.ext4 on the large file system.
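To see how much memory the running kernel has actually reserved for the crash kernel, one option is to look at the "Crash kernel" entry in /proc/iomem. The sketch below converts that entry's address range into a size in MiB. The function name crash_kernel_mib and the sample address range in the comments are illustrative, not taken from an affected system.

```shell
#!/bin/sh
# Read /proc/iomem-style lines on stdin and print the size, in MiB,
# of each region labelled "Crash kernel".
# Example input line (made-up addresses):
#   03000000-0affffff : Crash kernel
crash_kernel_mib() {
    grep -i 'crash kernel' | while read -r range _rest; do
        start=${range%-*}   # hex start address (text before the dash)
        end=${range#*-}     # hex end address (text after the dash)
        echo $(( (0x$end - 0x$start + 1) / 1024 / 1024 ))
    done
}

# Usage (as root, since unprivileged reads may show zeroed addresses):
#   crash_kernel_mib < /proc/iomem
```

If the printed value is well below what fsck.ext4 needs for the dump target, the reservation made by crashkernel=auto is probably too small and an explicit size is needed.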