How to fix a system that cannot boot because misconfigured hugepages cause an out-of-memory (OOM) condition, without using a rescue ISO?

Solution Verified - Updated

Environment

  • Red Hat Enterprise Linux 7.
  • Red Hat Enterprise Linux 8.
  • Red Hat Enterprise Linux 9.

Issue

  • How to fix a system that cannot boot because misconfigured hugepages cause an out-of-memory (OOM) condition, without using a rescue ISO?
  • The system is unable to boot because a misconfigured hugepages value causes an out-of-memory condition. Is there a way to fix this without booting from a rescue ISO?

Resolution

SCENARIO 1 - Hugepages set on the kernel command line

  1. If hugepages are set on the kernel command line, follow these steps:

    1. Boot the system and stop at the GRUB menu.
    2. Press e to edit the entry and remove the hugepages parameter.
    3. Press ctrl+x to continue booting.
    4. Once the system is up, calculate the correct number of hugepages for the total installed RAM and update the GRUB configuration accordingly.
  2. For example, if the system has 10 GiB of RAM and the hugepage size is 2 MiB, nr_hugepages should not exceed 4500.

  3. That reserves 4500 × 2 MiB = 9000 MiB, or about 8.79 GiB, which stays within the 10 GiB of RAM:

     # echo 2*4500/2^10|bc -l
     8.78906250000000000000
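The sizing check in the example can be scripted. The following is a minimal sketch, assuming bash and awk; hugepages_from_cmdline, reserved_mib, mib_to_gib and fits_in_ram are hypothetical helper names, not part of any Red Hat tool. It only compares the reservation against RAM and does not decide how much headroom the workload needs.

```shell
#!/bin/bash
# Minimal sketch: check the hugepages sizing from the example above.
# hugepages_from_cmdline, reserved_mib, mib_to_gib and fits_in_ram are
# hypothetical helper names; the script assumes bash and awk.

# hugepages_from_cmdline CMDLINE -> print each hugepages= value, one per line
hugepages_from_cmdline() {
    local -a args
    local arg
    read -ra args <<< "$1"
    for arg in "${args[@]}"; do
        case "$arg" in
            hugepages=*) echo "${arg#hugepages=}" ;;
        esac
    done
}

# reserved_mib NR PAGE_SIZE_MIB -> memory reserved by NR hugepages, in MiB
reserved_mib() {
    echo $(( $1 * $2 ))
}

# mib_to_gib MIB -> the value in GiB, rounded to two decimals
mib_to_gib() {
    awk -v m="$1" 'BEGIN { printf "%.2f\n", m / 1024 }'
}

# fits_in_ram NR PAGE_SIZE_MIB RAM_MIB -> exit status 0 if the reservation is below RAM
fits_in_ram() {
    [ "$(reserved_mib "$1" "$2")" -lt "$3" ]
}
```

For the 10 GiB example, mib_to_gib "$(reserved_mib 4500 2)" prints 8.79, and after the reboot hugepages_from_cmdline "$(cat /proc/cmdline)" shows whether a hugepages= parameter is still active.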
    

SCENARIO 2 - Hugepages allocated in configuration file

STEPS FOR RHEL 7
  1. Reboot the affected system and stop at the GRUB menu.

  2. Press e to edit the kernel entry.

  3. Scroll to the line starting with linux16 or linuxefi.

  4. Append the following parameters to that line:

     systemd.mask=systemd-sysctl.service rd.systemd.unit=emergency.target systemd.mask=tuned.service
    
  5. Press ctrl+x to continue booting.

  6. Because rd.systemd.unit=emergency.target was added, the system stops in the dracut emergency shell before the sysctl settings are applied:

     [    2.906661] systemd[1]: Started Emergency Shell.
     [  OK  ] Started Emergency Shell.
     [    2.909506] systemd[1]: Reached target Emergency Mode.
     [  OK  ] Reached target Emergency Mode.
     Generating "/run/initramfs/rdsosreport.txt"
     Entering emergency mode. Exit the shell to continue.
     Type "journalctl" to view system logs.
     You might want to save "/run/initramfs/rdsosreport.txt" to a USB stick or /boot
     after mounting them and attach it to a bug report.
     :/# 
    
  7. In the emergency shell, use the vi editor to remove the hugepages entry from whichever of the following files contains it:

     /etc/sysctl.conf
     /etc/sysctl.d/*.conf
     /usr/lib/sysctl.d/*.conf
     /lib/sysctl.d/*.conf
     /usr/local/lib/sysctl.d/*.conf
    
  8. If hugepages are defined in any other custom location, comment out or remove that entry as well.

  9. If you do not know where hugepages are set, the following command lists the files that contain an nr_hugepages entry:

     # grep -r nr_hugepages /etc/* -l
    
  10. Repeat the command for the other directories under / (for example /usr/lib and /usr/local/lib) if nothing is found under /etc.

  11. After making the edits, exit the dracut shell with the exit command to continue booting.

  12. This method keeps hugepages from being applied during the initramfs stage, and the systemd.mask=systemd-sysctl.service parameter tells systemd not to apply them after switch root either.

  13. Once the system has fully booted, perform the following steps:

    1. Set the hugepages value to zero with echo 0 > /proc/sys/vm/nr_hugepages.
    2. Correct the hugepages setting in the configuration file, or comment the entry out.
    3. Rebuild the initramfs (for example, with dracut -f) so that the system boots normally next time.
    4. Reboot and verify that the system boots.
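Steps 7 to 10 and step 13.2 (find the file that sets nr_hugepages and comment the entry out) can be sketched as two shell functions. This assumes bash, grep and GNU sed; find_hugepage_files and comment_out_hugepages are hypothetical helper names, and the default search list simply mirrors the locations in step 7.

```shell
#!/bin/bash
# Minimal sketch of steps 7 to 10 and 13.2: list sysctl files that set
# nr_hugepages and comment the entry out. find_hugepage_files and
# comment_out_hugepages are hypothetical helper names; assumes bash, grep, GNU sed.

# Default search locations, taken from step 7
SYSCTL_LOCATIONS=(/etc/sysctl.conf /etc/sysctl.d /usr/lib/sysctl.d /lib/sysctl.d /usr/local/lib/sysctl.d)

# find_hugepage_files [PATH...] -> print files that mention nr_hugepages
find_hugepage_files() {
    local -a paths=("$@")
    if [ ${#paths[@]} -eq 0 ]; then
        paths=("${SYSCTL_LOCATIONS[@]}")
    fi
    # Missing paths are silenced; grep exits non-zero when nothing matches
    grep -rl 'nr_hugepages' "${paths[@]}" 2>/dev/null
}

# comment_out_hugepages FILE -> prefix '#' to active vm.nr_hugepages lines
# (dot or slash separator) and keep the original as FILE.bak
comment_out_hugepages() {
    sed -i.bak -E 's|^([[:space:]]*)(vm[./]nr_hugepages[[:space:]]*=)|\1#\2|' "$1"
}
```

In the dracut emergency shell it may be simpler to type the same grep -rl and sed expressions directly, if those tools are present in the initramfs. Once the system is up, the functions can be run against the real configuration before setting nr_hugepages to zero and rebuilding the initramfs as in step 13.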

STEPS FOR RHEL 8 and RHEL 9
  1. Reboot the system and stop at the GRUB menu. Press e, append the following kernel parameters to the kernel command line, and press ctrl+x to continue booting:

     systemd.mask=systemd-sysctl.service rd.systemd.mask=systemd-sysctl.service systemd.mask=tuned.service
    
  2. Once the system boots, correct or remove the hugepages entry and rebuild the initramfs.

  3. Reboot and verify that the system boots.
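For the final "reboot and verify" step, the hugepage counters in /proc/meminfo show what was actually reserved. The following is a minimal sketch, assuming bash and awk; meminfo_value and hugepages_summary are hypothetical helper names, and both accept an alternative file so they can be tried on a saved copy of meminfo.

```shell
#!/bin/bash
# Minimal sketch for the "reboot and verify" step: read hugepage counters
# from a meminfo-format file. meminfo_value and hugepages_summary are
# hypothetical helpers; both default to /proc/meminfo.

# meminfo_value KEY [FILE] -> print the numeric value of KEY (e.g. HugePages_Total)
meminfo_value() {
    awk -v key="$1:" '$1 == key { print $2 }' "${2:-/proc/meminfo}"
}

# hugepages_summary [FILE] -> one line comparing reserved hugepage memory with RAM
hugepages_summary() {
    local file="${1:-/proc/meminfo}"
    local total size mem
    total=$(meminfo_value HugePages_Total "$file")
    size=$(meminfo_value Hugepagesize "$file")   # kB per hugepage
    mem=$(meminfo_value MemTotal "$file")        # kB
    echo "HugePages_Total=$total reserved_kB=$(( total * size )) MemTotal_kB=$mem"
}
```

Run without arguments after the reboot, HugePages_Total should match the corrected value (0 if the entry was removed). systemctl status systemd-sysctl.service tuned.service can also be used to confirm that both services run normally again once the temporary kernel parameters are gone.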

NOTE

  • tuned.service is also masked because tuned is known to reapply the sysctl settings in /etc/sysctl.conf and the /etc/sysctl.d/ directory.

Root Cause

  • The configured hugepages value exceeded the total installed RAM. When the setting was applied during boot, the system ran out of memory and could not boot.
  • The steps in the Resolution section bring the system up without booting from a rescue ISO image.
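To catch the root cause before a reboot, a sysctl file can be checked against installed RAM. The following is a minimal sketch, assuming bash, awk and the default hugepage size reported as Hugepagesize in /proc/meminfo; sysctl_hugepages and check_hugepages are hypothetical helper names. It reads only the file it is given, not the merged result of every sysctl.d file, and it flags only the extreme case described above, so a value that passes can still leave too little memory for the rest of the system.

```shell
#!/bin/bash
# Minimal sketch of a pre-reboot check for the root cause: does the
# vm.nr_hugepages value in one sysctl file reach MemTotal?
# sysctl_hugepages and check_hugepages are hypothetical helper names.
# Only the given file is read, not the merged result of every sysctl.d file.

# sysctl_hugepages FILE -> print the last active vm.nr_hugepages value in FILE
sysctl_hugepages() {
    awk -F= '
        $0 ~ "^[ \t]*vm[./]nr_hugepages[ \t]*=" {
            value = $2
            gsub(/[ \t\r]/, "", value)
        }
        END { if (value != "") print value }' "$1"
}

# check_hugepages FILE [MEMINFO] -> exit status 1 if NR * Hugepagesize >= MemTotal
check_hugepages() {
    local file="$1" meminfo="${2:-/proc/meminfo}"
    local nr size mem
    nr=$(sysctl_hugepages "$file")
    [ -z "$nr" ] && return 0    # this file does not set nr_hugepages
    size=$(awk '$1 == "Hugepagesize:" { print $2 }' "$meminfo")   # kB
    mem=$(awk '$1 == "MemTotal:" { print $2 }' "$meminfo")        # kB
    if [ $(( nr * size )) -ge "$mem" ]; then
        echo "nr_hugepages=$nr needs $(( nr * size )) kB, MemTotal is $mem kB" >&2
        return 1
    fi
    return 0
}
```

For example, check_hugepages /etc/sysctl.conf returns a non-zero status and prints the required and available memory when the configured value would not fit.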

This solution is part of Red Hat’s fast-track publication program, providing a huge library of solutions that Red Hat engineers have created while supporting our customers. To give you the knowledge you need the instant it becomes available, these articles may be presented in a raw and unedited form.