What to do if a server fails to boot?

Common Scenarios

Early Boot Failures.

  • Server displays a GRUB error.
  • Server boots to the grub> command prompt.
  • Server boots to a blinking cursor.
  • Server is missing /boot due to accidental deletion or filesystem checks.

Interim Boot Failures.

  • Server displays Kernel panic - not syncing: Attempted to kill init!
  • Server fails to boot because of an Out of memory: condition.
  • Server fails to boot after updating.
  • Server hangs during the boot process.
  • Server panicked and now fails to boot after the reboot.
  • Server stops booting and drops into maintenance/emergency mode.
  • Server constantly reboots during the boot process.
  • Server shows nothing after the GRUB menu and displays a blank screen.
  • Server is stuck at boot with tracebacks indicating filesystem corruption.

Troubleshooting Steps

To isolate the issue, the boot process is divided into two parts: "Early Boot Failures" and "Interim Boot Failures".
Use the steps below to troubleshoot the failure and gather logs, then continue according to the failure type.

Early Boot Failures.

  • Server displays a GRUB error.
  • Server boots to the grub> command prompt.
  • Server boots to a blinking cursor.
  • Server has /boot contents missing or accidentally deleted.

Try the relevant knowledge base article for the observed symptom.

Interim Boot Failures.

  • Server displays Kernel panic - not syncing: Attempted to kill init!
  • Server fails to boot because of an Out of memory: condition.
  • Server fails to boot after updating.
  • Server hangs during the boot process.
  • Server panicked and now fails to boot after the reboot.
  • Server stops booting and drops into maintenance/emergency mode.
  • Server constantly reboots during the boot process.
  • Server shows nothing after the GRUB menu and displays a blank screen.
  • Server is stuck at boot with tracebacks indicating filesystem corruption.

Try the relevant knowledge base article for the observed symptom.
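
The split into the two failure types can be sketched as a simple lookup on text seen on the console. This is only an illustration: the marker strings are taken from the scenario lists above, while the function name, the substring matching, and the "Unclassified" fallback are assumptions, and most symptoms (a blank screen, a hang) cannot be detected this way.

```python
# Console strings quoted in the scenario lists; everything else here is illustrative.
MARKERS = [
    ("grub>", "Early Boot Failure"),
    ("Kernel panic - not syncing", "Interim Boot Failure"),
    ("Out of memory:", "Interim Boot Failure"),
]


def classify(console_text: str) -> str:
    """Return the failure type for the first marker found in the console text."""
    for marker, category in MARKERS:
        if marker in console_text:
            return category
    # No known marker: the symptom has to be matched by hand against the lists.
    return "Unclassified"


if __name__ == "__main__":
    print(classify("grub> "))
    print(classify("Kernel panic - not syncing: Attempted to kill init!"))
```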

Steps to take before creating a support case

Complete the steps below and provide the resulting data when opening a case. The boot process involves several subsystems, so this information helps route the case to the correct specialty group.

  • Steps for capturing the screenshot: If the server still doesn't boot, remains hung, or shows a failed status for services, capture screenshots with more verbose boot messages as described in How to display more verbose boot-related messages during system startup. First remove the quiet and rhgb parameters from the kernel line if either is present. Then append the parameters below to the kernel line to boot with debugging enabled. The output helps support pinpoint where in the boot process the failure occurs. Capture as many screenshots or pictures of the console as possible while the server is booting; they help route the case to the appropriate engineer more quickly.

    • For RHEL 6 and earlier, append the following:
      • debug ignore_loglevel print_fatal_signals=1 printk.time=1 initcall_debug log_buf_len=10M
    • For RHEL 7 and RHEL 8, append the following:
      • rd.debug initcall_debug log_buf_len=10M
  • Steps for capturing the serial console logs: Serial console logs are even more helpful than screenshots or pictures. If you can capture them on the server, provide them instead, following How does one set up a serial terminal and/or console in Red Hat Enterprise Linux?

  • Capture an sosreport from rescue mode: After capturing screenshots of the boot process with debugging enabled, support will need an sosreport, which can be collected even when the server does not boot. The following knowledge article walks through the procedure: How to boot Red Hat Enterprise Linux to Rescue Mode for Data Collection (sosreport, vmcore, etc.)
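
The kernel line edit from the screenshot step (remove quiet and rhgb, then append the debugging parameters) can be sketched as a small string transformation. This is a minimal illustration, not a tool to run against a real boot configuration: the parameter sets are the ones listed above, while the function name, the dictionary keys, and the sample kernel line (including its /vmlinuz-EXAMPLE path and root device) are hypothetical.

```python
# Parameters to strip and to append, as listed in the steps above.
REMOVE = {"quiet", "rhgb"}
DEBUG_PARAMS = {
    "rhel6_and_below": (
        "debug ignore_loglevel print_fatal_signals=1 printk.time=1 "
        "initcall_debug log_buf_len=10M"
    ),
    "rhel7_rhel8": "rd.debug initcall_debug log_buf_len=10M",
}


def debug_kernel_line(line: str, release: str) -> str:
    """Drop quiet/rhgb and append the debugging parameters for the release."""
    kept = [tok for tok in line.split() if tok not in REMOVE]
    for param in DEBUG_PARAMS[release].split():
        if param not in kept:  # avoid appending a parameter twice
            kept.append(param)
    return " ".join(kept)


if __name__ == "__main__":
    # Hypothetical kernel line, for illustration only.
    sample = "linux16 /vmlinuz-EXAMPLE root=/dev/mapper/rhel-root ro rhgb quiet"
    print(debug_kernel_line(sample, "rhel7_rhel8"))
```

On a server that does not boot, the same change is normally made by hand at the GRUB menu for a single boot, so nothing is written to disk.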
