RHCOS nodes are failing to boot with an error "Couldn't find specified OSTree root" in RHOCP 4

Solution Verified - Updated

Environment

  • Red Hat OpenShift Container Platform 4
  • Red Hat Enterprise Linux CoreOS (RHCOS)

Issue

  • RHCOS nodes fail to boot with the message "failed to start ostree prepare os".
  • The node reports a NotReady status, and the serial console shows a boot failure with the error "Couldn't find specified OSTree root".

Resolution

Modify the ostree= kernel argument so that it points to the directory where OSTree actually stores the bootloader configuration.

Steps to be followed:

  • Intercept the GRUB menu.
  • Edit the entry by using the e key.
  • By default, the kernel command line contains two console parameters. Keep only the one that is required.
  • Add rd.break=pre-mount to the kernel command line.
  • Press Ctrl+X to resume booting, then press Enter when prompted to start the dracut shell.
  • Once in the dracut shell, run the blkid command to identify the root disk.
  • Mount the root device on the /sysroot directory.
# blkid | grep -i root

/dev/vda4: LABEL="root" UUID="4e5a685c-6865-4538-b1d9-4541f390b913" TYPE="xfs" PARTLABEL="root" PARTUUID="a8432eef-31a4-7b42-a1fd-768a79c7c61d"

# mount /dev/vda4 /sysroot

Replace /dev/vda4 with the device name returned by the blkid command above.
  • List the /sysroot/ostree directory to identify which boot.N entry OSTree actually uses to store the bootloader configuration.
# ls -l /sysroot/ostree/
total 0
lrwxrwxrwx. 1 root root   8 Jul 10 13:11 boot.1 -> boot.1.1
drwxr-xr-x. 3 root root  19 Jul 10 13:11 boot.1.1
drwxr-xr-x. 3 root root  19 Jul 10 13:11 deploy
drwxr-xr-x. 7 root root 102 Jul 10 13:11 repo

In this example, boot.1 (a symbolic link to boot.1.1) is the correct entry.
  • After identifying the correct directory, exit the dracut shell. During the next boot, interrupt the GRUB menu again and edit the ostree= kernel argument so that it points to the correct OSTree bootloader configuration directory.

If the kernel command line contains the following argument:

ostree=/ostree/boot.0/rhcos/<id>/0

update the boot.0 component to the entry that exists. In this case, it should be set as follows:

ostree=/ostree/boot.1/rhcos/<id>/0
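
The manual checks in the steps above can be sketched as a few small shell helpers. This is a minimal sketch, not part of the documented procedure: the function names root_device_from_blkid, valid_boot_links and fix_ostree_arg are hypothetical, and it assumes a POSIX-compatible shell such as the one normally available in the dracut environment. It uses only shell built-ins.

```shell
# Hypothetical helpers; the names are illustrative and are not part of
# RHCOS, dracut, or OSTree.

# Read blkid output on stdin and print the device whose filesystem
# LABEL is "root" (PARTLABEL="root" alone does not match).
root_device_from_blkid() {
    while IFS= read -r line; do
        case "$line" in
            *' LABEL="root"'*) printf '%s\n' "${line%%:*}" ;;
        esac
    done
}

# Print each boot.N symbolic link in the given ostree directory that
# resolves to an existing directory. Dangling links are skipped.
valid_boot_links() {
    for link in "$1"/boot.*; do
        if [ -L "$link" ] && [ -d "$link" ]; then
            printf '%s\n' "${link##*/}"
        fi
    done
}

# Replace the boot.N component of an ostree= argument with the given
# link name and print the result.
fix_ostree_arg() {
    arg=$1
    good=$2
    prefix=${arg%%/boot.*}   # text before "/boot.N"
    rest=${arg#*/boot.}      # text after "/boot."
    printf '%s/%s/%s\n' "$prefix" "$good" "${rest#*/}"
}
```

Assuming the root device is already mounted on /sysroot, the helpers could be used as follows:

# blkid | root_device_from_blkid
# valid_boot_links /sysroot/ostree
# fix_ostree_arg 'ostree=/ostree/boot.0/rhcos/<id>/0' boot.1

The last command only prints the corrected argument. The value still has to be entered by hand when editing the GRUB entry.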

Root Cause

The ostree= kernel argument in the bootloader entry referenced boot.0, but that entry pointed to a path that did not exist on the root filesystem, so the OSTree root could not be found.
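
To confirm this condition from the dracut shell, the ostree= path on the kernel command line can be compared with what exists on the mounted root filesystem. The following is a minimal sketch under stated assumptions: the function name check_ostree_arg is hypothetical, the root device is assumed to be mounted on /sysroot, and the kernel command line is read from /proc/cmdline unless another file is given.

```shell
# Hypothetical helper; the name and default paths are illustrative assumptions.
# Usage: check_ostree_arg [cmdline-file] [sysroot]
# Returns 0 if the ostree= path exists under the mounted root, 1 otherwise.
check_ostree_arg() {
    cmdline_file=${1:-/proc/cmdline}
    sysroot=${2:-/sysroot}
    target=
    cmdline=
    # read returns non-zero if the file lacks a trailing newline,
    # but the variable is still filled in.
    read -r cmdline < "$cmdline_file" || :
    # Word splitting is intentional: one kernel argument per word.
    for word in $cmdline; do
        case "$word" in
            ostree=*) target=${word#ostree=} ;;
        esac
    done
    if [ -z "$target" ]; then
        echo "no ostree= argument found"
        return 1
    fi
    if [ -e "$sysroot$target" ]; then
        echo "ok: $target exists"
    else
        echo "missing: $target"
        return 1
    fi
}
```

Run without arguments in the dracut shell, the function checks the argument of the current boot attempt against /sysroot. A "missing" result is consistent with the root cause described above.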

Diagnostic Steps

Check the serial console output of the node and confirm that the boot fails with an error similar to "Couldn't find specified OSTree root".
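
If the serial console output has been saved to a file, the check can be scripted. This is a minimal sketch that assumes grep is available on the machine holding the file. The function name has_ostree_root_error and the file name console.log are hypothetical. The match is case-insensitive because the exact capitalization of the message may differ between releases.

```shell
# Hypothetical helper: succeeds if the saved console output contains
# either of the error messages described in this article.
has_ostree_root_error() {
    grep -qi \
        -e "Couldn't find specified OSTree root" \
        -e "failed to start ostree prepare os" \
        "$1"
}
```

For example:

# has_ostree_root_error console.log && echo "OSTree root error found"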

