RHCOS nodes are failing to boot with an error "Couldn't find specified OSTree root" in RHOCP 4
Environment
- Red Hat OpenShift Container Platform 4
- Red Hat Enterprise Linux CoreOS (RHCOS)
Issue
- RHCOS nodes are failing to boot with the message "failed to start ostree prepare os".
- The node is reporting a NotReady status and the serial console indicates a boot failure with the error "Couldn't find specified OSTree root".
Resolution
Modify the kernel argument to use the correct directory where OSTree stores the bootloader configuration.
Steps to be followed:
- Intercept the GRUB menu.
- Edit the entry by using the e key.
- By default, two console parameters are provided in the kernel command line. Keep only the one that is required.
- Add rd.break=pre-mount to the kernel command line.
- Press Ctrl+X to resume booting and press Enter when prompted in order to start the dracut shell.
- Once in the dracut shell, run the blkid command to identify the root disk.
- Mount the root device to the /sysroot directory.
# blkid | grep -i root
/dev/vda4: LABEL="root" UUID="4e5a685c-6865-4538-b1d9-4541f390b913" TYPE="xfs" PARTLABEL="root" PARTUUID="a8432eef-31a4-7b42-a1fd-768a79c7c61d"
# mount /dev/vda4 /sysroot ---> Replace /dev/vda4 with the device name returned by the command above
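The device lookup above can also be scripted. The following is a minimal sketch, not part of any RHCOS tooling: find_root_dev is a hypothetical helper that reads blkid-style output and prints the device whose filesystem LABEL is "root". It assumes sed and head are available in the dracut shell.

```shell
# Hypothetical helper: read blkid output on stdin and print the device path
# of the first entry whose filesystem LABEL is "root".
# The leading space in ' LABEL=' keeps PARTLABEL="root" from matching.
find_root_dev() {
    sed -n 's/^\([^:]*\):.* LABEL="root".*/\1/p' | head -n 1
}

# Possible usage in the dracut shell:
#   dev=$(blkid | find_root_dev)
#   mount "$dev" /sysroot
```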
- Then, verify that the correct directory used by OSTree to store the bootloader configuration is being referenced.
# ls -l /sysroot/ostree/
total 0
lrwxrwxrwx. 1 root root 8 Jul 10 13:11 boot.1 -> boot.1.1 --> correct one
drwxr-xr-x. 3 root root 19 Jul 10 13:11 boot.1.1
drwxr-xr-x. 3 root root 19 Jul 10 13:11 deploy
drwxr-xr-x. 7 root root 102 Jul 10 13:11 repo
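As an alternative to reading the listing by eye, the check can be sketched as a small function. list_ostree_boot_links is a hypothetical name, and the sketch assumes readlink is available in the dracut shell. It prints each boot.N symlink under the mounted root and says whether its target exists.

```shell
# Hypothetical helper: list the boot.N symlinks under <sysroot>/ostree and
# report whether each one resolves to an existing directory.
list_ostree_boot_links() {
    sysroot=${1:-/sysroot}
    for link in "$sysroot"/ostree/boot.*; do
        # Skip real directories such as boot.1.1; only the symlinks matter here.
        [ -L "$link" ] || continue
        if [ -d "$link" ]; then
            echo "${link##*/} -> $(readlink "$link") (ok)"
        else
            echo "${link##*/} -> $(readlink "$link") (dangling)"
        fi
    done
}

# Possible usage after mounting the root device:
#   list_ostree_boot_links /sysroot
```

A link reported as ok is a candidate for the ostree= kernel argument. A boot.N name that is dangling or not listed at all should not be referenced.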
- After identifying the correct directory, exit the dracut shell, then interrupt the GRUB menu during the next boot and modify the kernel arguments to specify the correct OSTree bootloader configuration directory.
# ostree=/ostree/boot.0/rhcos/<id>/0 --> Update the boot.0 entry to point to the correct boot configuration. In this case, it should be set as follows:
# ostree=/ostree/boot.1/rhcos/<id>/0
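Before rebooting, it can help to confirm that the path in the ostree= argument actually resolves on the mounted root. The sketch below is an illustration only: ostree_arg and check_ostree_arg are hypothetical helper names, and the default /sysroot mount point matches the steps above.

```shell
# Hypothetical helper: print the value of the ostree= argument found in a
# kernel command line string passed as $1.
ostree_arg() {
    for word in $1; do
        case $word in
            ostree=*) echo "${word#ostree=}"; return 0 ;;
        esac
    done
    return 1
}

# Hypothetical helper: succeed only if the ostree= path exists under the
# mounted root (second argument, /sysroot by default).
check_ostree_arg() {
    cmdline=$1
    sysroot=${2:-/sysroot}
    path=$(ostree_arg "$cmdline") || { echo "no ostree= argument found"; return 2; }
    if [ -e "$sysroot$path" ]; then
        echo "ok: $path"
    else
        echo "missing: $path"
        return 1
    fi
}

# Possible usage in the dracut shell, with the root device mounted on /sysroot:
#   check_ostree_arg "$(cat /proc/cmdline)"
```

A "missing" result for a boot.0 path, with only boot.1 present on disk, matches the failure described in this article.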
Root Cause
The bootloader was referencing a boot.0 entry, which pointed to a directory or path that did not actually exist on the system.
Diagnostic Steps
Verify if the node is failing to boot with an error message similar to the one shown in the attached serial console output.
This solution is part of Red Hat’s fast-track publication program, providing a huge library of solutions that Red Hat engineers have created while supporting our customers. To give you the knowledge you need the instant it becomes available, these articles may be presented in a raw and unedited form.