While testing kdump on an RHCOS node, the node falls into the emergency shell on reboot.
Environment
- Red Hat OpenShift Container Platform (RHOCP) 4
- Red Hat Enterprise Linux CoreOS (RHCOS)
Issue
- After crashing the node manually to test kdump, the node fails to boot.
- The node fails to boot with the error:
Couldn't find specified OSTree root:

Resolution
To rescue a node that already fails to boot, follow the solution below:
RHCOS nodes are failing to boot with an error "Couldn't find specified OSTree root" in RHOCP 4
Before crashing the node manually to test kdump, perform the following steps:
- Check the current deployments and ensure only one deployment exists:
  rpm-ostree status -vvv
- Clean older deployments manually:
  rpm-ostree cleanup -r
  If it says "Deployments unchanged", there are no deployments to be removed.
- If the above command fails due to a transaction already in progress, wait for it to complete. The PID of the process can be found with the command below:
  ps aux | grep rpm-ostree
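The steps above can be combined into a small helper, sketched here under some assumptions: the function names, the retry count, and the delay are illustrative and not part of rpm-ostree, and the idle check relies on the "State: idle" line that rpm-ostree status prints at the top of its output. Treat it as a starting point rather than a supported procedure.

```shell
#!/bin/sh
# Sketch only: helper names and retry values are illustrative assumptions.
# Run as root on the RHCOS node before triggering the test crash.

# Succeeds when the `rpm-ostree status` text on stdin reports an idle daemon.
status_is_idle() {
    grep -q '^State: idle'
}

# Polls until rpm-ostree is idle.
# $1 = number of attempts (default 30), $2 = seconds between attempts (default 10).
wait_for_idle() {
    attempts=${1:-30}
    delay=${2:-10}
    while [ "$attempts" -gt 0 ]; do
        if rpm-ostree status | status_is_idle; then
            return 0
        fi
        attempts=$((attempts - 1))
        sleep "$delay"
    done
    return 1
}

# Waits for any running transaction, removes the rollback deployment,
# then prints the remaining deployments for manual review.
pre_crash_check() {
    if ! wait_for_idle 30 10; then
        echo "rpm-ostree is still busy; do not crash the node yet" >&2
        return 1
    fi
    rpm-ostree cleanup -r || return 1
    rpm-ostree status
}
```

After sourcing the file, run pre_crash_check and confirm that the printed status lists a single deployment before crashing the node.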
Root Cause
- After booting into a new deployment (boot.0), the machine-config-daemon pod runs rpm-ostree cleanup -r to remove the old rollback deployment (boot.1). If the node is crashed or rebooted while this cleanup is in progress, the bootloader may not be updated correctly. As a result, GRUB may try to boot from a now-missing boot.1, causing the boot failure.
Diagnostic Steps
- Check whether the node fails to boot because of an ostree-prepare service failure.
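One possible way to run this check from the emergency shell is sketched below. It assumes that journalctl is available in that shell and that the unit is named ostree-prepare-root.service, as it is in upstream OSTree; the helper names are hypothetical. The first helper prints the ostree= kernel argument, which shows whether the bootloader is pointing at boot.0 or boot.1.

```shell
#!/bin/sh
# Sketch only: helper names are illustrative; the unit name is an assumption.

# Prints the value of the ostree= kernel argument found in the text on stdin.
ostree_karg() {
    tr ' ' '\n' | sed -n 's/^ostree=//p'
}

# Succeeds when the log text on stdin contains the OSTree root error.
has_ostree_root_error() {
    grep -q "Couldn't find specified OSTree root"
}

# Example use from the emergency shell:
#   ostree_karg < /proc/cmdline
#   journalctl -b -u ostree-prepare-root.service | has_ostree_root_error \
#       && echo "boot failure matches this issue"
```

If the printed ostree= path refers to boot.1 and the error is present in the journal, the symptoms are consistent with the root cause described above.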
This solution is part of Red Hat’s fast-track publication program, providing a huge library of solutions that Red Hat engineers have created while supporting our customers. To give you the knowledge you need the instant it becomes available, these articles may be presented in a raw and unedited form.