Upgrade to OSC 1.7.0 puts running Peer Pods into ContainerCreating status
Environment
- OpenShift Container Platform (OCP) 4.15, 4.16
- OpenShift sandboxed containers (OSC) operator 1.7
Issue
- Original deployment of Peer Pods with OSC 1.6.0
- Upgrade to OSC 1.7.0
- The pods transition from Running to ContainerCreating status and are stuck there.
The following error appears in the output of oc describe pod:
Warning FailedCreatePodSandBox 49s (x156 over 35m) kubelet Failed to create pod sandbox: rpc error: code = Unknown desc = CreateContainer failed: /var/opt/kata/configuration-remote.toml: configuration file contains invalid hypervisor section: "remote": unknown
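To find which pods are affected, the pod listing can be filtered by status. The following is a minimal sketch, not part of the original solution: the helper name stuck_pods is hypothetical, and it assumes the default column layout of oc get pods -A --no-headers (NAMESPACE, NAME, READY, STATUS, RESTARTS, AGE).

```shell
# Print namespace/name for every pod whose STATUS column is ContainerCreating.
# Reads `oc get pods -A --no-headers` output on stdin
# (assumed columns: NAMESPACE NAME READY STATUS RESTARTS AGE).
stuck_pods() {
    awk '$4 == "ContainerCreating" { print $1 "/" $2 }'
}

# Usage against a live cluster (placeholders are to be filled in by the reader):
#   oc get pods -A --no-headers | stuck_pods
#   oc describe pod <pod-name> -n <namespace>
```

Each pod it prints can then be inspected with oc describe pod to confirm that the FailedCreatePodSandBox event above is present.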
Resolution
- Delete kataconfig
- Upgrade your cluster to the latest z-stream (OCP 4.16.13 or 4.15.43) to ensure you get the kata-containers rpm 3.7.0-2 or higher
- Recreate kataconfig to deploy with the newest components.
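The steps above can be sketched as shell commands. This is an illustration under assumptions rather than a verified procedure: the function names, the placeholders in angle brackets, and the use of GNU sort -V for version comparison are not from the original solution, and the comparison is only meant for plain version-release strings such as 3.7.0-2.

```shell
# Minimum kata-containers RPM version-release named in the resolution.
MIN_KATA="3.7.0-2"

# Return 0 if version $1 is greater than or equal to version $2.
# Relies on GNU sort -V (version sort); assumed to be available.
version_at_least() {
    [ "$(printf '%s\n%s\n' "$2" "$1" | sort -V | head -n 1)" = "$2" ]
}

# Query the kata-containers RPM installed on one node (needs cluster-admin).
# Defined here but not called; $1 is the node name.
kata_version_on_node() {
    oc debug "node/$1" -- chroot /host rpm -q kata-containers
}

# Outline of the resolution; placeholders must be filled in by the reader:
#   oc delete kataconfig <kataconfig-name>
#   oc adm upgrade --to=4.16.13        # or 4.15.43 on the 4.15 stream
#   oc debug node/<node-name> -- chroot /host rpm -q kata-containers
#   oc apply -f <kataconfig-file>.yaml
```

For example, version_at_least "3.6.0-5" "$MIN_KATA" fails, which would indicate a node that has not yet picked up the required package.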
Root Cause
As per KATA-3155 (Remote hypervisor config files needs to be updated to align with upstream changes) and KATA-3193 (Tracker issue for fixes that needs to be in Kata shim for 1.7.0), OSC 1.7 introduces changes for Confidential Containers that are incompatible with Peer Pods from OSC 1.6. This requires upgrading kata-containers to 3.7.0 or higher and rebuilding the podvm component.
Diagnostic Steps
Other issues that can manifest during this upgrade, in particular for OCP 4.15 customers:
- The old shim complains that it doesn't understand kata-remote in the config file
- The new shim will time out because Peer Pods are still running the old podvm
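The first symptom can be recognised from the event text quoted in the Issue section. The sketch below is an assumption-laden helper, not part of the original solution: the function name is hypothetical, and it only matches the old-shim message. The source does not quote the timeout message for the second symptom, so no pattern is given for it.

```shell
# Return 0 if the text on stdin contains the old-shim configuration error
# quoted in the Issue section of this solution.
is_old_shim_error() {
    grep -q 'configuration file contains invalid hypervisor section'
}

# Usage against a live cluster (not run here):
#   oc get events -A --field-selector reason=FailedCreatePodSandBox | is_old_shim_error \
#       && echo "old shim cannot parse the remote hypervisor section"
```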