Upgrade to OSC 1.7.0 puts running Peer Pods into ContainerCreating status

Solution Verified - Updated

Environment

  • OpenShift Container Platform (OCP) - 4.15, 4.16
  • OpenShift sandboxed containers (OSC) Operator 1.7

Issue

  • Peer Pods were originally deployed with OSC 1.6.0.
  • OSC was then upgraded to 1.7.0.
  • The pods transition from Running to ContainerCreating status and remain stuck there.

The following error appears in the output of oc describe pod:

  Warning  FailedCreatePodSandBox  49s (x156 over 35m)  kubelet            Failed to create pod sandbox: rpc error: code = Unknown desc = CreateContainer failed: /var/opt/kata/configuration-remote.toml: configuration file contains invalid hypervisor section: "remote": unknown
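
To confirm that a stuck pod is hitting this specific error, the output of oc describe pod can be searched for the error signature. The following is a minimal sketch, not an official tool: the function name has_remote_hypervisor_error is hypothetical, and the pod name and namespace in the usage comment are placeholders.

```shell
# Succeeds (exit status 0) when the text on stdin contains the error
# signature quoted in this article. The function name is hypothetical.
has_remote_hypervisor_error() {
  grep -qF 'configuration file contains invalid hypervisor section: "remote"'
}

# Example usage (placeholders in angle brackets):
#   oc describe pod <pod-name> -n <namespace> | has_remote_hypervisor_error \
#     && echo "pod is affected by the OSC 1.6 to 1.7 upgrade issue"
```

The match is on a fixed string rather than a regular expression, so the quotes and colon in the message do not need escaping.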

Resolution

  • Delete the kataconfig.
  • Upgrade the cluster to the latest z-stream (OCP 4.16.13 or 4.15.43) to get the kata-containers rpm 3.7.0-2 or later.
  • Recreate the kataconfig to deploy the newest components.
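
The three steps above could be driven from the command line roughly as follows. This is a sketch under assumptions, not a verified procedure: the kataconfig name example-kataconfig, the manifest file kataconfig.yaml, and the helper names run and resolve_step are all illustrative, and the sketch does not wait for a step to complete. By default it only prints the commands it would run.

```shell
# Assumed values; adjust them to match the cluster.
KATACONFIG_NAME="${KATACONFIG_NAME:-example-kataconfig}"  # assumption: name of the existing kataconfig
KATACONFIG_FILE="${KATACONFIG_FILE:-kataconfig.yaml}"     # hypothetical file holding the kataconfig manifest
TARGET_OCP="${TARGET_OCP:-4.16.13}"                       # use 4.15.43 on OCP 4.15

# Print the command when DRY_RUN=1 (the default); execute it when DRY_RUN=0.
run() {
  if [ "${DRY_RUN:-1}" = "1" ]; then
    echo "+ $*"
  else
    "$@"
  fi
}

# Run one resolution step at a time. Each step must finish on the cluster
# before the next one is started; this sketch does not check for that.
resolve_step() {
  case "$1" in
    delete)   run oc delete kataconfig "$KATACONFIG_NAME" ;;
    upgrade)  run oc adm upgrade --to="$TARGET_OCP" ;;
    recreate) run oc apply -f "$KATACONFIG_FILE" ;;
    *)        echo "usage: resolve_step delete|upgrade|recreate" >&2; return 2 ;;
  esac
}
```

Calling resolve_step delete, then resolve_step upgrade, then resolve_step recreate prints the three commands in order. Setting DRY_RUN=0 executes them, which should only be done once the previous step has been confirmed complete.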

Root Cause

As per KATA-3155 - Remote hypervisor config files needs to be updated to align with upstream changes
and KATA-3193 - Tracker issue for fixes that needs to be in Kata shim for 1.7.0, OSC 1.7 introduces changes for Confidential Containers that are incompatible with Peer Pods from OSC 1.6. As a result, kata-containers must be upgraded to 3.7.0 or later and the podvm component must be rebuilt.

Diagnostic Steps

Other issues that can manifest during this upgrade, in particular for OCP 4.15 customers:

  • The old shim complains that it does not understand kata-remote in the configuration file.
  • The new shim times out because the Peer Pods are still running the old podvm.
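
To tell which shim a node is running, it can help to compare the installed kata-containers rpm version against the 3.7.0-2 minimum named in the Resolution section. The helper below is a minimal sketch: the function name kata_version_at_least is hypothetical, it relies on GNU sort -V for version ordering, and the oc debug invocation in the usage comment assumes the rpm database is reachable under /host on the node.

```shell
# Minimum kata-containers version-release named in the Resolution section.
MIN_KATA_VERSION="3.7.0-2"

# Succeeds when version-release string $1 is at least $2
# (default: MIN_KATA_VERSION). Relies on GNU sort -V (version sort).
kata_version_at_least() {
  have="$1"
  min="${2:-$MIN_KATA_VERSION}"
  # The smaller of the two versions sorts first; if that is the minimum,
  # the installed version is equal to or newer than it.
  lowest=$(printf '%s\n%s\n' "$min" "$have" | sort -V | head -n 1)
  [ "$lowest" = "$min" ]
}

# Example usage (node name is a placeholder):
#   v=$(oc debug node/<node-name> -- chroot /host \
#         rpm -q --queryformat '%{VERSION}-%{RELEASE}' kata-containers)
#   if kata_version_at_least "$v"; then echo "kata-containers is new enough"; fi
```

A node that fails this check is still on the older kata-containers package, which corresponds to the first case above. A node that passes while pods remain stuck points to the second case.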