Red Hat Build of Kueue (RHBoK) 1.2 installation or upgrade fails with Kueue CRD reconciliation error

Environment

Red Hat OpenShift AI

  • 2.25 and later

Red Hat Build of Kueue

  • 1.2 and later

Issue

Installing Red Hat Build of Kueue 1.2, or upgrading from 1.1 to 1.2, fails when the cohorts.kueue.x-k8s.io/v1alpha1 or topologies.kueue.x-k8s.io/v1alpha1 Kueue CustomResourceDefinitions (CRDs) exist in the cluster. These CRDs often remain from a previous Kueue installation managed by RHOAI 2.x.

Symptoms

The Kueue Operator logs show a reconciliation error similar to the following:

Unhandled Error err="KueueOperator reconciliation failed:
CustomResourceDefinition.apiextensions.k8s.io "cohorts.kueue.x-k8s.io" is invalid:
status.storedVersions[0]: Invalid value: "v1alpha1": must appear in spec.versions"
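
To confirm that a cluster is affected before changing anything, you can inspect the stored versions that each CRD records. The following is a minimal sketch, assuming that oc is logged in with permission to read CRDs. The function name kueue_stored_versions is hypothetical and used only for illustration. A v1alpha1 entry in status.storedVersions matches the condition reported in the error above.

```shell
# Hypothetical helper: print status.storedVersions for the two CRDs
# named in this article. Assumes oc is logged in and can read CRDs.
kueue_stored_versions() {
  for crd in cohorts.kueue.x-k8s.io topologies.kueue.x-k8s.io; do
    printf '%s storedVersions: ' "${crd}"
    # If the CRD does not exist, oc reports an error and the loop continues.
    oc get crd "${crd}" -o jsonpath='{.status.storedVersions}{"\n"}' \
      || echo "(CRD not found)"
  done
}
```

After defining the function in a shell session, run kueue_stored_versions to print one line per CRD.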

Resolution

Delete the legacy Kueue CRDs from the cluster to allow the Operator to recreate them using the supported API version.

Prerequisites

  • No active workloads depend on the Kueue resources defined by the v1alpha1 CRDs.
    NOTE
    Deleting a CRD also deletes every Custom Resource (CR) instance of that kind. If any CR instances of cohorts.kueue.x-k8s.io/v1alpha1 or topologies.kueue.x-k8s.io/v1alpha1 exist, you must manually back them up and remove them before you delete the CRDs. After the upgrade, convert the backed-up manifests to the v1beta1 version and recreate them. Because these are alpha versions, successful conversion is not guaranteed in every scenario.
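
The backup described in the note can be sketched as follows. This is an illustration under stated assumptions, not a supported procedure: the function name kueue_backup_alpha_crs and the backup file names are hypothetical, and oc is assumed to be logged in with permission to read these resources. The saved manifests still carry the v1alpha1 API version and must be converted by hand before they can be reapplied.

```shell
# Hypothetical helper: save existing Cohort and Topology instances to
# YAML files in the given directory (default: current directory).
kueue_backup_alpha_crs() {
  backup_dir="${1:-.}"
  mkdir -p "${backup_dir}" || return 1
  for resource in cohorts.kueue.x-k8s.io topologies.kueue.x-k8s.io; do
    # --all-namespaces covers namespaced kinds and is harmless for
    # cluster-scoped ones.
    oc get "${resource}" --all-namespaces -o yaml \
      > "${backup_dir}/${resource}-backup.yaml" || return 1
    echo "Saved ${resource} to ${backup_dir}/${resource}-backup.yaml"
  done
}
```

For example, kueue_backup_alpha_crs ./kueue-backup writes one file per resource kind into ./kueue-backup. Review the files before deleting anything to make sure that they contain the instances you expect.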

Procedure

  1. Delete the affected CRDs by using the following commands:
oc delete crd cohorts.kueue.x-k8s.io
oc delete crd topologies.kueue.x-k8s.io
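
After the deletion, you can check whether the Operator has recreated the CRDs. The following sketch assumes that the Operator recreates them on its next reconciliation, as described in the Resolution section. The function name kueue_wait_for_crds and the 120-second timeout are illustrative choices, not values from the product documentation.

```shell
# Hypothetical helper: wait for each recreated CRD to become Established,
# then print the versions it defines.
kueue_wait_for_crds() {
  for crd in cohorts.kueue.x-k8s.io topologies.kueue.x-k8s.io; do
    # oc wait can return an error if the CRD has not been recreated yet;
    # rerun the function in that case.
    oc wait --for=condition=Established "crd/${crd}" --timeout=120s || return 1
    printf '%s spec.versions: ' "${crd}"
    oc get crd "${crd}" -o jsonpath='{.spec.versions[*].name}{"\n"}'
  done
}
```

If the recreated CRDs no longer list v1alpha1, the reconciliation error shown in the Symptoms section should stop appearing in the Kueue Operator logs.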

Root cause

The cluster contains legacy Kueue CRDs with the v1alpha1 API version, typically remaining from a previous Kueue installation managed by RHOAI 2.x. These CRDs record v1alpha1 in status.storedVersions, but the CRD definitions that Red Hat Build of Kueue 1.2 applies do not list v1alpha1 in spec.versions. The API server rejects the update because every stored version must appear in spec.versions, which prevents the Kueue Operator from completing reconciliation.
