Red Hat Build of Kueue (RHBoK) 1.2 installation or upgrade fails with Kueue CRD reconciliation error
Environment
Red Hat OpenShift AI
- 2.25 and later
Red Hat Build of Kueue
- 1.2 and later
Issue
Installing Red Hat Build of Kueue 1.2, or upgrading from Red Hat Build of Kueue 1.1 to 1.2, fails when the cohorts.kueue.x-k8s.io/v1alpha1 or topologies.kueue.x-k8s.io/v1alpha1 Kueue CustomResourceDefinitions (CRDs) exist in the cluster. These CRDs are often left over from a previous Kueue installation managed by RHOAI 2.x.
Symptoms
The Kueue Operator logs show a reconciliation error similar to the following:
Unhandled Error err="KueueOperator reconciliation failed:
CustomResourceDefinition.apiextensions.k8s.io "cohorts.kueue.x-k8s.io" is invalid:
status.storedVersions[0]: Invalid value: "v1alpha1": must appear in spec.versions"
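To confirm that a cluster is affected before deleting anything, you can check whether each CRD still records v1alpha1 in status.storedVersions, which is the condition named in the error above. The following is a minimal sketch, not a supported tool: the function name check_stored_versions is hypothetical, and it assumes you are logged in with oc as a user who can read CRDs.

```shell
#!/usr/bin/env bash
# Hypothetical helper: report whether a Kueue CRD still records v1alpha1
# in status.storedVersions.
check_stored_versions() {
  local crd="$1"
  local stored
  # Print all stored versions as a space-separated list; a non-zero exit
  # status means the CRD is missing or cannot be read by the current user.
  stored=$(oc get crd "$crd" -o jsonpath='{.status.storedVersions[*]}' 2>/dev/null) || {
    echo "$crd: not found or not accessible"
    return 0
  }
  # Pad with spaces so that v1alpha1 matches only as a whole word.
  case " $stored " in
    *" v1alpha1 "*) echo "$crd: legacy (storedVersions: $stored)" ;;
    *) echo "$crd: ok (storedVersions: $stored)" ;;
  esac
}
```

For example, run check_stored_versions cohorts.kueue.x-k8s.io and check_stored_versions topologies.kueue.x-k8s.io. A "legacy" result for either CRD matches the situation described in this article.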
Resolution
Delete the legacy Kueue CRDs from the cluster to allow the Operator to recreate them using the supported API version.
Prerequisites
- No active workloads depend on the Kueue resources defined by the v1alpha1 CRDs.
NOTE
If any Custom Resource (CR) instances of cohorts.kueue.x-k8s.io/v1alpha1 or topologies.kueue.x-k8s.io/v1alpha1 exist, you must manually back them up, remove them, and convert them to the v1beta1 version before recreating them after the upgrade. Because these are alpha versions, successful conversion is not guaranteed in every scenario.
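Deleting a CRD also deletes all of its CR instances, so any backup has to happen first. The following minimal sketch saves the existing instances of one resource type to a YAML file. The function name backup_kueue_resource and the output file naming are hypothetical, and the sketch only saves the v1alpha1 objects; converting them to v1beta1 remains a manual step that it does not perform.

```shell
#!/usr/bin/env bash
# Hypothetical helper: save every instance of a Kueue resource type to
# <plural>-v1alpha1-backup.yaml in the current directory.
backup_kueue_resource() {
  local resource="$1"
  # "cohorts.kueue.x-k8s.io" -> "cohorts-v1alpha1-backup.yaml"
  local outfile="${resource%%.*}-v1alpha1-backup.yaml"
  # --all-namespaces also works for cluster-scoped resources, so the same
  # call covers either scope.
  if ! oc get "$resource" --all-namespaces -o yaml > "$outfile"; then
    rm -f "$outfile"  # do not leave an empty or partial backup behind
    echo "failed to back up $resource" >&2
    return 1
  fi
  echo "$resource -> $outfile"
}
```

For example, run backup_kueue_resource cohorts.kueue.x-k8s.io and backup_kueue_resource topologies.kueue.x-k8s.io, and keep the resulting files until you have recreated the converted resources.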
Procedure
- Delete the affected CRDs by using the following commands. Deleting a CRD also deletes every CR instance of that type, so complete any backups first:
oc delete crd cohorts.kueue.x-k8s.io
oc delete crd topologies.kueue.x-k8s.io
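After the CRDs are deleted, the Operator is expected to recreate them during reconciliation. The following minimal sketch polls until a CRD exists again and prints the API versions it serves. The function name wait_for_crd and the default timeout of 300 seconds with a 10-second polling interval are arbitrary assumptions, not values defined by the Operator.

```shell
#!/usr/bin/env bash
# Hypothetical helper: wait until the Operator has recreated a CRD, then
# print the names of the API versions listed in spec.versions.
# Usage: wait_for_crd <crd-name> [timeout-seconds] [interval-seconds]
wait_for_crd() {
  local crd="$1" timeout="${2:-300}" interval="${3:-10}" elapsed=0 versions
  # The assignment takes the exit status of oc, so the loop continues
  # while the CRD does not exist yet.
  until versions=$(oc get crd "$crd" -o jsonpath='{.spec.versions[*].name}' 2>/dev/null); do
    if [ "$elapsed" -ge "$timeout" ]; then
      echo "$crd: not recreated after ${timeout}s" >&2
      return 1
    fi
    sleep "$interval"
    elapsed=$((elapsed + interval))
  done
  echo "$crd: served versions: $versions"
}
```

For example, run wait_for_crd cohorts.kueue.x-k8s.io. The printed versions should reflect the CRD shipped with Red Hat Build of Kueue 1.2 rather than v1alpha1.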
Root cause
The cluster contains legacy Kueue CRDs that still record v1alpha1 in status.storedVersions, typically left over from a previous Kueue installation managed by RHOAI 2.x. The CRD definitions shipped with Red Hat Build of Kueue 1.2 do not include v1alpha1 in spec.versions, and the Kubernetes API server rejects any CRD update that removes a version still listed in status.storedVersions. As a result, the Kueue Operator cannot update these CRDs, and reconciliation fails with the error shown above.