Upgrading OpenShift AI Self-Managed

Red Hat OpenShift AI Self-Managed 2.25

Upgrade OpenShift AI on OpenShift

Abstract

Learn how to upgrade Red Hat OpenShift AI as self-managed software on your OpenShift cluster.

Preface

As a cluster administrator, you can configure either automatic or manual upgrade of the OpenShift AI Operator.

Chapter 1. Overview of upgrading OpenShift AI Self-Managed

As a cluster administrator, you can configure either automatic or manual upgrades for the Red Hat OpenShift AI Operator.

Note

For information about upgrading OpenShift AI as self-managed software on your OpenShift cluster in a disconnected environment, see Upgrading OpenShift AI Self-Managed in a disconnected environment.

  • If you configure automatic upgrades, when a new version of the Red Hat OpenShift AI Operator is available, Operator Lifecycle Manager (OLM) automatically upgrades the running instance of your Operator without human intervention.
  • If you configure manual upgrades, when a new version of the Red Hat OpenShift AI Operator is available, OLM creates an update request.

    A cluster administrator must manually approve the update request to update the Operator to the new version. See Manually approving a pending Operator upgrade for more information about approving a pending Operator upgrade.

  • By default, the Red Hat OpenShift AI Operator follows a sequential update process. This means that if there are several minor versions between the current version and the version that you plan to upgrade to, Operator Lifecycle Manager (OLM) upgrades the Operator to each intermediate minor version before it upgrades it to the final, target version. If you configure automatic upgrades, OLM automatically upgrades the Operator to the latest available version, without human intervention. If you configure manual upgrades, a cluster administrator must manually approve each sequential update between the current version and the final, target version.

    To view information regarding the supported and tested upgrade paths for Red Hat OpenShift AI, see Red Hat OpenShift AI Upgrade Path Information.
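If you use the manual approval strategy, each pending update surfaces as an InstallPlan that you can inspect and approve from the CLI as well as from the web console. A minimal sketch, assuming the Operator is installed in the default redhat-ods-operator namespace; the InstallPlan name is a placeholder that you must replace with the name reported by the first command:

```shell
# List install plans in the Operator's namespace; pending ones show APPROVED=false
oc get installplans -n redhat-ods-operator

# Approve a specific pending InstallPlan by name
oc patch installplan <install-plan-name> -n redhat-ods-operator \
  --type merge --patch '{"spec":{"approved":true}}'
```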

For information about OpenShift AI Self-Managed release types and supported versions, see the Red Hat OpenShift AI Self-Managed Life Cycle Knowledgebase article.

  • Before you upgrade OpenShift AI, complete the tasks described in Requirements for upgrading OpenShift AI.
  • Before you can use an accelerator in OpenShift AI, your instance must have the associated accelerator profile or hardware profile. If your cluster has an accelerator, its accelerator profile or hardware profile is preserved after an upgrade. For more information about accelerators, see Working with accelerators.

    Important

    By default, hardware profiles are hidden in the dashboard navigation menu and user interface, while accelerator profiles remain visible. In addition, user interface components associated with the deprecated accelerator profiles functionality are still displayed. To show the Settings → Hardware profiles option in the dashboard navigation menu, and the user interface components associated with hardware profiles, set the disableHardwareProfiles value to false in the OdhDashboardConfig custom resource (CR) in OpenShift. For more information about setting dashboard configuration options, see Customizing the dashboard.

  • Workbench images are integrated into the image stream during the upgrade and subsequently appear in the OpenShift AI dashboard.

    Note

    Workbench images are built externally; they are prebuilt images that are updated quarterly and do not change with every OpenShift AI upgrade.
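The disableHardwareProfiles setting described above can also be changed from the CLI. A minimal sketch, assuming the dashboard runs in the default redhat-ods-applications namespace and that the option lives under spec.dashboardConfig; verify the field path against the OdhDashboardConfig CR in your cluster:

```shell
# Show the Settings → Hardware profiles option in the dashboard
oc patch odhdashboardconfig odh-dashboard-config \
  -n redhat-ods-applications --type merge \
  --patch '{"spec":{"dashboardConfig":{"disableHardwareProfiles":false}}}'
```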

Chapter 2. Configuring the upgrade strategy for OpenShift AI

As a cluster administrator, you can configure either an automatic or manual upgrade strategy for the Red Hat OpenShift AI Operator.

Important

By default, the Red Hat OpenShift AI Operator follows a sequential update process. This means that if there are several versions between the current version and the version that you intend to upgrade to, Operator Lifecycle Manager (OLM) upgrades the Operator to each of the intermediate versions before it upgrades it to the final, target version. If you configure automatic upgrades, OLM automatically upgrades the Operator to the latest available version, without human intervention. If you configure manual upgrades, a cluster administrator must manually approve each sequential update between the current version and the final, target version.

For information about supported versions, see the Red Hat OpenShift AI Self-Managed Life Cycle Knowledgebase article.

Prerequisites

  • You have cluster administrator privileges for your OpenShift cluster.
  • The Red Hat OpenShift AI Operator is installed.

Procedure

  1. Log in to the OpenShift cluster web console as a cluster administrator.
  2. In the Administrator perspective, in the left menu, select Operators → Installed Operators.
  3. Click the Red Hat OpenShift AI Operator.
  4. Click the Subscription tab.
  5. Under Update approval, click the pencil icon and select one of the following update strategies:

    • Automatic: New updates are installed as soon as they become available.
    • Manual: A cluster administrator must approve any new update before installation begins.
  6. Click Save.
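The same update strategy can be set declaratively in the Operator's Subscription object, which is useful for GitOps-managed clusters. A minimal sketch, assuming the default subscription name, namespace, channel, and catalog source; adjust these to match your installation:

```yaml
apiVersion: operators.coreos.com/v1alpha1
kind: Subscription
metadata:
  name: rhods-operator
  namespace: redhat-ods-operator
spec:
  channel: stable
  installPlanApproval: Manual   # or Automatic
  name: rhods-operator
  source: redhat-operators
  sourceNamespace: openshift-marketplace
```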


Chapter 3. Requirements for upgrading OpenShift AI

When upgrading OpenShift AI, you must complete the following tasks.

Check the components in the DataScienceCluster object

When you upgrade Red Hat OpenShift AI, the upgrade process automatically uses the values from the previous DataScienceCluster object.

After the upgrade, you should inspect the DataScienceCluster object and optionally update the status of any components as described in Updating the installation status of Red Hat OpenShift AI components by using the web console.

Note

New components are not automatically added to the DataScienceCluster object during upgrade. If you want to use a new component, you must manually edit the DataScienceCluster object to add the component entry.

Note

If you are upgrading OpenShift AI on a cluster running in FIPS mode, any custom container images for data science pipelines must be based on UBI 9 or RHEL 9. This ensures compatibility with FIPS-approved pipeline components and prevents errors related to mismatched OpenSSL or GNU C Library (glibc) versions.
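For example, a FIPS-compatible custom pipeline image can start from a UBI 9 base. This is a hypothetical sketch; the base image tag and installed package are illustrative only:

```dockerfile
# Hypothetical Containerfile for a custom data science pipelines image.
# Basing the image on UBI 9 keeps OpenSSL and glibc versions aligned
# with FIPS-approved pipeline components.
FROM registry.access.redhat.com/ubi9/python-311:latest
USER 0
RUN pip install --no-cache-dir kfp
USER 1001
```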

Migrate from embedded Kueue to Red Hat build of Kueue

The embedded Kueue component for managing distributed workloads is deprecated. OpenShift AI now uses the Red Hat build of Kueue Operator to provide enhanced workload scheduling for distributed training, workbench, and model serving workloads.

Before upgrading OpenShift AI, check if your environment is using the embedded Kueue component by verifying the spec.components.kueue.managementState field in the DataScienceCluster custom resource. If the field is set to Managed, you must complete the migration to the Red Hat build of Kueue Operator to avoid controller conflicts and ensure continued support for queue-based workloads.
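You can check this field from the CLI before you upgrade. A minimal sketch, assuming the default DataScienceCluster object name default-dsc:

```shell
# Prints "Managed" if the deprecated embedded Kueue component is enabled,
# in which case you must migrate to the Red Hat build of Kueue Operator
oc get datasciencecluster default-dsc \
  -o jsonpath='{.spec.components.kueue.managementState}'
```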

Important

As part of the migration to Red Hat build of Kueue, you must manually delete the following legacy Kueue CRDs:

  • cohorts.kueue.x-k8s.io/v1alpha1
  • topologies.kueue.x-k8s.io/v1alpha1

If you have existing instances of these CRDs, you must manually back up their data, delete the instances, and recreate them using the v1beta1 API after the upgrade. If you do not complete these steps, the Kueue Operator enters a failed reconciliation loop, resulting in a Not Ready status for the DataScienceCluster. To avoid this conflict, ensure no active workloads depend on the legacy Kueue resources.

For more information, see Red Hat Build of Kueue 1.2 installation or upgrade fails with Kueue CRD reconciliation error.
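The backup and deletion steps above can be sketched from the CLI as follows; this assumes you have already confirmed that no active workloads depend on the legacy resources:

```shell
# Back up any existing instances before deleting the CRDs
oc get cohorts.kueue.x-k8s.io -o yaml > cohorts-backup.yaml
oc get topologies.kueue.x-k8s.io -o yaml > topologies-backup.yaml

# Delete the legacy v1alpha1 CRDs
oc delete crd cohorts.kueue.x-k8s.io topologies.kueue.x-k8s.io
```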

This migration requires OpenShift 4.18 or later. For more information, see Migrating to the Red Hat build of Kueue Operator.

Address KServe requirements

For the KServe component, which is used by the single-model serving platform to serve large models, you must meet the following requirements:

  • To fully install and use KServe, you must also install Operators for Red Hat OpenShift Serverless and Red Hat OpenShift Service Mesh and perform additional configuration. For more information, see Serving large models.
  • If you want to add an authorization provider for the single-model serving platform, you must install the Red Hat - Authorino Operator. For more information, see Adding an authorization provider for the single-model serving platform.
  • If you have not enabled the KServe component (that is, you set the value of the managementState field to Removed in the DataScienceCluster object), you must also disable the dependent Service Mesh component to avoid errors. See Disabling KServe dependencies.
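Disabling KServe and its dependent Service Mesh component involves two different resources. A minimal sketch, assuming the default object names default-dsc and default-dsci; the serviceMesh field location reflects the standard DSCInitialization schema, so verify it against your cluster:

```yaml
# In the DataScienceCluster object: disable the KServe component
apiVersion: datasciencecluster.opendatahub.io/v1
kind: DataScienceCluster
metadata:
  name: default-dsc
spec:
  components:
    kserve:
      managementState: Removed
---
# In the DSCInitialization object: disable the dependent Service Mesh
apiVersion: dscinitialization.opendatahub.io/v1
kind: DSCInitialization
metadata:
  name: default-dsci
spec:
  serviceMesh:
    managementState: Removed
```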

Address RAG dependencies

If you plan to deploy Retrieval-Augmented Generation (RAG) workloads by using Llama Stack, you must meet the following requirements:

  • You have GPU-enabled nodes available on your cluster and you have installed the Node Feature Discovery Operator and NVIDIA GPU Operator. For more information, see Installing the Node Feature Discovery Operator and Enabling NVIDIA GPUs.
  • You have access to storage for your model artifacts.
  • You have met the KServe installation prerequisites.

Verify Argo Workflows compatibility

If you use your own Argo Workflows instance for pipelines, verify that the installed version is compatible with this release of OpenShift AI. For details, see Supported Configurations.

Update workflows interacting with OdhDashboardConfig resource

Previously, cluster administrators used the groupsConfig option in the OdhDashboardConfig resource to manage the OpenShift groups (both administrators and non-administrators) that can access the OpenShift AI dashboard. Starting with OpenShift AI 2.17, this functionality has moved to the Auth resource. If you have workflows (such as GitOps workflows) that interact with OdhDashboardConfig, you must update them to reference the Auth resource instead.

Table 3.1. User management resource update

                 OpenShift AI 2.16 and earlier      OpenShift AI 2.17 and later

  apiVersion     opendatahub.io/v1alpha             services.platform.opendatahub.io/v1alpha1
  kind           OdhDashboardConfig                 Auth
  name           odh-dashboard-config               auth
  Admin groups   spec.groupsConfig.adminGroups      spec.adminGroups
  User groups    spec.groupsConfig.allowedGroups    spec.allowedGroups
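After the migration, group management lives in the Auth resource. A minimal sketch of the new resource; the group names shown are examples only and must be replaced with the OpenShift groups used in your cluster:

```yaml
apiVersion: services.platform.opendatahub.io/v1alpha1
kind: Auth
metadata:
  name: auth
spec:
  adminGroups:              # replaces spec.groupsConfig.adminGroups
    - rhods-admins          # example group name
  allowedGroups:            # replaces spec.groupsConfig.allowedGroups
    - system:authenticated
```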

Check the status of certificate management

You can use self-signed certificates in OpenShift AI.

After you upgrade, check the management status for Certificate Authority (CA) bundles as described in Working with certificates.

Chapter 4. Updating the installation status of Red Hat OpenShift AI components by using the web console

You can use the OpenShift web console to update the installation status of components of Red Hat OpenShift AI on your OpenShift cluster.

Important

If you upgraded OpenShift AI, the upgrade process automatically used the values of the previous version’s DataScienceCluster object. New components are not automatically added to the DataScienceCluster object.

After upgrading OpenShift AI:

  • Inspect the default DataScienceCluster object to check and optionally update the managementState status of the existing components.
  • Add any new components to the DataScienceCluster object.

Prerequisites

  • The Red Hat OpenShift AI Operator is installed on your OpenShift cluster.
  • You have cluster administrator privileges for your OpenShift cluster.

Procedure

  1. Log in to the OpenShift web console as a cluster administrator.
  2. In the web console, click Operators → Installed Operators and then click the Red Hat OpenShift AI Operator.
  3. Click the Data Science Cluster tab.
  4. On the DataScienceClusters page, click the default-dsc object.
  5. Click the YAML tab.

    An embedded YAML editor opens showing the default custom resource (CR) for the DataScienceCluster object, similar to the following example:

    apiVersion: datasciencecluster.opendatahub.io/v1
    kind: DataScienceCluster
    metadata:
      name: default-dsc
    spec:
      components:
        codeflare:
          managementState: Removed
        dashboard:
          managementState: Removed
        datasciencepipelines:
          managementState: Removed
        kserve:
          managementState: Removed
        kueue:
          managementState: Removed
        llamastackoperator:
          managementState: Removed
        modelmeshserving:
          managementState: Removed
        ray:
          managementState: Removed
        trainingoperator:
          managementState: Removed
        trustyai:
          managementState: Removed
        workbenches:
          managementState: Removed
          workbenchNamespace: rhods-notebooks
  6. In the spec.components section of the CR, for each OpenShift AI component shown, set the value of the managementState field to either Managed or Removed. These values are defined as follows:

    Managed
    The Operator actively manages the component, installs it, and tries to keep it active. The Operator will upgrade the component only if it is safe to do so.
    Removed
    The Operator actively manages the component but does not install it. If the component is already installed, the Operator will try to remove it.
  7. Click Save.

    For any components that you updated, OpenShift AI initiates a rollout of the affected pods so that they use the updated image.

  8. If you are upgrading from OpenShift AI 2.19 or earlier, upgrade the Authorino Operator to the stable update channel, version 1.2.1 or later.

    1. Update Authorino to the latest available release in the tech-preview-v1 channel (1.1.2), if you have not done so already.
    2. Switch to the stable channel:

      1. Navigate to the Subscription settings of the Authorino Operator.
      2. Under Update channel, click the highlighted tech-preview-v1 channel.
      3. Change the channel to stable.
    3. Select the update option for Authorino 1.2.1.
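As an alternative to editing the YAML in the web console, you can set a component's managementState from the CLI. A minimal sketch, assuming the default-dsc object; the component shown is only an example:

```shell
# Enable the training operator component in the default DataScienceCluster
oc patch datasciencecluster default-dsc --type merge \
  --patch '{"spec":{"components":{"trainingoperator":{"managementState":"Managed"}}}}'
```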

Verification

  1. Confirm that there is at least one running pod for each component:

    1. In the OpenShift web console, click Workloads → Pods.
    2. In the Project list at the top of the page, select redhat-ods-applications or your custom applications namespace.
    3. In the applications namespace, confirm that there are one or more running pods for each of the OpenShift AI components that you installed.
  2. Confirm the status of all installed components:

    1. In the OpenShift web console, click Operators → Installed Operators.
    2. Click the Red Hat OpenShift AI Operator.
    3. Click the Data Science Cluster tab and select the DataScienceCluster object called default-dsc.
    4. Select the YAML tab.
    5. In the status.installedComponents section, confirm that the components you installed have a status value of true.

      Note

      If a component shows with the component-name: {} format in the spec.components section of the CR, the component is not installed.

  3. In the OpenShift AI dashboard, users can view the list of the installed OpenShift AI components, their corresponding source (upstream) components, and the versions of the installed components, as described in Viewing installed OpenShift AI components.
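The verification steps above can also be performed from the CLI. A minimal sketch, assuming the default-dsc object and the default redhat-ods-applications namespace:

```shell
# Lists each installed component and whether it reports ready (true/false)
oc get datasciencecluster default-dsc \
  -o jsonpath='{.status.installedComponents}'

# Confirm that there are running pods in the applications namespace
oc get pods -n redhat-ods-applications
```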

Legal Notice

Copyright © Red Hat.
Except as otherwise noted below, the text of and illustrations in this documentation are licensed by Red Hat under the Creative Commons Attribution–Share Alike 3.0 Unported license. If you distribute this document or an adaptation of it, you must provide the URL for the original version.
Red Hat, as the licensor of this document, waives the right to enforce, and agrees not to assert, Section 4d of CC-BY-SA to the fullest extent permitted by applicable law.
Red Hat, the Red Hat logo, JBoss, Hibernate, and RHCE are trademarks or registered trademarks of Red Hat, Inc. or its subsidiaries in the United States and other countries.
Linux® is the registered trademark of Linus Torvalds in the United States and other countries.
XFS is a trademark or registered trademark of Hewlett Packard Enterprise Development LP or its subsidiaries in the United States and other countries.
The OpenStack® Word Mark and OpenStack logo are trademarks or registered trademarks of the Linux Foundation, used under license.
All other trademarks are the property of their respective owners.