How to resolve scheduled pipeline run failures after upgrading to Red Hat OpenShift AI 2.14.0

Solution Unverified

Environment

Red Hat OpenShift AI (RHOAI) 2.14.0

Issue

Scheduled pipeline runs might fail after upgrading to Red Hat OpenShift AI (RHOAI) 2.14.0 because of a known issue (RHOAIENG-14265).

Resolution

To resolve a failed scheduled run, delete the failed instance, then recreate or duplicate it. To patch the existing scheduled runs automatically instead, follow the automated process below.

Automated process

This process is valid only when upgrading from Red Hat OpenShift AI (RHOAI) 2.13.0 to Red Hat OpenShift AI (RHOAI) 2.14.0.

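Prerequisites

You are logged in to the OpenShift cluster with permission to read and patch ScheduledWorkflow resources in the target namespace, and the oc, jq, and yq command-line tools are installed.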
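One way to confirm the tooling before running anything is a small pre-flight check. The following is a minimal sketch and not part of the original script; the function name missing_tools is hypothetical, and the tool list is taken from the commands the patch script invokes.

```shell
#!/usr/bin/env bash

# Print the name of every tool from the argument list that is not on PATH.
# (missing_tools is an illustrative helper, not something shipped with RHOAI.)
missing_tools() {
    local tool
    for tool in "$@"; do
        if ! command -v "${tool}" >/dev/null 2>&1; then
            echo "${tool}"
        fi
    done
}

# oc, jq, and yq are the three external commands the patch script calls.
missing=$(missing_tools oc jq yq)

if [ -n "${missing}" ]; then
    # Join the newline-separated names onto one line for the message.
    echo "Missing required tools: ${missing//$'\n'/ }" >&2
else
    echo "All required tools found."
fi
```

Running oc whoami afterwards is a quick way to confirm that the session is logged in. The yq expressions in the script appear to assume the Go-based yq (mikefarah/yq), which prints string values unquoted; this is an inference from the script, not something the article states.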

Script to patch scheduled workflows

Open a terminal with a Bash shell that has access to your OpenShift cluster, and run the following script. The script updates every ScheduledWorkflow instance in the given namespace to use the new driver and launcher images, and adds the metadata (MLMD) server arguments to the driver containers:

#!/usr/bin/env bash

set -e

OLD_DRIVER_IMAGE="registry.redhat.io/rhoai/odh-ml-pipelines-driver-rhel8@sha256:16a711ba5c770c3b93e9a5736735f972df9451a9a1903192fcb486aa929a44b7"
NEW_DRIVER_IMAGE="registry.redhat.io/rhoai/odh-ml-pipelines-driver-rhel8@sha256:78d5f5a81a3f0ee0b918dc2dab7ffab5b43fec94bd553ab4362f2216eef39688"

OLD_LAUNCHER_IMAGE="registry.redhat.io/rhoai/odh-ml-pipelines-launcher-rhel8@sha256:e8aa5ae0a36dc50bdc740d6d9753b05f2174e68a7edbd6c5b0ce3afd194c7a6e"
NEW_LAUNCHER_IMAGE="registry.redhat.io/rhoai/odh-ml-pipelines-launcher-rhel8@sha256:3a3ba3c4952dc9020a8a960bdd3c0b2f16ca89ac15fd17128a00c382f39cba81"

NAMESPACE=""

while [[ "$#" -gt 0 ]]; do
    case $1 in
        --namespace) NAMESPACE="$2"; shift ;;
        *) echo "Unknown parameter passed: $1"; exit 1 ;;
    esac
    shift
done

if [ -z "${NAMESPACE}" ]; then
    echo "Error: --namespace parameter is required."
    echo "Usage: $0 --namespace <namespace>"
    exit 1
fi

patch_image() {
    local workflow_spec=$1
    local old_image=$2
    local new_image=$3
    local patched_workflow_spec

    patched_workflow_spec=$(jq --arg OLD_IMAGE "${old_image}" --arg NEW_IMAGE "${new_image}" '
      (.. | objects | select(.image == $OLD_IMAGE) | .image) |= $NEW_IMAGE
    ' <<< "$workflow_spec")

    echo "${patched_workflow_spec}"
}

add_arguments() {
    local workflow_spec=$1
    local driver_image=$2
    local dspa=$3

    local new_args
    local server_address
    local port
    local updated_json

    port=$(oc get service ds-pipeline-metadata-grpc-"${dspa}" -o jsonpath='{.spec.ports[*].port}' -n "${NAMESPACE}")

    server_address="ds-pipeline-metadata-grpc-${dspa}.${NAMESPACE}.svc.cluster.local"

    new_args="[
        \"--mlmd_server_address\", \"${server_address}\",
        \"--mlmd_server_port\", \"${port}\",
        \"--metadataTLSEnabled\", \"true\"
    ]"

    updated_json=$(jq --arg image "${driver_image}" --argjson new_args "$new_args" '
      .spec.templates[].container |= if .image == $image then
          if (.args | index("--mlPipelineServiceTLSEnabled") as $i | if $i then .[$i + 1] == "true" else true end) then
              .args += $new_args
          else
              .
          end
        else
          .
        end
    ' <<< "${workflow_spec}")

    echo "$updated_json"
}

patch_swf() {
    local swf_name=$1

    local workflow_spec
    local dspa

    workflow_spec=$(oc get -oyaml swf "${swf_name}" -n "${NAMESPACE}" | yq .spec.workflow.spec)
    workflow_spec=$(patch_image "${workflow_spec}" "${OLD_DRIVER_IMAGE}" "${NEW_DRIVER_IMAGE}")
    workflow_spec=$(patch_image "${workflow_spec}" "${OLD_LAUNCHER_IMAGE}" "${NEW_LAUNCHER_IMAGE}")

    dspa=$(oc get swf "${swf_name}" -o yaml -n "${NAMESPACE}" | yq '.metadata.ownerReferences[] | select(.kind == "DataSciencePipelinesApplication") | .name')

    workflow_spec=$(add_arguments "${workflow_spec}" "${NEW_DRIVER_IMAGE}" "${dspa}")

    # Compact the spec, then re-encode it as a JSON string for the merge patch.
    workflow_spec=$(echo -n "${workflow_spec}" | jq -c . | jq -Rsa .)

    oc patch swf "${swf_name}" --type=merge -p "{\"spec\":{\"workflow\":{\"spec\": $workflow_spec}}}" -n "${NAMESPACE}"
}

main() {
    local swf_names
    local workflow_spec

    swf_names=$(oc get swf --no-headers -o custom-columns=":metadata.name" -n "${NAMESPACE}")

    for swf_name in $swf_names; do
        echo "Processing Scheduled Workflow: $swf_name"

        workflow_spec=$(patch_swf "${swf_name}")

        echo "Scheduled Workflow successfully patched: $swf_name"
    done
}

main
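To see what the patch_image function does without touching a cluster, its jq filter can be tried on a small hand-written spec. The sketch below is illustrative only: the sample JSON, the function name replace_image, and the image names example.io/driver@sha256:old and example.io/driver@sha256:new are made up and are not real RHOAI digests. It assumes jq is installed.

```shell
#!/usr/bin/env bash

# Hypothetical stand-ins for the real driver digests used by the script.
OLD_IMAGE="example.io/driver@sha256:old"
NEW_IMAGE="example.io/driver@sha256:new"

# Made-up workflow spec: one template uses the old image, one uses a
# different image, and one has no container at all.
sample_spec='{
  "templates": [
    {"name": "driver", "container": {"image": "example.io/driver@sha256:old"}},
    {"name": "other", "container": {"image": "example.io/other@sha256:abc"}},
    {"name": "root", "dag": {"tasks": []}}
  ]
}'

# Same filter as patch_image: visit every object in the document and
# rewrite its image field only where it equals the old image.
replace_image() {
    jq -c --arg OLD_IMAGE "$2" --arg NEW_IMAGE "$3" '
      (.. | objects | select(.image == $OLD_IMAGE) | .image) |= $NEW_IMAGE
    ' <<< "$1"
}

patched=$(replace_image "${sample_spec}" "${OLD_IMAGE}" "${NEW_IMAGE}")

# Summarize the result, one template per line.
jq -r '.templates[] | "\(.name): \(.container.image // "no container")"' <<< "${patched}"
# Prints:
#   driver: example.io/driver@sha256:new
#   other: example.io/other@sha256:abc
#   root: no container
```

Because the filter matches on the exact image string, templates that use any other image, or that have no container, pass through unchanged.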

Run the script

Save the script to a file, for example patch_swf.sh, and make it executable. To patch ScheduledWorkflow instances, pass the target namespace with the --namespace argument. For example, to patch instances in the dspa-example1 namespace, run:

./patch_swf.sh --namespace dspa-example1
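
After the script finishes, it can be useful to confirm that no ScheduledWorkflow still points at the old images. The following is a rough sketch of such a check, not part of the original article: the function names has_old_image_refs and check_namespace are hypothetical, the digests are copied from the OLD_* variables in the script, and the oc invocations mirror the ones the script already uses.

```shell
#!/usr/bin/env bash

# Digests copied from OLD_DRIVER_IMAGE and OLD_LAUNCHER_IMAGE in the patch script.
OLD_DRIVER_DIGEST="sha256:16a711ba5c770c3b93e9a5736735f972df9451a9a1903192fcb486aa929a44b7"
OLD_LAUNCHER_DIGEST="sha256:e8aa5ae0a36dc50bdc740d6d9753b05f2174e68a7edbd6c5b0ce3afd194c7a6e"

# Succeeds when the text on stdin mentions either old digest.
has_old_image_refs() {
    grep -q -e "${OLD_DRIVER_DIGEST}" -e "${OLD_LAUNCHER_DIGEST}"
}

# Report, for each ScheduledWorkflow in a namespace, whether an old digest remains.
check_namespace() {
    local namespace=$1
    local name
    for name in $(oc get swf --no-headers -o custom-columns=":metadata.name" -n "${namespace}"); do
        if oc get swf "${name}" -n "${namespace}" -o yaml | has_old_image_refs; then
            echo "${name}: still references an old image"
        else
            echo "${name}: ok"
        fi
    done
}

# Only contact the cluster when a namespace is given on the command line.
if [ "$#" -gt 0 ]; then
    check_namespace "$1"
fi
```

Saved under a name of your choosing and run with the namespace as its only argument, this prints one line per ScheduledWorkflow.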
