How to resolve scheduled pipeline run failures after upgrading to Red Hat OpenShift AI 2.14.0
Environment
Red Hat OpenShift AI (RHOAI) 2.14.0
Issue
Scheduled pipeline runs might fail after upgrading to Red Hat OpenShift AI (RHOAI) 2.14.0 due to a known issue (RHOAIENG-14265).
Resolution
To resolve failed scheduled runs, delete the failed instance, then recreate or duplicate it. For automated recreation, refer to the process below.
Automated process
This process is valid only when upgrading from Red Hat OpenShift AI (RHOAI) 2.13.0 to RHOAI 2.14.0.
Prerequisites
- You have installed jq (jqlang.github.io).
- You have installed yq (mikefarah.gitbook.io).
- You have admin permissions for the data science project that contains the ScheduledWorkflow instances.
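Before running the script, it can help to confirm that the required tools are on your PATH. The following is a minimal sketch, not part of the original procedure. The function name check_tools is hypothetical, and the tool list (oc, jq, yq) is taken from the commands that the script below calls.

```shell
#!/usr/bin/env bash
# check_tools is a hypothetical helper: it reports every command passed as an
# argument that cannot be found on PATH and returns non-zero if any is missing.
check_tools() {
  local missing=0
  local tool
  for tool in "$@"; do
    if ! command -v "${tool}" >/dev/null 2>&1; then
      echo "Missing required tool: ${tool}" >&2
      missing=1
    fi
  done
  return "${missing}"
}

# Example usage (uncomment to stop early when a tool is missing):
# check_tools oc jq yq || exit 1
```

This only checks that the commands exist. It does not verify that yq is the implementation referenced above or that you are logged in to the cluster.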
Script to patch scheduled workflows
In a Bash shell with access to your OpenShift cluster, save the following script as patch_swf.sh. The script updates the ScheduledWorkflow instances in a namespace to use the new driver and launcher images and adds the metadata server arguments that the new driver image requires:
#!/usr/bin/env bash
set -e

# Driver and launcher images to replace (OLD_*) and their replacements (NEW_*).
OLD_DRIVER_IMAGE="registry.redhat.io/rhoai/odh-ml-pipelines-driver-rhel8@sha256:16a711ba5c770c3b93e9a5736735f972df9451a9a1903192fcb486aa929a44b7"
NEW_DRIVER_IMAGE="registry.redhat.io/rhoai/odh-ml-pipelines-driver-rhel8@sha256:78d5f5a81a3f0ee0b918dc2dab7ffab5b43fec94bd553ab4362f2216eef39688"
OLD_LAUNCHER_IMAGE="registry.redhat.io/rhoai/odh-ml-pipelines-launcher-rhel8@sha256:e8aa5ae0a36dc50bdc740d6d9753b05f2174e68a7edbd6c5b0ce3afd194c7a6e"
NEW_LAUNCHER_IMAGE="registry.redhat.io/rhoai/odh-ml-pipelines-launcher-rhel8@sha256:3a3ba3c4952dc9020a8a960bdd3c0b2f16ca89ac15fd17128a00c382f39cba81"
NAMESPACE=""

# Parse command-line arguments; only --namespace is accepted.
while [[ "$#" -gt 0 ]]; do
  case $1 in
    --namespace) NAMESPACE="$2"; shift ;;
    *) echo "Unknown parameter passed: $1"; exit 1 ;;
  esac
  shift
done

if [ -z "${NAMESPACE}" ]; then
  echo "Error: --namespace parameter is required."
  echo "Usage: $0 --namespace <namespace>"
  exit 1
fi
# Replace every occurrence of old_image with new_image in the workflow spec JSON.
patch_image() {
  local workflow_spec=$1
  local old_image=$2
  local new_image=$3
  local patched_workflow_spec
  patched_workflow_spec=$(jq --arg OLD_IMAGE "${old_image}" --arg NEW_IMAGE "${new_image}" '
    (.. | objects | select(.image == $OLD_IMAGE) | .image) |= $NEW_IMAGE
  ' <<< "$workflow_spec")
  echo "${patched_workflow_spec}"
}
# Append the metadata gRPC server address, port, and TLS flag to the arguments
# of every container that uses driver_image. The arguments are skipped only when
# --mlPipelineServiceTLSEnabled is present and is not set to "true".
add_arguments() {
  local workflow_spec=$1
  local driver_image=$2
  local dspa=$3
  local new_args
  local server_address
  local port
  local updated_json
  port=$(oc get service ds-pipeline-metadata-grpc-"${dspa}" -o jsonpath='{.spec.ports[*].port}' -n "${NAMESPACE}")
  server_address="ds-pipeline-metadata-grpc-${dspa}.${NAMESPACE}.svc.cluster.local"
  new_args="[
    \"--mlmd_server_address\", \"${server_address}\",
    \"--mlmd_server_port\", \"${port}\",
    \"--metadataTLSEnabled\", \"true\"
  ]"
  updated_json=$(jq --arg image "${driver_image}" --argjson new_args "$new_args" '
    .spec.templates[].container |= if .image == $image then
      if (.args | index("--mlPipelineServiceTLSEnabled") as $i | if $i then .[$i + 1] == "true" else true end) then
        .args += $new_args
      else
        .
      end
    else
      .
    end
  ' <<< "${workflow_spec}")
  echo "$updated_json"
}
# Patch a single ScheduledWorkflow: swap the images, add the metadata arguments,
# and write the updated workflow spec back to the cluster.
patch_swf() {
  local swf_name=$1
  local workflow_spec
  local dspa
  workflow_spec=$(oc get -oyaml swf "${swf_name}" -n "${NAMESPACE}" | yq .spec.workflow.spec)
  workflow_spec=$(patch_image "${workflow_spec}" "${OLD_DRIVER_IMAGE}" "${NEW_DRIVER_IMAGE}")
  workflow_spec=$(patch_image "${workflow_spec}" "${OLD_LAUNCHER_IMAGE}" "${NEW_LAUNCHER_IMAGE}")
  # The owning DataSciencePipelinesApplication name determines the metadata service name.
  dspa=$(oc get swf "${swf_name}" -o yaml -n "${NAMESPACE}" | yq '.metadata.ownerReferences[] | select(.kind == "DataSciencePipelinesApplication") | .name')
  workflow_spec=$(add_arguments "${workflow_spec}" "${NEW_DRIVER_IMAGE}" "${dspa}")
  # Compact the JSON, then encode it as a JSON string literal for the patch body.
  workflow_spec=$(echo -n "${workflow_spec}" | jq -c . | jq -Rsa .)
  oc patch swf "${swf_name}" --type=merge -p "{\"spec\":{\"workflow\":{\"spec\": $workflow_spec}}}" -n "${NAMESPACE}"
}
# Patch every ScheduledWorkflow in the target namespace.
main() {
  local swf_names
  local workflow_spec
  swf_names=$(oc get swf --no-headers -o custom-columns=":metadata.name" -n "${NAMESPACE}")
  for swf_name in $swf_names; do
    echo "Processing Scheduled Workflow: $swf_name"
    workflow_spec=$(patch_swf "${swf_name}")
    echo "Scheduled Workflow successfully patched: $swf_name"
  done
}

main
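To see what the image replacement step does in isolation, you can run the same jq filter that patch_image uses against a small sample document. The following is an illustrative sketch only: the function name swap_image, the SAMPLE_SPEC variable, and the example.com image references are made up for this example and do not correspond to real RHOAI images.

```shell
#!/usr/bin/env bash
# Illustration only: the image references below are made-up placeholders.
SAMPLE_SPEC='{"spec":{"templates":[{"name":"driver","container":{"image":"example.com/driver@sha256:old"}},{"name":"other","container":{"image":"example.com/other@sha256:keep"}}]}}'

# Same jq filter as patch_image: rewrite .image on any object, at any depth,
# whose .image equals the old reference. All other images are left untouched.
swap_image() {
  local spec=$1
  local old_image=$2
  local new_image=$3
  printf '%s' "${spec}" | jq -c --arg OLD_IMAGE "${old_image}" --arg NEW_IMAGE "${new_image}" '
    (.. | objects | select(.image == $OLD_IMAGE) | .image) |= $NEW_IMAGE
  '
}

# List the container images after the swap (runs only if jq is on PATH).
if command -v jq >/dev/null 2>&1; then
  swap_image "${SAMPLE_SPEC}" "example.com/driver@sha256:old" "example.com/driver@sha256:new" \
    | jq -r '.spec.templates[].container.image'
fi
```

Because the filter matches on the full image reference, including the digest, a ScheduledWorkflow that already uses a different image is not modified by this step.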
Run the script
To patch ScheduledWorkflow instances, pass the target namespace as an argument when running the script. For example, to patch instances in the dspa-example1 namespace, use:
./patch_swf.sh --namespace dspa-example1
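The script accepts a single namespace per invocation. If several data science projects are affected, one possible approach is a small wrapper that calls the script once per namespace and reports the namespaces where it failed. This is a sketch under stated assumptions: the function name patch_namespaces is hypothetical, and the script path and namespace names in the usage comment are placeholders.

```shell
#!/usr/bin/env bash
# patch_namespaces is a hypothetical wrapper: it runs the given patch script
# once per namespace, continues after a failure, and returns non-zero if the
# script failed for any namespace.
patch_namespaces() {
  local script=$1
  shift
  local ns
  local failed=""
  for ns in "$@"; do
    if ! "${script}" --namespace "${ns}"; then
      failed="${failed} ${ns}"
    fi
  done
  if [ -n "${failed}" ]; then
    echo "Patching failed for:${failed}" >&2
    return 1
  fi
}

# Example usage (placeholder namespaces):
# patch_namespaces ./patch_swf.sh dspa-example1 dspa-example2
```

Running the namespaces one at a time keeps the output of each run separate, which makes it easier to tell which project needs a manual follow-up.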
This solution is part of Red Hat’s fast-track publication program, providing a huge library of solutions that Red Hat engineers have created while supporting our customers. To give you the knowledge you need the instant it becomes available, these articles may be presented in a raw and unedited form.