Workaround for model deployment failure when using hardware profiles
Environment
OpenShift AI 2.23
OpenShift AI 2.24
Issue
Model deployments that use hardware profiles fail when the InferenceService resource is created manually, because the Red Hat OpenShift AI Operator does not inject the tolerations, nodeSelector, or identifiers from the hardware profile into the underlying InferenceService. As a result, the model deployment pods cannot be scheduled on suitable nodes and the deployment never reaches a ready state. Workbenches that use the same hardware profile deploy successfully.
Resolution
To work around this issue, run the following script, which manually injects the tolerations, nodeSelector, and identifiers from the hardware profile into the underlying InferenceService. The script requires the oc and jq command-line tools.
Before you run the script, replace the HARDWARE_PROFILE_NAME, HARDWARE_PROFILE_NAMESPACE, ISVC_NAME, and ISVC_NAMESPACE placeholder values with the values for your environment.
#!/bin/bash
# Manually inject the nodeSelector, tolerations, and identifiers of a HardwareProfile
# into an InferenceService at the correct paths.
# Stop on errors, on unset variables, and on failures inside pipelines.
set -euo pipefail

HARDWARE_PROFILE_NAME="<HardwareProfile .metadata.name>"
HARDWARE_PROFILE_NAMESPACE="<HardwareProfile .metadata.namespace>"
ISVC_NAME="<InferenceService .metadata.name>"
ISVC_NAMESPACE="<InferenceService .metadata.namespace>"

# Extract nodeSelector from HardwareProfile
NODE_SELECTOR=$(oc get hardwareprofiles.infrastructure.opendatahub.io "${HARDWARE_PROFILE_NAME}" -n "${HARDWARE_PROFILE_NAMESPACE}" \
  -o jsonpath='{.spec.scheduling.node.nodeSelector}')

# Extract tolerations from HardwareProfile
TOLERATIONS=$(oc get hardwareprofiles.infrastructure.opendatahub.io "${HARDWARE_PROFILE_NAME}" -n "${HARDWARE_PROFILE_NAMESPACE}" \
  -o jsonpath='{.spec.scheduling.node.tolerations}')

# Extract identifiers (resources) from HardwareProfile
IDENTIFIERS=$(oc get hardwareprofiles.infrastructure.opendatahub.io "${HARDWARE_PROFILE_NAME}" -n "${HARDWARE_PROFILE_NAMESPACE}" \
  -o jsonpath='{.spec.identifiers}')

# Patch the nodeSelector if the profile defines one
if [ -n "${NODE_SELECTOR}" ] && [ "${NODE_SELECTOR}" != "{}" ]; then
  oc patch inferenceservice "${ISVC_NAME}" -n "${ISVC_NAMESPACE}" --type=merge -p "{\"spec\":{\"predictor\":{\"nodeSelector\":${NODE_SELECTOR}}}}"
fi

# Patch the tolerations if the profile defines any
if [ -n "${TOLERATIONS}" ] && [ "${TOLERATIONS}" != "null" ]; then
  oc patch inferenceservice "${ISVC_NAME}" -n "${ISVC_NAMESPACE}" --type=merge -p "{\"spec\":{\"predictor\":{\"tolerations\":${TOLERATIONS}}}}"
fi

# Patch the resources: defaultCount becomes the request, maxCount (if set) the limit
if [ -n "${IDENTIFIERS}" ] && [ "${IDENTIFIERS}" != "null" ]; then
  # Build the patch in a private temporary file that is removed when the script exits
  PATCH_FILE=$(mktemp)
  trap 'rm -f "${PATCH_FILE}"' EXIT

  cat > "${PATCH_FILE}" <<EOF
{
"spec": {
"predictor": {
"model": {
"resources": {
"requests": {
EOF

  # Add each identifier as a resource request
  oc get hardwareprofiles.infrastructure.opendatahub.io "${HARDWARE_PROFILE_NAME}" -n "${HARDWARE_PROFILE_NAMESPACE}" \
    -o json | jq -r '.spec.identifiers[] | " \"" + .identifier + "\": \"" + (.defaultCount | tostring) + "\","' >> "${PATCH_FILE}"

  # Remove the trailing comma from the last request entry
  sed -i '$ s/,$//' "${PATCH_FILE}"

  cat >> "${PATCH_FILE}" <<EOF
},
"limits": {
EOF

  # Add a limit for each identifier that specifies maxCount
  oc get hardwareprofiles.infrastructure.opendatahub.io "${HARDWARE_PROFILE_NAME}" -n "${HARDWARE_PROFILE_NAMESPACE}" \
    -o json | jq -r '.spec.identifiers[] | select(.maxCount != null) | " \"" + .identifier + "\": \"" + (.maxCount | tostring) + "\","' >> "${PATCH_FILE}"

  # Remove the trailing comma from the last limit entry and close the JSON
  sed -i '$ s/,$//' "${PATCH_FILE}"

  cat >> "${PATCH_FILE}" <<EOF
}
}
}
}
}
}
EOF

  oc patch inferenceservice "${ISVC_NAME}" -n "${ISVC_NAMESPACE}" --type=merge -p "$(cat "${PATCH_FILE}")"
fi
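After the script completes, it may be worth confirming that the patched fields are present on the InferenceService. The following is a minimal sketch, not part of the original workaround: the function name show_isvc_scheduling is hypothetical, and the field paths simply mirror the ones that the script above patches.

```shell
#!/bin/bash
# Hypothetical helper: print the scheduling-related fields that the
# workaround script patches under .spec.predictor of an InferenceService.
show_isvc_scheduling() {
  local name="$1" namespace="$2"
  local field
  for field in nodeSelector tolerations model.resources; do
    printf '%s: ' "${field}"
    # An empty value after the colon means the field is not set.
    oc get inferenceservice "${name}" -n "${namespace}" \
      -o jsonpath="{.spec.predictor.${field}}"
    printf '\n'
  done
}
```

For example, call show_isvc_scheduling "${ISVC_NAME}" "${ISVC_NAMESPACE}" with the same values that were set in the script. A field that prints nothing after its name was not injected.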
Root Cause
When an InferenceService resource is created manually, the Red Hat OpenShift AI Operator does not inject the tolerations, nodeSelector, or identifiers from the hardware profile into it.
Diagnostic Steps
- Create a hardware profile with nodeSelector and tolerations.
- Create the label and taint on the GPU node so that it matches the hardware profile.
- Deploy the model and check whether a GPU is used. The model deployment fails because no nodes are available.
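The steps above can be sketched as shell helpers. This is an illustration under assumptions rather than a verified procedure: the function names are hypothetical, the label and taint key and value must match whatever the hardware profile actually defines, and NoSchedule is assumed as the taint effect.

```shell
#!/bin/bash
# Hypothetical helper: label and taint a GPU node so that it matches the
# nodeSelector and tolerations of the hardware profile.
prepare_gpu_node() {
  local node="$1" key="$2" value="$3"
  oc label node "${node}" "${key}=${value}" --overwrite
  # Assumption: the profile tolerates a NoSchedule taint with this key and value.
  oc adm taint nodes "${node}" "${key}=${value}:NoSchedule" --overwrite
}

# Hypothetical helper: list scheduling failures in the namespace of the model
# deployment, which is where the "no nodes available" message appears.
list_failed_scheduling() {
  local namespace="$1"
  oc get events -n "${namespace}" --field-selector reason=FailedScheduling
}
```

Run prepare_gpu_node with the node name and the key and value used in the hardware profile, deploy the model, and then run list_failed_scheduling with the namespace of the InferenceService to see why the predictor pod stays pending.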