Install Red Hat AI Enterprise using Azure Marketplace

Run Red Hat AI Enterprise (RHAIE) through the Microsoft Azure Marketplace to create self-managed RHAIE cluster deployments. Your deployments are billed on a pay-per-use basis through your Azure subscription and are supported directly by Red Hat.

RHAIE provides a custom environment for developing and deploying AI-driven applications and includes these products:

  • Red Hat OpenShift AI (RHOAI)
  • Self-managed Red Hat OpenShift Container Platform (OCP)
  • AI Accelerator Entitlements

How the installation process works

Installing RHAIE on Azure has two parts:

  1. Set up an OpenShift Container Platform cluster using Azure Marketplace
  2. Install the Red Hat OpenShift AI Operator

Deploying Red Hat OpenShift Container Platform (OCP) from Azure Marketplace is similar to a standard self-managed installation, and this procedure assumes that you already have experience in the Azure environment. However, RHAIE requires a customized installation to enable billing integration.

Additional resources

Supported and unsupported installations

Both installer-provisioned (IPI) and user-provisioned infrastructure (UPI) scenarios are supported.

Red Hat AI Enterprise on Azure Marketplace does not support the following scenarios:

  • Single-node deployments. These are not supported for Azure Marketplace billing and are not supported as an RHOAI production topology.
  • Three-node deployments. Compact clusters are not supported for Azure Marketplace billing, and RHOAI requires dedicated worker capacity.
  • Disconnected or air-gapped clusters. For a disconnected RHOAI install, see the official disconnected installation guide.

Set up an OpenShift Container Platform cluster using Azure Marketplace

The OpenShift Container Platform cluster provides the infrastructure foundation for your Red Hat AI Enterprise deployment.

Prerequisites

Select the OpenShift Marketplace image in Azure

Use the Azure CLI to find and select the OpenShift Marketplace image for your deployment. Selecting the correct base image is critical for performance and compatibility with your hardware.

Prerequisites

Procedure

  1. In the Azure CLI, display all available OpenShift images.
$ az vm image list --all --offer rh-rhaie --publisher redhat --output table
  2. Select the image version for the OCP minor release that you are targeting and use it consistently throughout the installation.

Example Red Hat Enterprise Linux CoreOS (RHCOS) image

$ az vm image show --urn redhat-rhel:rh-rhaie:rh-rhaie-3-gen2:latest

Note:
The SKUs used in this example are for Generation 2 VM images. The default instance types used in OpenShift are Gen2-compatible. To optimize performance and compatibility, use Gen2 images with GPU-capable instance types. Do not use Gen1 images.

  3. Review and accept the usage terms of the image using the Azure CLI.
  • Review the terms for the image offering:
$ az vm image terms show --urn redhat-rhel:rh-rhaie:rh-rhaie-3-gen2:latest
  • Accept the terms for this image offering:
$ az vm image terms accept --urn redhat-rhel:rh-rhaie:rh-rhaie-3-gen2:latest
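
Because the same image must be used consistently throughout the installation, it can help to derive the osImage values from the image URN instead of retyping them. The following is a minimal sketch and not part of the official procedure: the urn_to_osimage function name is hypothetical, and it assumes that you pass a URN with a concrete version taken from the az vm image list table.

```shell
# Hypothetical helper: split an image URN of the form
# publisher:offer:sku:version and print the four values as YAML keys.
urn_to_osimage() {
  urn="$1"
  # Accept only URNs with exactly four colon-separated parts.
  case "$urn" in
    *:*:*:*:*) echo "unexpected URN: $urn" >&2; return 1 ;;
    *:*:*:*) ;;
    *) echo "unexpected URN: $urn" >&2; return 1 ;;
  esac
  publisher="${urn%%:*}"; rest="${urn#*:}"
  offer="${rest%%:*}"; rest="${rest#*:}"
  sku="${rest%%:*}"; version="${rest#*:}"
  printf 'publisher: %s\noffer: %s\nsku: %s\nversion: %s\n' \
    "$publisher" "$offer" "$sku" "$version"
}

# Example call:
#   urn_to_osimage redhat-rhel:rh-rhaie:rh-rhaie-3-gen2:9.6.2026030314
```

The printed keys match the osImage fields used in the installation configuration, so the output can be pasted under osImage after you adjust the indentation.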

Specify Marketplace Images in the installation configuration

Deploy the cluster in stages to simplify capacity planning and troubleshooting. Because GPU nodes are added only after the base cluster is healthy, you can fix a network or cloud setup error without first having to rule out GPU-specific causes:

  1. Build the base cluster first, using only standard, non-GPU worker nodes.
  2. After the core environment is running without problems, add GPU-capable compute pools.

Create a configuration file for the base cluster with non-GPU worker nodes

Specify the Azure Marketplace image details directly in the install-config.yaml file to set up pay-per-use billing automatically. Doing this removes the need to change machine settings manually when the cluster starts.

Procedure

  • To specify Marketplace images, edit the osImage fields under compute.platform.azure and controlPlane.platform.azure in the install-config.yaml file to match this sample, and save your changes.

Example install-config.yaml.template file
This example assumes you do not have unconditional User Access Administrator rights, so platform.azure.defaultMachinePlatform.identity.type: None is part of the install-config.

---
apiVersion: v1
baseDomain: <your_base_domain>
compute:
  - hyperthreading: Enabled
    name: worker
    platform:
      azure:
        type: Standard_D8s_v3
        osImage:
          publisher: redhat-rhel
          offer: rh-rhaie
          sku: rh-rhaie-3-gen2
          version: 9.6.2026030314
    replicas: 4
controlPlane:
  hyperthreading: Enabled
  name: master
  platform:
    azure:
      type: Standard_D8s_v3
      osImage:
        publisher: redhat-rhel
        offer: rh-rhaie
        sku: rh-rhaie-3-gen2
        version: 9.6.2026030314
  replicas: 3
metadata:
  name: <your_metadata_name>
networking:
  clusterNetwork:
    - cidr: 10.128.0.0/14
      hostPrefix: 23
  machineNetwork:
    - cidr: 10.0.0.0/16
  networkType: OVNKubernetes
  serviceNetwork:
    - 172.30.0.0/16
platform:
  azure:
    baseDomainResourceGroupName: <your_base_Domain_Resource_Group_Name>
    region: <your_azure_region>
    resourceGroupName: <your_azure_resource_group_name>
    cloudName: AzurePublicCloud
    defaultMachinePlatform:
      identity:
        type: None
publish: External
pullSecret: <'YOUR_PULL_SECRET_HERE'>
sshKey: |
    <your_ssh_key>

where:

<your_base_domain>
Main domain name you own for routing traffic to the cluster, such as example.com.
<your_metadata_name>
Unique name you choose for your cluster, for example, rhaie-prod.
<your_base_Domain_Resource_Group_Name>
Name of the Azure resource group that holds the DNS settings for your domain, for example, dns-zones-rg.
<your_azure_region>
Short name of the Azure region where your cluster is deployed, for example, eastus.
<your_azure_resource_group_name>
Name of a new, empty Azure resource group where your cluster resources are created, for example, rhaie-cluster-rg.
<'YOUR_PULL_SECRET_HERE'>
Pull secret from Red Hat that authorizes your cluster to download software. Enclose the text in single quotes.
<your_ssh_key>
Public SSH key that lets you log in to the cluster nodes to troubleshoot problems, for example, ssh-rsa AAAAB3….
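
Before you run the installer, it can be useful to confirm that every placeholder in the template has been replaced. The following sketch shows one way to do that and then stage the file. The function names check_placeholders and stage_install_config are hypothetical, and the sketch assumes that the placeholders follow the <your_...> and <'YOUR_PULL_SECRET_HERE'> patterns used in the example above.

```shell
# Hypothetical helper: return 0 only if FILE contains no unreplaced
# placeholders; otherwise print the matching lines with line numbers.
check_placeholders() {
  file="$1"
  if grep -n -E "<your_[A-Za-z_]+>|<'YOUR_PULL_SECRET_HERE'>" "$file"; then
    echo "placeholders remain in $file" >&2
    return 1
  fi
  return 0
}

# Hypothetical helper: copy the checked template into the installation
# directory as install-config.yaml, the file name that the installer reads.
stage_install_config() {
  template="$1"; install_dir="$2"
  check_placeholders "$template" || return 1
  mkdir -p "$install_dir" && cp "$template" "$install_dir/install-config.yaml"
}

# Example call:
#   stage_install_config install-config.yaml.template <installation_dir>
```

Copying instead of moving the template keeps a reusable backup, because the installer consumes the install-config.yaml file in the installation directory when it creates the cluster.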

Create an OpenShift cluster by using this configuration file

To begin the automated deployment that creates the cluster, save a copy of the install-config.yaml.template file as install-config.yaml in your installation directory. The installer consumes this file, so keep the template as a backup. OpenShift manages your machines automatically by using MachineSets, which are resources that define how OpenShift provisions and manages virtual machines with specific hardware in Azure.

Always use a new resource group name for each installation attempt to avoid tag pollution from failed installations.

Procedure

  • Create your cluster with the following command:
$ openshift-install create cluster --dir <installation_dir>

The installation might take 45 minutes or longer.

Verify that your cluster is running and stable

Before installing GPU-capable compute pools, verify that your cluster is running with the non-GPU compute nodes that use the Azure Marketplace images. These nodes are billed through Azure Marketplace.

Procedure

  1. Log in to the cluster:
    • On the OpenShift console, navigate to your login ID, for example, user:admin, and click it.
    • On the dropdown menu, click Copy login command.
    • Click Display token.
    • Log in with the token that is displayed.

Example login

$ oc login --token=sha256~AbCdEf123456XyZ789012VwXyZ345678lqJofhxgvV4 --server=https://api.cluster-xyz.example.com:6443
  2. Verify that the cluster is stable and healthy:
$ oc get clusterversion
$ oc get nodes
$ oc get co

If the cluster is stable, all ClusterOperators report AVAILABLE=True, PROGRESSING=False, and DEGRADED=False.
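
On a cluster with many ClusterOperators, scanning the table by eye is error-prone. As a rough sketch, the following filter prints only the operators that do not meet the condition above. The unhealthy_operators name is hypothetical, and the filter assumes the default oc get co column order (NAME, VERSION, AVAILABLE, PROGRESSING, DEGRADED) with every column populated.

```shell
# Hypothetical helper: read `oc get co --no-headers` output on stdin and
# print the name of each ClusterOperator that is not
# AVAILABLE=True, PROGRESSING=False, DEGRADED=False.
unhealthy_operators() {
  awk '$3 != "True" || $4 != "False" || $5 != "False" { print $1 }'
}

# Example call:
#   oc get co --no-headers | unhealthy_operators
```

Empty output means that every listed operator matched the healthy pattern. If an operator has no version yet, its columns shift and it is reported as unhealthy, which is the safer failure mode during installation.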

Troubleshoot cluster problems

If your installation fails or the cluster does not report a healthy status, see the following resources:

Create a GPU-capable compute pool for your cluster

To support RHOAI accelerated workloads, you must create a GPU-capable compute pool. To add this pool, create a new MachineSet configuration file that targets a GPU-enabled SKU, such as Standard_NC24ads_A100_v4.

Prerequisites

  • You have installed a Red Hat OpenShift base cluster.
  • You have verified that your Azure subscription has enough vCPU quota for the NC A100 v4-series family. Because Standard_NC24ads_A100_v4 is a newer VM type, a Generation 2 Marketplace SKU is required.

Procedure

  1. Identify an existing MachineSet to use as a template for your new MachineSet.
$ oc get machinesets -n openshift-machine-api
  2. Export an existing MachineSet to a YAML file to use as a template. This ensures your networking and cluster metadata are correct.
$ oc get machineset <existing_machineset_name> -n openshift-machine-api -o yaml > gpu-machineset.yaml

Where <existing_machineset_name> specifies the MachineSet to use as a base template.

  3. Modify the YAML file: update the name, set vmSize to a GPU SKU, for example, Standard_NC24ads_A100_v4, and set the Marketplace image details to ensure billing integration.

Example MachineSet YAML for Azure GPU Nodes

This template uses the Standard_NC24ads_A100_v4 SKU and includes the required Azure Marketplace image information for Red Hat AI Enterprise billing.

apiVersion: machine.openshift.io/v1beta1
kind: MachineSet
metadata:
  labels:
    machine.openshift.io/cluster-api-cluster: <cluster_id> 
  name: <cluster_id>-gpu-worker-<azure_region>
  namespace: openshift-machine-api
spec:
  replicas: 1
  selector:
    matchLabels:
      machine.openshift.io/cluster-api-cluster: <cluster_id>
      machine.openshift.io/cluster-api-machineset: <cluster_id>-gpu-worker-<azure_region>
  template:
    metadata:
      labels:
        machine.openshift.io/cluster-api-cluster: <cluster_id>
        machine.openshift.io/cluster-api-machine-role: worker
        machine.openshift.io/cluster-api-machine-type: worker
        machine.openshift.io/cluster-api-machineset: <cluster_id>-gpu-worker-<azure_region>
    spec:
      providerSpec:
        value:
          apiVersion: azureproviderconfig.openshift.io/v1beta1
          kind: AzureMachineProviderSpec
          location: <azure_region>
          vmSize: Standard_NC24ads_A100_v4 # GPU-capable SKU 
          image:
            publisher: redhat-rhel # Marketplace publisher 
            offer: rh-rhaie # Marketplace offer 
            sku: rh-rhaie-3-gen2 # Marketplace SKU
            version: 9.6.2026030314 # Marketplace version 
          # Ensure other fields like networkResourceGroup and vnet match your cluster 

where:
<cluster_id>
Specifies the infrastructure ID of your existing cluster.

<azure_region>
Specifies your cluster’s Azure region, for example, eastus (East US, Virginia).

  4. Create the new MachineSet in the OpenShift CLI:
$ oc apply -f gpu-machineset.yaml
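
The rename and vmSize changes in the modification step can also be scripted. The following is only a sketch under stated assumptions: the gpu_machineset_name and retarget_machineset functions are hypothetical, the sed expressions assume that the old MachineSet name appears verbatim in the exported labels and selectors, and the Marketplace image block and any exported status or server-generated metadata still need to be reviewed by hand.

```shell
# Hypothetical helper: build the MachineSet name used in the example,
# <cluster_id>-gpu-worker-<azure_region>.
gpu_machineset_name() {
  printf '%s-gpu-worker-%s\n' "$1" "$2"
}

# Hypothetical helper: read an exported MachineSet on stdin, replace every
# occurrence of the old name with the new name, and set vmSize.
# $1 = old MachineSet name, $2 = new MachineSet name, $3 = GPU vmSize
retarget_machineset() {
  sed -e "s/$1/$2/g" -e "s/^\( *vmSize:\).*/\1 $3/"
}

# Example calls (the infrastructure ID lookup is the cluster_id value):
#   cluster_id="$(oc get infrastructure cluster -o jsonpath='{.status.infrastructureName}')"
#   retarget_machineset <existing_machineset_name> \
#     "$(gpu_machineset_name "$cluster_id" eastus)" Standard_NC24ads_A100_v4 \
#     < exported-machineset.yaml > gpu-machineset.yaml
```

The example call reads from a separate file name, exported-machineset.yaml, which is an assumed name, because redirecting output to the same file that is being read would truncate it.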

Verify that the cluster is running with the GPU compute nodes

Before installing RHOAI, verify that your cluster is running with the GPU compute nodes that use the Azure Marketplace images.

  1. Verify that the new machines are provisioning correctly:
$ oc get machines -n openshift-machine-api
  2. Confirm the new node has successfully joined the cluster:
$ oc get nodes

Your GPU-capable machine should be listed as Ready. The next step is to install RHOAI and the Operators that support it.
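
To check only the GPU nodes instead of the full node list, you can filter by instance type and test the status column. This is a hedged sketch: the all_nodes_ready name is hypothetical, and the example call assumes that your nodes carry the standard node.kubernetes.io/instance-type label with the VM size as its value.

```shell
# Hypothetical helper: read `oc get nodes --no-headers` output on stdin.
# Succeed only if at least one node is listed and every listed node has a
# STATUS of exactly "Ready".
all_nodes_ready() {
  awk '{ n++; if ($2 != "Ready") bad++ }
       END { if (n == 0 || bad > 0) exit 1 }'
}

# Example call:
#   oc get nodes -l node.kubernetes.io/instance-type=Standard_NC24ads_A100_v4 \
#     --no-headers | all_nodes_ready && echo "GPU nodes are Ready"
```

A node that is cordoned reports a combined status such as Ready,SchedulingDisabled, so this check deliberately treats it as not ready.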

Install RHOAI and its dependencies

When your OpenShift cluster is running and stable, you are ready to prepare the cluster to run Red Hat OpenShift AI (RHOAI).

Prerequisites

  • You have a GPU compute node installed on your cluster.
  • You have the resources for RHOAI components. See the RHOAI release notes for the version that you are installing.

Install required Operators by using the OpenShift console

Before configuring model servers or data science workbenches, you must prepare your OpenShift Container Platform cluster by installing foundational Operators from OperatorHub. These utilities enable essential service mesh, serverless, and hardware-detection frameworks. These are included at no additional cost beyond your standard Azure compute fees.

⚠️ IMPORTANT:
To resolve all dependencies, you must install the Operators in a specific sequence, and non-GPU Operators must be fully deployed before you install GPU Operators.

Procedure

  1. In the OpenShift web console, navigate to Ecosystem → Software Catalog to add the non-GPU Operators. For each Operator, click Install, use the default installation settings, and click Install again.

    ⚠️ IMPORTANT: To avoid configuration errors, install these exact Operators in the following order:

    1. Red Hat OpenShift Service Mesh 3
    2. Red Hat OpenShift Serverless
    3. cert-manager Operator for Red Hat OpenShift
    4. Red Hat Connectivity Link
    5. Red Hat build of Leader Worker Set
    6. Red Hat build of Kueue
    7. Job Set Operator
  2. After the non-GPU Operators are ready, install the following GPU Operators by navigating to Ecosystem → Software Catalog and clicking Install. Use the default settings, and click Install again:

    • Node Feature Discovery Operator (NFD)
    • NVIDIA GPU Operator

Verify that the Operators are successfully installed

Before you install RHOAI, ensure that your Operators are installed correctly. You can check in the OpenShift Console or the OpenShift CLI.

GUI procedure

  1. In the OpenShift console, go to OperatorHub, and click the Project menu.
  2. Toggle on the Show default projects switch, and select All Projects.
  3. Click Ecosystem → Installed Operators.
  4. Check the Operator Status in the table, or search for each Operator by name.
    If the installation was successful, the Operators are displayed in the list of Operators, and their status is Succeeded.

CLI procedure

  • Verify that each Operator has been installed successfully with the following command:
$ oc get csv -A | grep -E 'servicemesh|serverless|cert-manager|connectivity|leader-worker-set|kueue|jobset|nfd|gpu-operator'

Check that all Operators have status Succeeded.

⚠️ IMPORTANT: Do not start installing RHOAI until all Operators show Succeeded. If any Operator remains in a Pending state, check the underlying namespace event logs to verify that cluster quotas have not been exceeded.
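
The grep output above still has to be read row by row. As an alternative sketch, the following filter lists only the ClusterServiceVersions that are not yet in the Succeeded phase. The pending_csvs name is hypothetical, and the filter assumes two-column input (name, then phase) such as the custom-columns output shown in the example call.

```shell
# Hypothetical helper: read "NAME PHASE" pairs on stdin and print each
# ClusterServiceVersion whose phase is not Succeeded.
pending_csvs() {
  awk '$2 != "Succeeded" { print $1 " (" $2 ")" }'
}

# Example call:
#   oc get csv -A --no-headers \
#     -o custom-columns=NAME:.metadata.name,PHASE:.status.phase | pending_csvs
```

Empty output means that every listed ClusterServiceVersion reports Succeeded. Because oc get csv -A lists an Operator once per namespace in which it is available, the same name can appear several times.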

Install RHOAI and its components

You have installed the foundational Operators for RHOAI. When you install RHOAI, it automatically installs the additional components that it needs to run.

Procedure

  1. In the OpenShift OperatorHub, navigate to Ecosystem → Software Catalog.

  2. Search for OpenShift AI.

  3. If multiple tiles are displayed, find the tile labeled Red Hat OpenShift AI, provided by Red Hat, Inc., and click it.

  4. In the Channel field, select stable-3.x.

  5. For Version, select 3.4.0 or the latest version.

  6. Keep the default values for Installation mode and Installed Namespace (redhat-ods-operator).

  7. Click Install.

  8. If you have not created the Data Science Cluster already, click Create DataScienceCluster when the button is active. Click Create again.

    The DSCInitialization (DSCI) YAML file is created automatically. The DataScienceCluster YAML file is displayed.

  9. Edit the DSC YAML file as needed. For example, to add Llama Stack to your Data Science Cluster, change the managementState value for llamastackoperator from Removed to Managed, and click Save.

Example section of the DSC YAML file

  spec:
    trainer:
      managementState: Managed
    llamastackoperator:
      managementState: Managed
    trainingoperator:
      managementState: Removed

Completing the installation might take a minute or longer depending on your environment.

Verification
When RHOAI and its components are completely installed, RHOAI has the status Succeeded on the OperatorHub and the DataScienceCluster has the status Ready.

  1. To verify the RHOAI status, click Ecosystem → Installed Operators.

    The RHOAI status should be Succeeded.

  2. Click the link for Red Hat OpenShift AI.

    The Provided APIs for RHOAI are displayed as tiles.

  3. To verify that the Data Science Cluster is running, click the DataScienceCluster tab.
    The DataScienceCluster should show Phase: Ready in the Status column.

  4. To see the details for the Data Science Cluster, click the default-dsc link.
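
The same readiness check can be polled from the CLI while the components start. This is a sketch under assumptions: the wait_for_ready name is hypothetical, and the example call assumes that the DataScienceCluster is named default-dsc and exposes the phase shown in the console at .status.phase.

```shell
# Hypothetical helper: run a command repeatedly until it prints "Ready"
# or the attempts run out.
# $1 = maximum attempts, $2 = seconds between attempts, rest = command
wait_for_ready() {
  attempts="$1"; delay="$2"; shift 2
  i=0
  phase=""
  while [ "$i" -lt "$attempts" ]; do
    # Ignore command failures so that a transient API error is retried.
    phase="$("$@" 2>/dev/null)" || true
    if [ "$phase" = "Ready" ]; then
      return 0
    fi
    i=$((i + 1))
    sleep "$delay"
  done
  echo "last phase: ${phase:-unknown}" >&2
  return 1
}

# Example call: poll every 10 seconds, up to 30 times.
#   wait_for_ready 30 10 oc get datasciencecluster default-dsc \
#     -o jsonpath='{.status.phase}'
```

Passing the command as arguments keeps the polling logic separate from the query, so the same function can wait on any resource that reports a Ready phase.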

Launch RHOAI

You are ready to launch RHOAI. Begin building, training, testing, and deploying both predictive and generative AI models across hybrid cloud environments.

Procedure

  1. From the OpenShift console, click the Applications grid icon.

  2. Under OpenShift Self Managed Services, click Red Hat OpenShift AI, and log in.
