Installing Red Hat AI Enterprise on GCP Marketplace


Deploy Red Hat AI Enterprise (RHAIE) through the Google Cloud Platform (GCP) Cloud Marketplace to create self-managed RHAIE cluster deployments. Your deployments are billed on a pay-per-use basis with your Google Cloud subscription but are still supported by Red Hat directly.

RHAIE provides a custom environment for developing and deploying AI-driven applications and includes these products:

  • Red Hat OpenShift AI (RHOAI)
  • Self-managed Red Hat OpenShift Container Platform (OCP)
  • AI Accelerator Entitlements

How the installation process works

Installing RHAIE on GCP has two parts:

  1. Set up an OpenShift Container Platform cluster using GCP Marketplace
  2. Install the Red Hat OpenShift AI Operator

Deploying Red Hat OpenShift Container Platform (OCP) from GCP Marketplace is similar to a self-managed installation and assumes that you already have experience with the Google Cloud environment. However, installing RHAIE requires a customized installation to enable billing integration.

Additional resources

Supported and unsupported installations

Both installer-provisioned (IPI) and user-provisioned infrastructure (UPI) scenarios are supported.

Red Hat AI Enterprise on GCP Marketplace does not support the following scenarios:

  • Single-node deployments. These are not supported for GCP Marketplace billing and are not supported as an RHOAI production topology.
  • Three-node deployments. Compact clusters are not supported for GCP Marketplace billing, and RHOAI requires dedicated worker capacity.
  • Disconnected or air-gapped clusters. GCP Marketplace billing requires outbound reachability to GCP metering endpoints. For a disconnected RHOAI install, see the official disconnected installation guide.

Set up an OpenShift Container Platform cluster using GCP Marketplace

The OpenShift Container Platform cluster provides the infrastructure foundation for your Red Hat AI Enterprise deployment.

Prerequisites for installing OCP on GCP

Select the OpenShift Marketplace image in GCP

Use the Google Cloud CLI to find and select the OpenShift Marketplace image for your deployment. Selecting the correct base image is critical for performance and compatibility with your hardware.

Prerequisites

Procedure

  1. In the Google Cloud CLI, display the path configurations for available OpenShift Marketplace images:
$ gcloud compute images list --project=redhat-marketplace-public --no-standard-images
  2. Select the image string path for the OCP minor release that you are targeting and use it consistently throughout the installation. You use that image string path directly in your cluster configuration file.

Example Red Hat Enterprise Linux CoreOS (RHCOS) marketplace path reference:

projects/redhat-marketplace-public/global/images/redhat-coreos-ocp-419-x86-64-latest
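
To narrow the image list to a single minor release, the listing command can be filtered by name. The following is a minimal sketch rather than a supported tool: the function name list_ocp_marketplace_images is hypothetical, and the filter assumes that Marketplace image names embed the minor release without the dot (4.19 becomes 419), as in the example path above.

```shell
# Hypothetical helper: print Marketplace image paths for one OCP minor release.
# Assumption: image names contain the minor release without the dot,
# for example "ocp-419" for OCP 4.19.
list_ocp_marketplace_images() {
  minor="$1"                                 # for example, 4.19
  tag=$(printf '%s' "$minor" | tr -d '.')    # 4.19 -> 419
  gcloud compute images list \
    --project=redhat-marketplace-public \
    --no-standard-images \
    --filter="name~ocp-${tag}-" \
    --format="value(name)" |
    sed 's|^|projects/redhat-marketplace-public/global/images/|'
}

# Usage:
#   list_ocp_marketplace_images 4.19
```

If the naming pattern differs for your release, run the unfiltered listing command and copy the path by hand.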

Specify Marketplace Images in the installation configuration

Deploy the cluster in stages to simplify capacity planning and troubleshooting. With this approach, you can fix network or cloud setup errors in isolation, without first having to rule out GPU-specific issues:

  1. Build the base cluster first, using only standard, non-GPU worker nodes.
  2. After the core environment is functional, add GPU-capable compute pools by using MachineSets.

Create a configuration file for the base cluster with non-GPU worker nodes

Specify the GCP Marketplace image details directly in the install-config.yaml file to set up pay-per-use billing automatically. Doing this removes the need to change machine settings manually when the cluster starts.

Procedure

  • To specify GCP Marketplace images in the install-config.yaml file, edit the compute.platform.gcp.osImage and controlPlane.platform.gcp.osImage fields to match this sample, and save your changes.

Example install-config.yaml file

apiVersion: v1
baseDomain: <your_base_domain>
compute:
- hyperthreading: Enabled
  name: worker
  platform:
    gcp:
      type: n2-standard-16
      osImage: <your_image_name_for_RHAIE>
  replicas: 3
controlPlane:
  hyperthreading: Enabled
  name: master
  platform:
    gcp:
      type: n2-standard-16
      osImage: <your_image_name_for_RHAIE> 
  replicas: 3
metadata:
  name: <your_metadata_name>
networking:
  clusterNetwork:
    - cidr: 10.128.0.0/14
      hostPrefix: 23
  machineNetwork:
    - cidr: 10.0.0.0/16
  networkType: OVNKubernetes
  serviceNetwork:
    - 172.30.0.0/16
platform:
  gcp:
    project: <your_gcp_project_id>
    region: <your_gcp_region>
publish: External
pullSecret: '<YOUR_PULL_SECRET_HERE>'
sshKey: |
    <your_ssh_key>

where:

<your_base_domain>
Main domain name you own for routing traffic to the cluster, such as example.com.
<your_metadata_name>
Unique name you choose for your cluster. For example, rhaie-prod.
<your_gcp_project_id>
Your target Google Cloud project ID entitled to deploy Marketplace instances.
<your_image_name_for_RHAIE>
Name of the RHAIE image for installation. Include the image path, for example, projects/rhcos-cloud/global/images/rhcos-9-6-20251212-1-gcp-x86-64.
<your_gcp_region>
Short code for the GCP region where your servers will be allocated, for example, us-east1.
'<YOUR_PULL_SECRET_HERE>'
Pull secret from Red Hat that authorizes your cluster to download software. Enclose the text in single quotes.
<your_ssh_key>
Public SSH key that lets you log in to the cluster nodes to troubleshoot problems, for example, ssh-rsa AAAAB3....
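
Before you run the installation program, a quick check of the edited file can catch a missing osImage entry or a placeholder that was never replaced. The following sketch is illustrative only: check_install_config is a hypothetical helper, and it assumes that the file has exactly two machine pools (compute and controlPlane), as in the sample above.

```shell
# Hypothetical pre-flight check for install-config.yaml.
# Assumptions: two machine pools, each with one osImage line, and
# placeholders written as <name> like the ones in the sample file.
check_install_config() {
  file="$1"
  count=$(grep -c '^ *osImage: ' "$file" || true)
  if [ "$count" -ne 2 ]; then
    echo "expected 2 osImage entries, found $count" >&2
    return 1
  fi
  if grep -q '<[A-Za-z_]*>' "$file"; then
    echo "unreplaced placeholders remain in $file" >&2
    return 1
  fi
  echo "ok"
}

# Usage:
#   check_install_config install-config.yaml
```

Because the installation program consumes install-config.yaml when it runs, keeping a copy outside the installation directory is a common precaution.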

Create an OpenShift cluster by using this configuration file

To provision the cloud infrastructure and start the core cluster services, run the installation program with your Marketplace install-config.yaml file. OpenShift automatically manages your machines by using MachineSets, which are resources that define how virtual machines with specific hardware are provisioned and managed in GCP.

Always use a fresh or isolated project workspace for each installation attempt to avoid asset tag naming conflicts.

Procedure

  • Create your cluster with the following command:
$ ./openshift-install create cluster --dir <installation_dir>

The installation might take 45 minutes or longer.

Verify that your cluster is stable and fully operational

Before installing GPU-capable compute pools, verify that your cluster is operational with the non-GPU compute nodes that use the GCP Marketplace images. These clusters are billed through GCP Marketplace as resources are consumed.

Procedure

  1. Log in to the cluster:
    • On the OpenShift console, navigate to your login ID, for example, user:admin, and click it.
    • On the dropdown menu, click Copy login command.
    • Click Display token.
    • Log in with the token that is displayed.

Example login

$ oc login --token=sha256~AbCdEf123456XyZ789012 --server=https://api.cluster-xyz.example.com:6443
  2. Verify that the cluster is stable and healthy:
$ oc get clusterversion
$ oc get nodes
$ oc get co

If the cluster is stable, all ClusterOperators report AVAILABLE=True, PROGRESSING=False, and DEGRADED=False.
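
On a cluster with many ClusterOperators, the three conditions above can be checked with a small filter instead of by eye. The following is a sketch under stated assumptions: unhealthy_cluster_operators is a hypothetical helper, and it relies on the default column order of the oc get co output (NAME, VERSION, AVAILABLE, PROGRESSING, DEGRADED).

```shell
# Reads "oc get co --no-headers" output on stdin and prints the name of
# every ClusterOperator that is not Available=True, Progressing=False,
# and Degraded=False.
unhealthy_cluster_operators() {
  awk 'NF > 0 && ($3 != "True" || $4 != "False" || $5 != "False") { print $1 }'
}

# Usage (requires a logged-in oc session):
#   oc get co --no-headers | unhealthy_cluster_operators
```

Empty output means that every listed ClusterOperator matches the stable state.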

Troubleshoot cluster problems

If your installation fails or the cluster does not report a functional status, check the following resources:

  • Review the troubleshooting details inside the hidden .openshift_install.log file generated within your installation directory.
  • Verify your Google Cloud project quotas by navigating to IAM & Admin → Quotas inside the GCP Console to ensure adequate vCPU limits exist for your deployment region.
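
The installation log is verbose, so filtering it for high-severity entries is usually the fastest starting point. The following sketch assumes that log lines carry a level=<severity> field; install_log_errors is a hypothetical helper name.

```shell
# Hypothetical helper: print error and fatal entries from the installer log.
# Assumption: each log line includes a field such as level=error.
install_log_errors() {
  dir="$1"                                   # your installation directory
  grep -E 'level=(error|fatal)' "$dir/.openshift_install.log" || true
}

# Usage:
#   install_log_errors <installation_dir>
```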

Create a GPU-capable compute pool for your cluster

To support RHOAI accelerated workloads, you must create a GPU-capable compute pool. To add this pool, create a new MachineSet configuration file that targets a GPU-enabled SKU, such as g2-standard-24, which provisions NVIDIA L4 hardware.

Prerequisites

  • You have installed a Red Hat OpenShift base cluster.
  • You have verified that your GCP project has enough vCPU and GPU quota in your target region and zone.

Procedure

  1. Identify an existing MachineSet to use as a template for your new MachineSet:
$ oc get machinesets -n openshift-machine-api
  2. Export an existing MachineSet to a YAML file to use as a template. This ensures your networking and cluster metadata are correct:
$ oc get machineset <existing_machineset_name> -n openshift-machine-api -o yaml > gpu-machineset.yaml

Where <existing_machineset_name> specifies the MachineSet to use as a base template.

  3. Modify the YAML by updating the name, changing the machineType to a GPU SKU, for example, g2-standard-24, and setting the Marketplace image details to ensure billing integration.

Example MachineSet YAML for GCP GPU Nodes
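This template uses the g2-standard-24 SKU and includes the required GCP Marketplace image information for Red Hat AI Enterprise billing. The example is abbreviated; keep the remaining providerSpec fields, such as the network and service account settings, from the exported MachineSet.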

apiVersion: machine.openshift.io/v1beta1
kind: MachineSet
metadata:
  labels:
    machine.openshift.io/cluster-api-cluster: <cluster_id> 
  name: <cluster_id>-gpu-worker-<gcp_zone>
  namespace: openshift-machine-api
spec:
  replicas: 1
  selector:
    matchLabels:
      machine.openshift.io/cluster-api-cluster: <cluster_id>
      machine.openshift.io/cluster-api-machineset: <cluster_id>-gpu-worker-<gcp_zone>
  template:
    metadata:
      labels:
        machine.openshift.io/cluster-api-cluster: <cluster_id>
        machine.openshift.io/cluster-api-machine-role: worker
        machine.openshift.io/cluster-api-machine-type: worker
        machine.openshift.io/cluster-api-machineset: <cluster_id>-gpu-worker-<gcp_zone>
    spec:
      providerSpec:
        value:
          apiVersion: gcpproviderconfig.openshift.io/v1beta1
          kind: GCPMachineProviderSpec
          zone: <gcp_zone>
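          machineType: g2-standard-24
          # GPU-attached VMs cannot live-migrate, so they must stop for host maintenance.
          onHostMaintenance: Terminate
          restartPolicy: Always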
          disks:
          - boot: true
            image: <your_image_name_for_RHAIE>


where:
<cluster_id>
Specifies your unique running cluster identifier string.
<gcp_zone>
Specifies your targeted deployment zone within your regional data center where GPU hardware is available. For example, us-east1-a.
<your_image_name_for_RHAIE>
Name of RHAIE image for installation. Include the image path, for example, projects/rhcos-cloud/global/images/rhcos-9-6-20251212-1-gcp-x86-64.

  4. Create the new MachineSet in the OpenShift CLI:
$ oc apply -f gpu-machineset.yaml
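
The edits to the exported file that the procedure above describes can also be scripted. The following is a rough sketch, not a definitive method: make_gpu_machineset is a hypothetical helper that performs plain text substitution, and it assumes that the MachineSet name appears verbatim wherever it must change and that machineType and image each sit on their own line.

```shell
# Hypothetical helper: derive a GPU MachineSet from an exported one.
# Usage: make_gpu_machineset <exported.yaml> <old_name> <new_name> <image> [machine_type]
# The result is written to stdout. Server-generated fields (status, uid,
# resourceVersion, creationTimestamp) are NOT removed; delete them by hand
# before you apply the file.
make_gpu_machineset() {
  src="$1"; old="$2"; new="$3"; image="$4"; mtype="${5:-g2-standard-24}"
  sed -e "s/${old}/${new}/g" \
      -e "s|^\( *machineType:\).*|\1 ${mtype}|" \
      -e "s|^\( *image:\).*|\1 ${image}|" \
      "$src"
}

# Usage:
#   make_gpu_machineset worker.yaml <existing_machineset_name> \
#     <cluster_id>-gpu-worker-<gcp_zone> <your_image_name_for_RHAIE> > gpu-machineset.yaml
```

Review the generated file before you apply it, because a text substitution cannot confirm that the result is a valid MachineSet.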

Verify that the cluster is running with the GPU compute nodes

Before installing RHOAI, verify that your cluster is working correctly with the GPU compute nodes that use the GCP Marketplace images.

Procedure

  1. Verify that the new machines are provisioning correctly:
$ oc get machines -n openshift-machine-api
  2. Confirm the new node has successfully joined the cluster:
$ oc get nodes

Your GPU-capable machine should be listed as Ready after provisioning finishes. The next step is to install RHOAI and the Operators that support it.
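
To list only the GPU nodes rather than every node in the cluster, the node query can be narrowed by instance type. The following sketch assumes that nodes carry the well-known node.kubernetes.io/instance-type label; ready_gpu_nodes is a hypothetical helper name.

```shell
# Hypothetical helper: print the names of GPU nodes whose STATUS is Ready.
# Assumption: the node label node.kubernetes.io/instance-type holds the
# GCP machine type. Cordoned nodes (Ready,SchedulingDisabled) are skipped.
ready_gpu_nodes() {
  mtype="${1:-g2-standard-24}"
  oc get nodes -l "node.kubernetes.io/instance-type=${mtype}" --no-headers |
    awk '$2 == "Ready" { print $1 }'
}

# Usage (requires a logged-in oc session):
#   ready_gpu_nodes g2-standard-24
```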

Install RHOAI and its dependencies

When your OpenShift cluster is running and stable, you are ready to prepare the cluster to run Red Hat OpenShift AI (RHOAI).

Prerequisites

  • You have a GPU compute node installed on your cluster.
  • You have the resources for RHOAI components. See the RHOAI release notes for the version that you are installing.

Install required Operators by using the OpenShift console

Before configuring model servers or data science workbenches, you must prepare your cluster by installing foundational Operators from OperatorHub. These utilities enable essential service mesh, serverless, and hardware-detection frameworks. These are included at no additional cost beyond your standard GCP compute fees.

⚠️ IMPORTANT
To resolve all dependencies, you must install the Operators in a specific sequence, and non-GPU Operators must be fully deployed before you install GPU Operators.

Procedure

  1. In the OpenShift web console, navigate to Ecosystem → Software Catalog to add the non-GPU Operators. For each Operator, click Install, use the default installation settings, and click Install again.

    ⚠️ IMPORTANT: To avoid configuration errors, install these exact Operators in the following order:

    1. Red Hat OpenShift Service Mesh 3
    2. Red Hat OpenShift Serverless
    3. cert-manager Operator for Red Hat OpenShift
    4. Red Hat Connectivity Link
    5. Red Hat build of Leader Worker Set
    6. Red Hat build of Kueue
    7. Job Set Operator
  2. After the non-GPU Operators are ready, install the following GPU Operators by navigating to Ecosystem → Software Catalog and clicking Install. Use the default settings, and click Install again:

    • Node Feature Discovery Operator (NFD)
    • NVIDIA GPU Operator

Verify that the Operators are successfully installed

Before you install the primary software suite, ensure that your Operators are installed correctly via the OpenShift Console or the OpenShift CLI.

GUI procedure

  1. In the OpenShift console, go to OperatorHub, and click the Project menu.
  2. Toggle on the Show default projects switch, and select All Projects.
  3. Click Ecosystem → Installed Operators.
  4. Check the Operator Status in the table, or search for each Operator by name.
    If the installation was successful, the Operators are displayed in the list of Operators, and their status is Succeeded.

CLI procedure

  • Verify that each Operator has been installed successfully with the following command:
$ oc get csv -A | grep -E 'servicemesh|serverless|cert-manager|connectivity|leader-worker-set|kueue|jobset|nfd|gpu-operator'

Check that all Operators have status Succeeded.
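
Because the grep output can span many namespaces, a filter that prints only the entries that are not yet Succeeded makes the check easier to read. The following is a minimal sketch: csvs_not_succeeded is a hypothetical helper, and it assumes that PHASE is the last column of the oc get csv output.

```shell
# Reads "oc get csv -A --no-headers" output on stdin and prints
# namespace/name for every CSV whose last column (PHASE) is not Succeeded.
csvs_not_succeeded() {
  awk 'NF > 0 && $NF != "Succeeded" { print $1 "/" $2 }'
}

# Usage (requires a logged-in oc session):
#   oc get csv -A --no-headers | grep -E 'servicemesh|serverless' | csvs_not_succeeded
```

Empty output means that every matched Operator reports Succeeded.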

⚠️ IMPORTANT
Do not start installing RHOAI until all Operators show Succeeded. If any Operator remains in a Pending state, check the underlying namespace event logs to verify that cluster quotas have not been exceeded.

Install RHOAI and its components

You have installed the foundational Operators for RHOAI. To deploy core dashboards, interactive workbenches, and data science pipelines in a graphical interface, install the primary Red Hat OpenShift AI (RHOAI) Operator from OperatorHub.

When you install RHOAI, it automatically installs the additional components that it needs to run.

Procedure

  1. In the OpenShift OperatorHub, navigate to Ecosystem → Software Catalog.

  2. Search for OpenShift AI.

  3. If multiple tiles are displayed, find the tile named Red Hat OpenShift AI that is provided by Red Hat, Inc., and click it.

  4. In the Channel field, select stable-3.x.

  5. For Version, select 3.4.0 or the latest version.

  6. Keep the default values for Installation mode and Installed Namespace (redhat-ods-operator).

  7. Click Install.

  8. If you have not created the Data Science Cluster already, click Create DataScienceCluster when the button is active. Click Create again.

    The DSCInitialization (DSCI) resource is created automatically, and the DataScienceCluster YAML file is displayed.

  9. Edit the DSC YAML file as needed. For example, to add Llama Stack to your Data Science Cluster, change the managementState of llamastackoperator from Removed to Managed, and click Save.

Example section of the DSC YAML file

  spec:
    components:
      trainer:
        managementState: Managed
      llamastackoperator:
        managementState: Managed
      trainingoperator:
        managementState: Removed

Completing the installation might take a minute or longer depending on your environment.

Verification
When RHOAI and its components are completely installed, RHOAI has the status Succeeded on the OperatorHub and the DataScienceCluster has the status Ready.

  1. To verify the RHOAI status, click Ecosystem → Installed Operators.

    The RHOAI status should be Succeeded.

  2. Click the link for Red Hat OpenShift AI.

    The Provided APIs for RHOAI are displayed as tiles.

  3. To verify that the Data Science Cluster is running, click the DataScienceCluster tab.
    The DataScienceCluster should show Phase: Ready in the Status column.

  4. To see the details for Data Science Cluster, click the default-dsc link.
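
The same verification can be approximated from the CLI by polling the DataScienceCluster phase. The following sketch is illustrative only: wait_for_dsc_ready and the DSC_POLL_SECONDS variable are hypothetical names, and the function assumes that the resource is named default-dsc and reports its phase in .status.phase.

```shell
# Hypothetical helper: poll until the DataScienceCluster reports phase Ready.
# Usage: wait_for_dsc_ready [name] [max_tries]
# Assumptions: the resource is named default-dsc and exposes .status.phase.
wait_for_dsc_ready() {
  name="${1:-default-dsc}"
  tries="${2:-30}"
  phase=""
  while [ "$tries" -gt 0 ]; do
    phase=$(oc get datasciencecluster "$name" -o jsonpath='{.status.phase}' 2>/dev/null || true)
    if [ "$phase" = "Ready" ]; then
      echo "Ready"
      return 0
    fi
    tries=$((tries - 1))
    sleep "${DSC_POLL_SECONDS:-10}"
  done
  echo "timed out; last phase: ${phase:-unknown}" >&2
  return 1
}

# Usage (requires a logged-in oc session):
#   wait_for_dsc_ready default-dsc 30
```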

Launch RHOAI

You are ready to launch RHOAI. Begin building, training, testing, and deploying both predictive and generative AI models across hybrid cloud environments.

Procedure

  1. From the OpenShift console, click the Applications grid icon.

  2. Under OpenShift Self Managed Services, click Red Hat OpenShift AI, and log in.
