Choose a validated model for reliable serving

Red Hat AI 3

Red Hat AI validated models

Red Hat AI Documentation Team

Abstract

Learn about the validated models that you can deploy for inference serving with Red Hat AI.

Preface

Red Hat AI validated and enabled models have been tested and verified to work with Red Hat AI Inference. You can deploy these models for inference serving on supported hardware configurations.

Chapter 1. Red Hat AI validated models

Red Hat AI validated models have been tested and verified to work correctly across supported hardware and product configurations. These models are available as Hugging Face downloads, as OCI artifact images, and as modelcar container images. Platform-specific validated models are also available for IBM Spyre on IBM Power and IBM Z systems.

In addition to validated models, Red Hat ships enabled models as modelcar container images. Enabled models are architecturally supported but have not completed the full validation pipeline. For details about the difference between validated and enabled models, see Model support levels.

Note

If you are using AI Inference with Podman as part of a RHEL AI deployment, use ModelCar container images or Hugging Face models.

If you are using AI Inference as part of a Red Hat OpenShift AI deployment on OpenShift Container Platform, use OCI artifact images.
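The guidance in this note can be captured as a small lookup. The following is a minimal sketch only: the deployment labels `rhel-ai-podman` and `openshift-ai` and the function name are hypothetical and do not correspond to any product setting.

```python
# Mapping taken from the note above. The dictionary keys are hypothetical
# labels chosen for this sketch, not product identifiers.
RECOMMENDED_FORMATS = {
    "rhel-ai-podman": ["ModelCar container image", "Hugging Face model"],
    "openshift-ai": ["OCI artifact image"],
}


def recommended_formats(deployment: str) -> list[str]:
    """Return the model formats the note recommends for a deployment type."""
    try:
        return RECOMMENDED_FORMATS[deployment]
    except KeyError:
        raise ValueError(f"unknown deployment type: {deployment!r}") from None


if __name__ == "__main__":
    print(recommended_formats("openshift-ai"))
```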

Red Hat uses GuideLLM for performance benchmarking and Language Model Evaluation Harness for accuracy evaluations.

For a complete list of models with platform compatibility data, see Model support matrix.

Important

AMD GPUs support only FP8 and GGUF quantization variant models. For more information, see Supported hardware in the vLLM documentation.

Chapter 2. Validated model support levels

Red Hat AI ships models at two support levels: validated and enabled. Understanding these support levels helps you make informed decisions about which models to deploy for your inference workloads.

Validated models

Red Hat has tested validated models with GuideLLM performance benchmarking and Language Model Evaluation Harness accuracy evaluations across specific OpenShift Container Platform, Red Hat OpenShift AI, and Red Hat AI Inference version combinations.

Validated models are benchmarked for specific use cases. The benchmarks can cover inference performance, quality, and other measures. All third-party models are governed by the third-party license of the original model provider.

Validated models include general-purpose large language models such as Llama, Granite, Mistral, Qwen, and Phi model families, and quantized variants in FP8, INT4, INT8, NVFP4, and BF16 formats.

Enabled models

Red Hat ships enabled models as modelcar container images with architecturally compatible configurations. Enabled models have not completed the full benchmarking and accuracy evaluation pipeline that validated models receive.

Enabled models include specialty categories such as:

  • Embedding models, for example granite-embedding-english-r2, all-MiniLM-L6-v2, nomic-embed-text-v1.5, and Qwen3-Embedding-8B
  • Safety and guard models, for example Llama-Guard-4-12B and granite-guardian-3.2-5b
  • Security models, for example Foundation-Sec-8B-Instruct
  • Reasoning models, for example Phi-4-reasoning
  • Additional general-purpose models not yet through the full validation pipeline

Both support levels indicate that Red Hat ships the model and provides support. The key difference is the depth of testing: validated models have quantified performance and accuracy data for specific platform configurations, while Red Hat verifies that enabled models work with the inference server architecture.

To find the support level for a specific model, see Model support matrix.

Chapter 3. Validated model support matrix

You can use the model support matrix to verify that a model is compatible with your Red Hat AI Inference, Red Hat OpenShift AI, and vLLM version combination before deploying it for inference serving. The matrix lists all validated and enabled models with their minimum platform version requirements and modelcar container image paths.

For more information, see Red Hat AI models on Hugging Face.

Note

Verify that your deployed Red Hat AI Inference, Red Hat OpenShift AI, and vLLM versions meet or exceed the minimum versions listed for your target model. For an explanation of the Validated and Enabled status values, see Model support levels.
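The version check described in this note can be sketched in a few lines. This is an illustration under stated assumptions, not a supported tool: the function names are hypothetical, and the sketch assumes that an early-access suffix such as `-ea.1` sorts before the corresponding final release. Matrix cells that list more than one version need to be split before they are passed in.

```python
import re


def parse_version(text: str) -> tuple:
    """Turn strings such as 'v0.10.1.1', '3', or '3.4.0-ea.1' into a sortable tuple."""
    release, _, pre = text.strip().lstrip("v").partition("-")
    numbers = [int(part) for part in release.split(".")]
    numbers += [0] * (4 - len(numbers))  # pad so '3' compares like '3.0.0.0'
    if pre:
        # Assumption: early-access builds ('ea.N') sort before the final release.
        match = re.search(r"(\d+)$", pre)
        pre_key = (0, int(match.group(1)) if match else 0)
    else:
        pre_key = (1, 0)
    return (*numbers, *pre_key)


def meets_minimum(deployed: str, minimum: str) -> bool:
    """Return True if the deployed version meets or exceeds the minimum."""
    return parse_version(deployed) >= parse_version(minimum)


if __name__ == "__main__":
    # Example: a deployment checked against the minimums of one matrix row.
    print(meets_minimum("v0.11.2", "v0.8.4"))
    print(meets_minimum("3.4.0-ea.1", "3.4.0"))
```

Comparing padded integer tuples avoids the string-comparison trap where "0.10.0" would sort before "0.8.4".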

Note

Hugging Face links require internet access. If you are working in a disconnected environment, use the modelcar container image paths with your mirrored registry. For more information, see Deploying the standalone Red Hat AI Inference container in a disconnected environment.
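As an illustration of using a modelcar path with a mirrored registry, the following sketch rewrites the registry host of an image reference. It assumes the mirror keeps the same repository path and tag as `registry.redhat.io`. The host `mirror.example.com:5000` and the function name are hypothetical.

```python
def to_mirror(image: str, mirror_host: str, source_host: str = "registry.redhat.io") -> str:
    """Replace the registry host of an image reference with a mirror host.

    Assumption: the mirror preserves the repository path and tag unchanged.
    """
    host, sep, rest = image.partition("/")
    if not sep or host != source_host:
        raise ValueError(f"image is not hosted on {source_host}: {image!r}")
    return f"{mirror_host}/{rest}"


if __name__ == "__main__":
    original = "registry.redhat.io/rhelai1/modelcar-granite-3-1-8b-instruct:1.5"
    # 'mirror.example.com:5000' is a placeholder for your own registry.
    print(to_mirror(original, "mirror.example.com:5000"))
```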

Table 3.1. Red Hat AI Model support matrix

Model | Modelcar | Status | Min. vLLM version | Min. RHAII version | Min. RHOAI version | Min. vRAM (GB) | Supported GPUs | Migration guidance

RedHatAI/granite-3.1-8b-instruct

registry.redhat.io/rhelai1/modelcar-granite-3-1-8b-instruct:1.5

Validated

v0.8.4

3

2.21

19 GB

1XA100-40, 1XA100-80, 1XH100, 1XH200, 1XL4, 2XA100-40, 2XA100-80, 2XH100, 2XL4, 4XA100-40, 4XA100-80, 4XH100, 4XL4, 8XA100-40, 8XA100-80

n/a

RedHatAI/granite-4.0-h-tiny-FP8-dynamic

registry.redhat.io/rhai/modelcar-granite-4-0-h-tiny-fp8-dynamic:3.0

Validated

v0.13.0

3.3.0

3.3.0

9 GB

1XB200, 1XH100, 1XH200, 1XL4

n/a

RedHatAI/Llama-3.1-8B-Instruct

registry.redhat.io/rhelai1/modelcar-llama-3-1-8b-instruct:1.5

Validated

v0.8.4

3

2.21

19 GB

1XA100-40, 1XA100-80, 1XH100, 1XH200, 1XL4, 2XA100-40, 2XA100-80, 2XH100, 2XL4, 4XA100-40, 4XA100-80, 4XH100, 4XL4, 8XA100-40, 8XA100-80

n/a

RedHatAI/Llama-3.3-70B-Instruct

registry.redhat.io/rhelai1/modelcar-llama-3-3-70b-instruct:1.5

Validated

v0.8.4

3

2.21

163 GB

2XH200, 4XA100-80, 4XH100, 8XA100-40, 8XH100

n/a

RedHatAI/Llama-4-Maverick-17B-128E-Instruct

registry.redhat.io/rhelai1/modelcar-llama-4-maverick-17b-128e-instruct:1.5

Validated

v0.8.4

3

2.21

924 GB

8XH200

n/a

RedHatAI/Llama-4-Maverick-17B-128E-Instruct-FP8

registry.redhat.io/rhelai1/modelcar-llama-4-maverick-17b-128e-instruct-fp8:1.5

Validated

v0.8.4

3

2.21

480 GB

4XH200, 8XH100

n/a

RedHatAI/Llama-4-Scout-17B-16E-Instruct

registry.redhat.io/rhelai1/modelcar-llama-4-scout-17b-16e-instruct:1.5

Validated

v0.8.4

3

2.21

250 GB

4XA100-80, 4XH100, 4XH200, 8XH100

n/a

RedHatAI/phi-4

registry.redhat.io/rhelai1/modelcar-phi-4:1.5

Validated

v0.8.4

3

2.21

34 GB

1XA100-40, 1XA100-80, 1XH100, 1XH200, 2XA100-40, 2XA100-80, 2XH100, 2XL4

n/a

RedHatAI/Devstral-Small-2-24B-Instruct-2512

registry.redhat.io/rhai/modelcar-devstral-small-2-24b-instruct-2512:3.0

Validated

v0.14.1

3.4.0-ea.1

3.4.0-ea.1

30 GB

1XA100-80, 1XB200, 1XH100, 1XH200, 2XA100-80, 2XB200, 2XH100, 2XH200, 4XA100-80, 4XB200, 4XH100, 4XH200, 4XL4, 8XA100-80, 8XB200, 8XH100, 8XH200, 8XL4

n/a

RedHatAI/Ministral-3-3B-Instruct-2512

registry.redhat.io/rhai/modelcar-ministral-3-3b-instruct-2512:3.0

Validated

v0.14.1

3.4.0-ea.1

3.4.0-ea.1

6 GB

1XA100-80, 1XB200, 1XH100, 1XH200, 1XL4, 2XA100-80, 2XB200, 2XH100, 2XH200, 2XL4, 4XA100-80, 4XB200, 4XH100, 4XH200, 4XL4, 8XA100-80, 8XB200, 8XH100, 8XH200, 8XL4

n/a

RedHatAI/Mistral-Large-3-675B-Instruct-2512

registry.redhat.io/rhai/modelcar-mistral-large-3-675b-instruct-2512:3.0

Validated

v0.11.2

3.2.5

3.2

784 GB

8XB200, 8XH200

n/a

RedHatAI/Mistral-Large-3-675B-Instruct-2512-NVFP4

registry.redhat.io/rhai/modelcar-mistral-large-3-675b-instruct-2512-nvfp4:3.0

Validated

v0.11.2

3.2.5

3.2

464 GB

4XB200, 4XH200, 8XA100-80, 8XB200, 8XH100, 8XH200

n/a

RedHatAI/Mistral-Small-24B-Instruct-2501

registry.redhat.io/rhelai1/modelcar-mistral-small-24b-instruct-2501:1.5

Validated

v0.8.4

3

2.21

55 GB

1XA100-80, 1XH100, 2XA100-40, 2XA100-80, 2XH100, 4XA100-40, 4XA100-80, 4XH100, 4XL4, 8XA100-40, 8XH100

n/a

RedHatAI/Mistral-Small-3.1-24B-Instruct-2503

registry.redhat.io/rhelai1/modelcar-mistral-small-3-1-24b-instruct-2503:1.5

Validated

v0.8.4

3

2.21

56 GB

1XA100-80, 1XH100, 1XH200, 2XA100-40, 2XA100-80, 2XH100, 4XA100-40, 4XA100-80, 4XH100, 4XL4, 8XA100-40, 8XH100

n/a

RedHatAI/Mixtral-8x7B-Instruct-v0.1

registry.redhat.io/rhelai1/modelcar-mixtral-8x7b-instruct-v0-1:1.4

Validated

v0.8.4

3

2.21

108 GB

1XH200, 2XA100-80, 2XH100, 4XA100-40, 4XA100-80, 4XH100, 8XA100-40, 8XH100, 8XL4

n/a

RedHatAI/NVIDIA-Nemotron-Nano-9B-v2-quantized.w4a16

registry.redhat.io/rhelai1/modelcar-nvidia-nemotron-nano-9b-v2-quantized-w4a16:1.5

Validated

v0.11.0

3.2.3, 3.2.4

3

8 GB

1XH100

n/a

RedHatAI/Llama-3.1-Nemotron-70B-Instruct-HF

registry.redhat.io/rhelai1/modelcar-llama-3-1-nemotron-70b-instruct-hf:1.5

Validated

v0.8.4

3

2.21

163 GB

2XH200, 4XA100-80, 4XH100, 8XA100-40

n/a

RedHatAI/NVIDIA-Nemotron-3-Nano-30B-A3B-FP8

registry.redhat.io/rhai/modelcar-nvidia-nemotron-3-nano-30b-a3b-fp8:3.0

Validated

v0.11.2

3.2.5

3.2

38 GB

1XB200, 1XH100, 1XH200, 2XB200, 2XH100, 2XH200, 4XB200, 4XH100, 4XH200, 8XB200, 8XH100, 8XH200

n/a

RedHatAI/NVIDIA-Nemotron-3-Super-120B-A12B-BF16

registry.redhat.io/rhai/modelcar-nvidia-nemotron-3-super-120b-a12b-bf16:3.0

Validated

v0.17.1

3.4.0-ea.2

3.4.0-ea.2

285 GB

2XH200, 4XA100-80, 4XH100, 4XH200, 8XA100-80, 8XH100, 8XH200

n/a

RedHatAI/NVIDIA-Nemotron-3-Super-120B-A12B-FP8

registry.redhat.io/rhai/modelcar-nvidia-nemotron-3-super-120b-a12b-fp8:3.0

Validated

v0.17.1

3.4.0-ea.2

3.4.0-ea.2

148 GB

1XH200, 2XH100, 2XH200, 4XH100, 4XH200, 8XH100, 8XH200

n/a

RedHatAI/NVIDIA-Nemotron-3-Super-120B-A12B-NVFP4

registry.redhat.io/rhai/modelcar-nvidia-nemotron-3-super-120b-a12b-nvfp4:3.0

Validated

v0.17.1

3.4.0-ea.2

3.4.0-ea.2

93 GB

1XH200, 2XH100, 2XH200

n/a

RedHatAI/gpt-oss-120b

registry.redhat.io/rhelai1/modelcar-gpt-oss-120b:1.5

Validated

v0.10.1.1

3.2.2

2.25

76 GB

1XB200, 1XH100, 1XH200, 2XB200, 2XH100, 2XH200, 4XA100-40, 4XB200, 4XH100, 4XH200, 4XL4, 8XA100-40, 8XB200, 8XH100, 8XH200, 8XL4

n/a

RedHatAI/gpt-oss-20b

registry.redhat.io/rhelai1/modelcar-gpt-oss-20b:1.5

Validated

v0.10.1.1

3.2.2

2.25

16 GB

1XA100-40, 1XB200, 1XH100, 1XH200, 1XL4, 2XA100-40, 2XB200, 2XH100, 2XH200, 2XL4, 4XA100-40, 4XB200, 4XH100, 4XH200, 4XL4, 8XA100-40, 8XB200, 8XH100, 8XH200, 8XL4

n/a

RedHatAI/Qwen2.5-7B-Instruct

registry.redhat.io/rhelai1/modelcar-qwen2-5-7b-instruct:1.5

Validated

v0.8.4

3

2.21

18 GB

1XA100-40, 1XA100-80, 1XH100, 1XH200, 1XL4, 2XA100-40, 2XA100-80, 2XH100, 2XL4, 4XA100-40, 4XA100-80, 4XH100

n/a

Qwen/Qwen3-8B-FP8

registry.redhat.io/rhelai1/modelcar-qwen3-8b-fp8:1.5

Validated

v0.10.0

3.2.1

2.24

11 GB

1XA100-40, 1XH100, 1XL4, 2XH100

n/a

RedHatAI/Apertus-8B-Instruct-2509-FP8-dynamic

registry.redhat.io/rhai/modelcar-apertus-8b-instruct-2509-fp8-dynamic:3.0

Validated

v0.11.2

3.2.5

3.2

11 GB

1XA100-80, 1XB200, 1XH100, 1XH200, 2XA100-80, 2XB200, 2XH100, 2XH200, 4XA100-80, 4XB200, 4XH100, 4XH200, 8XA100-80, 8XB200, 8XH100, 8XH200

n/a

RedHatAI/DeepSeek-R1-0528-quantized.w4a16

registry.redhat.io/rhelai1/modelcar-deepseek-r1-0528-quantized-w4a16:1.5

Validated

v0.10.0

3.2.1

2.24

428 GB

4XB200, 4XH200, 8XB200, 8XH100, 8XH200

n/a

RedHatAI/gemma-3n-E4B-it-FP8-dynamic

registry.redhat.io/rhelai1/modelcar-gemma-3n-e4b-it-fp8-dynamic:1.5

Validated

v0.10.0

3.2.1

2.24

14 GB

1XA100-40, 1XH100, 1XH200, 1XL4, 2XA100-40, 2XH100, 2XH200, 2XL4, 4XA100-40, 4XH100, 4XH200, 4XL4, 8XA100-40, 8XH100, 8XH200, 8XL4

n/a

RedHatAI/granite-3.1-8b-instruct-fp8-dynamic

registry.redhat.io/rhelai1/modelcar-granite-3-1-8b-instruct-fp8-dynamic:1.5

Validated

v0.8.4

3

2.21

11 GB

1XH200

n/a

RedHatAI/granite-3.1-8b-instruct-quantized.w4a16

registry.redhat.io/rhelai1/modelcar-granite-3-1-8b-instruct-quantized-w4a16:1.5

Validated

v0.8.4

3

2.21

6 GB

1XH200

n/a

RedHatAI/granite-4.0-h-small-FP8-dynamic

registry.redhat.io/rhai/modelcar-granite-4-0-h-small-fp8-dynamic:3.0

Validated

v0.13.0

3.3.0

3.3.0

38 GB

1XA100-80, 1XB200, 1XH100, 1XH200

n/a

RedHatAI/Kimi-K2-Instruct-quantized.w4a16

registry.redhat.io/rhelai1/modelcar-kimi-k2-instruct-quantized-w4a16:1.5

Validated

v0.10.0

3.2.1

2.24

629 GB

4XB200, 8XB200, 8XH200

n/a

RedHatAI/Llama-3.1-Nemotron-70B-Instruct-HF-FP8-dynamic

registry.redhat.io/rhelai1/modelcar-llama-3-1-nemotron-70b-instruct-hf-fp8-dynamic:1.5

Validated

v0.8.4

3

2.21

84 GB

1XH200, 2XA100-80, 2XH100, 4XA100-40, 4XA100-80, 4XH100, 8XA100-40, 8XH100

n/a

RedHatAI/Llama-3.3-70B-Instruct-FP8-dynamic

registry.redhat.io/rhelai1/modelcar-llama-3-3-70b-instruct-fp8-dynamic:1.5

Validated

v0.8.4

3

2.21

84 GB

1XH200, 2XH100, 4XA100-40, 4XA100-80, 4XH100, 8XA100-40, 8XA100-80, 8XH100

n/a

RedHatAI/Llama-3.3-70B-Instruct-quantized.w4a16

registry.redhat.io/rhelai1/modelcar-llama-3-3-70b-instruct-quantized-w4a16:1.5

Validated

v0.8.4

3

2.21

46 GB

1XH100, 2XH100, 4XA100-40, 4XH100, 8XA100-40, 8XH100

n/a

RedHatAI/Llama-3.3-70B-Instruct-quantized.w8a8

registry.redhat.io/rhelai1/modelcar-llama-3-3-70b-instruct-quantized-w8a8:1.5

Validated

v0.8.4

3

2.21

84 GB

2XH100, 4XA100-40, 4XA100-80, 4XH100, 8XA100-40, 8XH100

n/a

RedHatAI/Llama-4-Scout-17B-16E-Instruct-FP8-dynamic

registry.redhat.io/rhelai1/modelcar-llama-4-scout-17b-16e-instruct-fp8-dynamic:1.5

Validated

v0.8.4

3

2.21

132 GB

2XH100, 2XH200, 4XH100, 8XH100, 8XL4

n/a

RedHatAI/Llama-4-Scout-17B-16E-Instruct-quantized.w4a16

registry.redhat.io/rhelai1/modelcar-llama-4-scout-17b-16e-instruct-quantized-w4a16:1.5

Validated

v0.8.4

3

2.21

75 GB

2XH100, 2XH200, 4XA100-40, 4XH100, 8XA100-40, 8XH100

n/a

RedHatAI/Meta-Llama-3.1-8B-Instruct-FP8-dynamic

registry.redhat.io/rhelai1/modelcar-llama-3-1-8b-instruct-fp8-dynamic:1.5

Validated

v0.8.4

3

2.21

11 GB

1XH200

n/a

RedHatAI/MiniMax-M2.5

registry.redhat.io/rhai/modelcar-minimax-m2-5:3.0

Validated

v0.14.1

3.4.0-ea.1

3.4.0-ea.1

265 GB

2XB200, 4XA100-80, 4XB200, 4XH100, 4XH200

n/a

RedHatAI/Ministral-3-14B-Instruct-2512

registry.redhat.io/rhai/modelcar-ministral-3-14b-instruct-2512:3.0

Validated

v0.13.0

3.3.0

3.3.0

19 GB

1XA100-80, 1XB200, 1XH100, 1XH200, 1XL4, 2XA100-80, 2XB200, 2XH100, 2XH200, 2XL4, 4XA100-80, 4XB200, 4XH100, 4XH200, 4XL4, 8XA100-80, 8XB200, 8XH100, 8XH200, 8XL4

n/a

RedHatAI/Mistral-Small-3.1-24B-Instruct-2503-FP8-dynamic

registry.redhat.io/rhelai1/modelcar-mistral-small-3-1-24b-instruct-2503-fp8-dynamic:1.5

Validated

v0.8.4

3

2.21

30 GB

1XA100-80, 1XH100, 1XH200, 2XA100-40, 2XA100-80, 2XH100, 2XL4, 4XA100-40, 4XA100-80, 4XH100, 4XL4, 8XA100-40, 8XH100

n/a

RedHatAI/Mistral-Small-3.1-24B-Instruct-2503-quantized.w4a16

registry.redhat.io/rhelai1/modelcar-mistral-small-3-1-24b-instruct-2503-quantized-w4a16:1.5

Validated

v0.8.4

3

2.21

18 GB

1XA100-40, 1XA100-80, 1XH100, 1XH200, 2XA100-40, 2XA100-80, 2XH100, 4XA100-40, 4XA100-80, 4XH100, 8XA100-40, 8XH100

n/a

RedHatAI/Mistral-Small-3.1-24B-Instruct-2503-quantized.w8a8

registry.redhat.io/rhelai1/modelcar-mistral-small-3-1-24b-instruct-2503-quantized-w8a8:1.5

Validated

v0.8.4

3

2.21

30 GB

1XA100-40, 1XA100-80, 1XH100, 1XH200, 2XA100-40, 2XA100-80, 2XH100, 2XL4, 4XA100-40, 4XA100-80, 4XH100, 4XL4, 8XA100-40, 8XA100-80, 8XH100

n/a

RedHatAI/NVIDIA-Nemotron-Nano-9B-v2-FP8-dynamic

registry.redhat.io/rhelai1/modelcar-nvidia-nemotron-nano-9b-v2-fp8-dynamic:1.5

Validated

v0.10.1.1

3.2.2

2.25

12 GB

1XA100-40, 1XB200, 1XH100, 1XH200

n/a

RedHatAI/phi-4-FP8-dynamic

registry.redhat.io/rhelai1/modelcar-phi-4-fp8-dynamic:1.5

Validated

v0.8.4

3

2.21

19 GB

1XA100-40, 1XA100-80, 1XH100, 1XH200, 2XA100-40, 2XA100-80, 2XH100, 2XL4

n/a

RedHatAI/Phi-4-mini-instruct-FP8-dynamic

registry.redhat.io/rhelai1/modelcar-phi-4-mini-instruct-fp8-dynamic:1.5

Validated

v0.14.1

3.4.0-ea.1

3.4.0-ea.1

7 GB

1XA100-80, 1XB200, 1XH100, 1XH200, 1XL4, 2XA100-80, 2XB200, 2XH100, 2XH200, 2XL4, 4XA100-80, 4XB200, 4XH100, 4XH200, 4XL4, 8XA100-80, 8XH100, 8XH200, 8XL4

n/a

RedHatAI/phi-4-quantized.w4a16

registry.redhat.io/rhelai1/modelcar-phi-4-quantized-w4a16:1.5

Validated

v0.8.4

3

2.21

11 GB

1XA100-40, 1XA100-80, 1XH100, 2XA100-40, 2XA100-80, 2XH100, 2XL4

n/a

RedHatAI/phi-4-quantized.w8a8

registry.redhat.io/rhelai1/modelcar-phi-4-quantized-w8a8:1.5

Validated

v0.8.4

3

2.21

19 GB

1XA100-40, 1XA100-80, 1XH100, 2XA100-40, 2XA100-80, 2XH100

n/a

RedHatAI/Phi-4-reasoning-FP8-dynamic

registry.redhat.io/rhai/modelcar-phi-4-reasoning-fp8-dynamic:3.0

Validated

v0.13.0

3.3.0

3.3.0

19 GB

1XA100-80, 1XB200, 1XH100, 1XH200, 1XL4, 2XA100-40, 2XA100-80, 2XB200, 2XH100, 2XH200, 2XL4

n/a

RedHatAI/Qwen2.5-7B-Instruct-FP8-dynamic

registry.redhat.io/rhelai1/modelcar-qwen2-5-7b-instruct-fp8-dynamic:1.5

Validated

v0.8.4

3

2.21

11 GB

1XA100-40, 1XA100-80, 1XH100, 1XH200, 1XL4, 2XA100-40, 2XA100-80, 2XH100, 2XL4, 4XA100-40, 4XA100-80, 4XH100

n/a

RedHatAI/Qwen2.5-7B-Instruct-quantized.w4a16

registry.redhat.io/rhelai1/modelcar-qwen2-5-7b-instruct-quantized-w4a16:1.5

Validated

v0.8.4

3

2.21

7 GB

1XA100-40, 1XA100-80, 1XH100, 1XL4, 2XA100-40, 2XH100, 4XA100-40, 4XH100, 4XL4

n/a

RedHatAI/Qwen2.5-7B-Instruct-quantized.w8a8

registry.redhat.io/rhelai1/modelcar-qwen2-5-7b-instruct-quantized-w8a8:1.5

Validated

v0.8.4

3

2.21

11 GB

1XA100-40, 1XA100-80, 1XH100, 1XL4, 2XA100-40, 2XA100-80, 2XH100, 2XL4, 4XA100-40, 4XH100, 4XL4

n/a

RedHatAI/Qwen3.5-122B-A10B-FP8-dynamic

registry.redhat.io/rhai/modelcar-qwen3-5-122b-a10b-fp8-dynamic:3.0

Validated

v0.17.1

3.4.0-ea.2

3.4.0-ea.2

148 GB

2XA100-80, 2XH100, 4XA100-80, 4XH200, 8XA100-80, 8XH100

n/a

RedHatAI/Qwen3.5-35B-A3B-FP8-dynamic

registry.redhat.io/rhai/modelcar-qwen3-5-35b-a3b-fp8-dynamic:3.0

Validated

v0.17.1

3.4.0-ea.2

3.4.0-ea.2

44 GB

1XA100-80, 1XH100, 1XH200, 2XA100-80, 2XH100, 2XH200, 4XA100-80, 4XH100, 4XH200, 8XA100-80, 8XH100, 8XH200

n/a

RedHatAI/Qwen3.5-397B-A17B-FP8-dynamic

registry.redhat.io/rhai/modelcar-qwen3-5-397b-a17b-fp8-dynamic:3.0

Validated

v0.17.1

3.4.0-ea.2

3.4.0-ea.2

466 GB

4XH200, 8XA100-80, 8XH100

n/a

RedHatAI/Qwen3-8B-FP8-dynamic

registry.redhat.io/rhelai1/modelcar-qwen3-8b-fp8-dynamic:1.5

Validated

v0.10.0

3.2.1

2.24

11 GB

1XA100-40, 1XH100, 1XH200, 1XL4, 2XA100-40, 2XH100, 2XH200, 2XL4, 4XA100-40, 4XH100, 4XH200, 4XL4, 8XA100-40, 8XH100, 8XH200, 8XL4

n/a

RedHatAI/Qwen3-Coder-480B-A35B-Instruct-FP8

registry.redhat.io/rhelai1/modelcar-qwen3-coder-480b-a35b-instruct-fp8:1.5

Validated

v0.10.1.1

3.2.2

2.25

555 GB

4XB200, 4XH200

n/a

RedHatAI/Qwen3-Coder-Next-NVFP4

registry.redhat.io/rhai/modelcar-qwen3-coder-next-nvfp4:3.0

Validated

v0.14.1

3.4.0-ea.1

3.4.0-ea.1

55 GB

1XB200, 1XH100, 1XH200, 2XB200, 2XH100, 2XH200, 4XB200, 4XH100, 4XH200, 8XB200, 8XH100, 8XH200

n/a

RedHatAI/Qwen3-Next-80B-A3B-Instruct-FP8

registry.redhat.io/rhai/modelcar-qwen3-next-80b-a3b-instruct-fp8:3.0

Validated

v0.13.0

3.3.0

3.3.0

95 GB

1XB200, 2XA100-80, 2XB200, 2XH200, 4XA100-80, 4XH200

n/a

RedHatAI/Qwen3-Next-80B-A3B-Instruct-quantized.w4a16

registry.redhat.io/rhai/modelcar-qwen3-next-80b-a3b-instruct-quantized-w4a16:3.0

Validated

v0.13.0

3.3.0

3.3.0

51 GB

1XA100-80, 1XB200, 1XH100, 1XH200, 2XA100-80, 2XB200, 2XH100, 2XH200, 4XA100-80, 4XB200, 4XH100, 4XH200

n/a

RedHatAI/Qwen3-VL-235B-A22B-Instruct-NVFP4

registry.redhat.io/rhai/modelcar-qwen3-vl-235b-a22b-instruct-nvfp4:3.0

Validated

v0.13.0

3.3.0

3.3.0

156 GB

1XB200, 2XA100-80, 2XB200, 2XH100, 2XH200, 4XA100-80, 4XB200, 4XH100, 4XH200, 8XA100-80, 8XB200, 8XH100, 8XH200

n/a

RedHatAI/sarvam-105b-FP8-Dynamic

registry.redhat.io/rhai/modelcar-sarvam-105b-fp8-dynamic:3.0

Validated

v0.18.0

3.4.0

3.4.0

130 GB

1XH200, 2XH200, 4XH100, 4XH200, 8XH100, 8XH200

n/a

RedHatAI/sarvam-30b-FP8-Dynamic

registry.redhat.io/rhai/modelcar-sarvam-30b-fp8-dynamic:3.0

Validated

v0.18.0

3.4.0

3.4.0

45 GB

1XA100-80, 1XH100, 1XH200, 2XA100-80, 2XH100, 2XH200, 4XA100-80, 4XH100, 4XH200, 8XA100-80, 8XH100, 8XH200

n/a

RedHatAI/SmolLM3-3B-FP8-dynamic

registry.redhat.io/rhelai1/modelcar-smollm3-3b-fp8-dynamic:1.5

Validated

v0.10.1.1

3.2.2

2.25

5 GB

1XA100-40, 1XB200, 1XH100, 1XH200, 1XL4, 2XA100-40, 2XB200, 2XH100, 2XH200, 2XL4, 4XA100-40, 4XB200, 4XH100, 4XH200, 4XL4

n/a

RedHatAI/gpt-oss-120b-essential

registry.redhat.io/rhai/modelcar-gpt-oss-120b-essential:3.0

Validated

v0.10.1.1

3.2.2

2.25

76 GB

1XB200, 1XH100, 1XH200, 2XB200, 2XH100, 2XH200, 4XA100-40, 4XB200, 4XH100, 4XH200, 4XL4, 8XA100-40, 8XB200, 8XH100, 8XH200, 8XL4

n/a

RedHatAI/gpt-oss-20b-essential

registry.redhat.io/rhai/modelcar-gpt-oss-20b-essential:3.0

Validated

v0.10.1.1

3.2.2

2.25

16 GB

1XA100-40, 1XB200, 1XH100, 1XH200, 1XL4, 2XA100-40, 2XB200, 2XH100, 2XH200, 2XL4, 4XA100-40, 4XB200, 4XH100, 4XH200, 4XL4, 8XA100-40, 8XB200, 8XH100, 8XH200, 8XL4

n/a

google/gemma-3-1b-it

registry.redhat.io/rhai/modelcar-gemma-3-1b-it:3.0

Enabled

v0.11.2

3.2.5

3.2

3 GB

1XH200

n/a

google/gemma-3-27b-it

registry.redhat.io/rhai/modelcar-gemma-3-27b-it:3.0

Enabled

v0.11.2

3.2.5

3.2

64 GB

1XH200

n/a

ibm-granite/granite-guardian-3.2-5b

registry.redhat.io/rhai/modelcar-granite-guardian-3-2-5b:3.0

Enabled

v0.11.2

3.2.5

3.2

14 GB

1XH200

n/a

meta-llama/Llama-2-7b-chat-hf

registry.redhat.io/rhai/modelcar-llama-2-7b-chat-hf:3.0

Enabled

v0.11.2

3.2.5

3.2

16 GB

1XH200

n/a

RedHatAI/Foundation-Sec-8B-Instruct

registry.redhat.io/rhai/modelcar-foundation-sec-8b-instruct:3.0

Enabled

v0.13.0

3.3.0

3.3.0

19 GB

1XH200

n/a

RedHatAI/gemma-3-12b-it

registry.redhat.io/rhai/modelcar-gemma-3-12b-it:3.0

Enabled

v0.11.2

3.2.5

3.2

29 GB

1XH200

n/a

RedHatAI/gemma-4-26B-A4B-it-FP8-Dynamic

registry.redhat.io/rhai/modelcar-gemma-4-26b-a4b-it-fp8-dynamic:3.0

Enabled

v0.18.0

3.4.0

3.4.0

33 GB

1XH200

n/a

RedHatAI/gemma-4-31B-it-FP8-Dynamic

registry.redhat.io/rhai/modelcar-gemma-4-31b-it-fp8-dynamic:3.0

Enabled

v0.18.0

3.4.0

3.4.0

39 GB

1XH200

n/a

RedHatAI/Llama-Guard-4-12B

registry.redhat.io/rhai/modelcar-llama-guard-4-12b:3.0

Enabled

v0.11.2

3.2.5

3.2

28 GB

1XH200

n/a

RedHatAI/Phi-4-reasoning

registry.redhat.io/rhai/modelcar-phi-4-reasoning:3.0

Enabled

v0.11.2

3.2.5

3.2

34 GB

1XH200

n/a

RedHatAI/Qwen3-Embedding-8B

registry.redhat.io/rhelai1/modelcar-qwen3-embedding-8b:1.5

Enabled

v0.11.2

3.2.5

3.2

17 GB

1XL4

n/a

RedHatAI/granite-embedding-english-r2

registry.redhat.io/rhai/modelcar-granite-embedding-english-r2:3.0

Enabled

v0.11.2

3.2.5

3.2

1 GB

1XL4

n/a

RedHatAI/all-MiniLM-L6-v2

registry.redhat.io/rhai/modelcar-all-minilm-l6-v2:3.0

Enabled

v0.11.2

3.2.5

3.2

1 GB

1XL4

n/a

RedHatAI/nomic-embed-text-v1.5

registry.redhat.io/rhelai1/modelcar-nomic-embed-text-v1-5:1.5

Enabled

v0.11.2

3.2.5

3.2

1 GB

1XL4

n/a
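Entries in the Supported GPUs column follow a compact pattern such as 8XA100-80. The following sketch parses a cell into structured values so that a planned hardware configuration can be checked against it. It reads each entry as an accelerator count followed by a GPU model, which is an interpretation of the notation rather than a documented format. The names `GpuConfig`, `parse_gpu_configs`, and `supports` are hypothetical.

```python
import re
from typing import NamedTuple


class GpuConfig(NamedTuple):
    count: int  # number of accelerators, read from the digits before 'X'
    gpu: str    # GPU model as written in the matrix, for example 'A100-80'


# Assumption: every entry has the form '<count>X<model>'.
_ENTRY = re.compile(r"^(\d+)X(.+)$")


def parse_gpu_configs(cell: str) -> list[GpuConfig]:
    """Parse a comma-separated Supported GPUs cell into GpuConfig values."""
    configs = []
    for token in cell.split(","):
        token = token.strip()
        match = _ENTRY.match(token)
        if match is None:
            raise ValueError(f"unrecognized GPU entry: {token!r}")
        configs.append(GpuConfig(int(match.group(1)), match.group(2)))
    return configs


def supports(cell: str, count: int, gpu: str) -> bool:
    """Return True if the exact count and model combination is listed."""
    return GpuConfig(count, gpu) in parse_gpu_configs(cell)


if __name__ == "__main__":
    cell = "2XH200, 4XA100-80, 4XH100, 8XA100-40, 8XH100"
    print(supports(cell, 4, "H100"))
    print(supports(cell, 1, "H100"))
```

The check is an exact match on purpose: the matrix lists specific tested combinations, so the sketch does not infer that a larger GPU count is also covered.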

Chapter 4. Validated OCI artifact model container images

The following table lists validated OCI artifact model container images available from the Red Hat container registry, including baseline and quantized variants for each supported model.

Table 4.1. Validated OCI artifact model container images

Model | Quantized variants | OCI artifact images

llama-4-scout-17b-16e-instruct

INT4, FP8

  • Baseline: registry.redhat.io/rhelai1/llama-4-scout-17b-16e-instruct:1.5
  • INT4: registry.redhat.io/rhelai1/llama-4-scout-17b-16e-instruct-quantized-w4a16:1.5
  • FP8: registry.redhat.io/rhelai1/llama-4-scout-17b-16e-instruct-fp8-dynamic:1.5

llama-4-maverick-17b-128e-instruct

FP8

  • Baseline: registry.redhat.io/rhelai1/llama-4-maverick-17b-128e-instruct:1.5
  • FP8: registry.redhat.io/rhelai1/llama-4-maverick-17b-128e-instruct-fp8:1.5

mistral-small-3-1-24b-instruct-2503

INT4, INT8, FP8

  • Baseline: registry.redhat.io/rhelai1/mistral-small-3-1-24b-instruct-2503:1.5
  • INT4: registry.redhat.io/rhelai1/mistral-small-3-1-24b-instruct-2503-quantized-w4a16:1.5
  • INT8: registry.redhat.io/rhelai1/mistral-small-3-1-24b-instruct-2503-quantized-w8a8:1.5
  • FP8: registry.redhat.io/rhelai1/mistral-small-3-1-24b-instruct-2503-fp8-dynamic:1.5

llama-3-3-70b-instruct

INT4, INT8, FP8

  • Baseline: registry.redhat.io/rhelai1/llama-3-3-70b-instruct:1.5
  • INT4: registry.redhat.io/rhelai1/llama-3-3-70b-instruct-quantized-w4a16:1.5
  • INT8: registry.redhat.io/rhelai1/llama-3-3-70b-instruct-quantized-w8a8:1.5
  • FP8: registry.redhat.io/rhelai1/llama-3-3-70b-instruct-fp8-dynamic:1.5

llama-3-1-8b-instruct

INT4, INT8, FP8

  • Baseline: registry.redhat.io/rhelai1/llama-3-1-8b-instruct:1.5
  • INT4: registry.redhat.io/rhelai1/llama-3-1-8b-instruct-quantized-w4a16:1.5
  • INT8: registry.redhat.io/rhelai1/llama-3-1-8b-instruct-quantized-w8a8:1.5
  • FP8: registry.redhat.io/rhelai1/llama-3-1-8b-instruct-fp8-dynamic:1.5

granite-3-1-8b-instruct

INT4, INT8, FP8

  • Baseline: registry.redhat.io/rhelai1/granite-3-1-8b-instruct:1.5
  • INT4: registry.redhat.io/rhelai1/granite-3-1-8b-instruct-quantized-w4a16:1.5
  • INT8: registry.redhat.io/rhelai1/granite-3-1-8b-instruct-quantized-w8a8:1.5
  • FP8: registry.redhat.io/rhelai1/granite-3-1-8b-instruct-fp8-dynamic:1.5

phi-4

INT4, INT8, FP8

  • Baseline: registry.redhat.io/rhelai1/phi-4:1.5
  • INT4: registry.redhat.io/rhelai1/phi-4-quantized-w4a16:1.5
  • INT8: registry.redhat.io/rhelai1/phi-4-quantized-w8a8:1.5
  • FP8: registry.redhat.io/rhelai1/phi-4-fp8-dynamic:1.5

qwen2-5-7b-instruct

INT4, INT8, FP8

  • Baseline: registry.redhat.io/rhelai1/qwen2-5-7b-instruct:1.5
  • INT4: registry.redhat.io/rhelai1/qwen2-5-7b-instruct-quantized-w4a16:1.5
  • INT8: registry.redhat.io/rhelai1/qwen2-5-7b-instruct-quantized-w8a8:1.5
  • FP8: registry.redhat.io/rhelai1/qwen2-5-7b-instruct-fp8-dynamic:1.5

mistral-small-24b-instruct-2501

INT4, INT8, FP8

  • Baseline: registry.redhat.io/rhelai1/mistral-small-24b-instruct-2501:1.5
  • INT4: registry.redhat.io/rhelai1/mistral-small-24b-instruct-2501-quantized-w4a16:1.5
  • INT8: registry.redhat.io/rhelai1/mistral-small-24b-instruct-2501-quantized-w8a8:1.5
  • FP8: registry.redhat.io/rhelai1/mistral-small-24b-instruct-2501-fp8-dynamic:1.5

mixtral-8x7b-instruct-v0-1

None

  • Baseline: registry.redhat.io/rhelai1/mixtral-8x7b-instruct-v0-1:1.4

granite-3-1-8b-base

INT4 (baseline currently unavailable)

  • INT4: registry.redhat.io/rhelai1/granite-3-1-8b-base-quantized-w4a16:1.5

granite-3.1-8b-starter-v2

None

  • Baseline: registry.redhat.io/rhelai1/granite-3.1-8b-starter-v2:1.5

llama-3-1-nemotron-70b-instruct-hf

FP8

  • Baseline: registry.redhat.io/rhelai1/llama-3-1-nemotron-70b-instruct-hf:1.5
  • FP8: registry.redhat.io/rhelai1/llama-3-1-nemotron-70b-instruct-hf-fp8-dynamic:1.5

gemma-2-9b-it

FP8

  • Baseline: registry.redhat.io/rhelai1/gemma-2-9b-it:1.5
  • FP8: registry.redhat.io/rhelai1/gemma-2-9b-it-fp8:1.5

deepseek-r1-0528

INT4 (baseline currently unavailable)

  • INT4: registry.redhat.io/rhelai1/deepseek-r1-0528-quantized-w4a16:1.5

qwen3-8b

FP8 (baseline currently unavailable)

  • FP8: registry.redhat.io/rhelai1/qwen3-8b-fp8-dynamic:1.5

kimi-k2-instruct

INT4 (baseline currently unavailable)

  • INT4: registry.redhat.io/rhelai1/kimi-k2-instruct-quantized-w4a16:1.5

gemma-3n-e4b-it

FP8 (baseline currently unavailable)

  • FP8: registry.redhat.io/rhelai1/gemma-3n-e4b-it-fp8-dynamic:1.5

gpt-oss-120b

None

  • Baseline: registry.redhat.io/rhelai1/gpt-oss-120b:1.5

gpt-oss-20b

None

  • Baseline: registry.redhat.io/rhelai1/gpt-oss-20b:1.5

qwen3-coder-480b-a35b-instruct

FP8 (baseline currently unavailable)

  • FP8: registry.redhat.io/rhelai1/qwen3-coder-480b-a35b-instruct-fp8:1.5

whisper-large-v3-turbo

INT4 (baseline currently unavailable)

  • INT4: registry.redhat.io/rhelai1/whisper-large-v3-turbo-quantized-w4a16:1.5

voxtral-mini-3b-2507

FP8 (baseline currently unavailable)

  • FP8: registry.redhat.io/rhelai1/voxtral-mini-3b-2507-fp8-dynamic:1.5

nvidia-nemotron-nano-9b-v2

FP8 (baseline currently unavailable)

  • FP8: registry.redhat.io/rhelai1/nvidia-nemotron-nano-9b-v2-fp8-dynamic:1.5
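The image names in this table encode the variant in a suffix: `-quantized-w4a16` for INT4, `-quantized-w8a8` for INT8, and `-fp8-dynamic` or `-fp8` for FP8. The following sketch classifies an image reference using only the suffixes that appear in the table. The mapping is inferred from the listed names, not from a published naming specification, and the function name is hypothetical.

```python
# Suffix-to-variant pairs observed in the table above. Longer suffixes come
# first so that '-fp8-dynamic' is matched before '-fp8'.
SUFFIX_TO_VARIANT = (
    ("-quantized-w4a16", "INT4"),
    ("-quantized-w8a8", "INT8"),
    ("-fp8-dynamic", "FP8"),
    ("-fp8", "FP8"),
)


def variant_of(image: str) -> str:
    """Classify an image reference as Baseline, INT4, INT8, or FP8."""
    # Keep only the repository name: drop the registry path and the tag.
    name = image.rsplit("/", 1)[-1].split(":", 1)[0]
    for suffix, variant in SUFFIX_TO_VARIANT:
        if name.endswith(suffix):
            return variant
    return "Baseline"


if __name__ == "__main__":
    for ref in (
        "registry.redhat.io/rhelai1/phi-4:1.5",
        "registry.redhat.io/rhelai1/phi-4-quantized-w4a16:1.5",
        "registry.redhat.io/rhelai1/gemma-2-9b-it-fp8:1.5",
    ):
        print(ref, "->", variant_of(ref))
```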

Chapter 5. Validated Red Hat AI ModelCar container images

You can use ModelCar container images to deploy validated models with Red Hat AI Inference. The following table lists the available ModelCar container images and their quantized variants.

Note

For minimum platform version requirements and validation status for each model, see Model support matrix.

Table 5.1. Validated Red Hat AI ModelCar container images

Model | Quantized variants | ModelCar images

llama-4-scout-17b-16e-instruct

INT4, FP8

  • Baseline: registry.redhat.io/rhelai1/modelcar-llama-4-scout-17b-16e-instruct:1.5
  • INT4: registry.redhat.io/rhelai1/modelcar-llama-4-scout-17b-16e-instruct-quantized-w4a16:1.5
  • FP8: registry.redhat.io/rhelai1/modelcar-llama-4-scout-17b-16e-instruct-fp8-dynamic:1.5

llama-4-maverick-17b-128e-instruct

FP8

  • Baseline: registry.redhat.io/rhelai1/modelcar-llama-4-maverick-17b-128e-instruct:1.5
  • FP8: registry.redhat.io/rhelai1/modelcar-llama-4-maverick-17b-128e-instruct-fp8:1.5

mistral-small-3-1-24b-instruct-2503

INT4, INT8, FP8

  • Baseline: registry.redhat.io/rhelai1/modelcar-mistral-small-3-1-24b-instruct-2503:1.5
  • INT4: registry.redhat.io/rhelai1/modelcar-mistral-small-3-1-24b-instruct-2503-quantized-w4a16:1.5
  • INT8: registry.redhat.io/rhelai1/modelcar-mistral-small-3-1-24b-instruct-2503-quantized-w8a8:1.5
  • FP8: registry.redhat.io/rhelai1/modelcar-mistral-small-3-1-24b-instruct-2503-fp8-dynamic:1.5

llama-3-3-70b-instruct

INT4, INT8, FP8

  • Baseline: registry.redhat.io/rhelai1/modelcar-llama-3-3-70b-instruct:1.5
  • INT4: registry.redhat.io/rhelai1/modelcar-llama-3-3-70b-instruct-quantized-w4a16:1.5
  • INT8: registry.redhat.io/rhelai1/modelcar-llama-3-3-70b-instruct-quantized-w8a8:1.5
  • FP8: registry.redhat.io/rhelai1/modelcar-llama-3-3-70b-instruct-fp8-dynamic:1.5

llama-3-1-8b-instruct

INT4, INT8, FP8

  • Baseline: registry.redhat.io/rhelai1/modelcar-llama-3-1-8b-instruct:1.5
  • INT4: registry.redhat.io/rhelai1/modelcar-llama-3-1-8b-instruct-quantized-w4a16:1.5
  • INT8: registry.redhat.io/rhelai1/modelcar-llama-3-1-8b-instruct-quantized-w8a8:1.5
  • FP8: registry.redhat.io/rhelai1/modelcar-llama-3-1-8b-instruct-fp8-dynamic:1.5

granite-3-1-8b-instruct

INT4, INT8, FP8

  • Baseline: registry.redhat.io/rhelai1/modelcar-granite-3-1-8b-instruct:1.5
  • INT4: registry.redhat.io/rhelai1/modelcar-granite-3-1-8b-instruct-quantized-w4a16:1.5
  • INT8: registry.redhat.io/rhelai1/modelcar-granite-3-1-8b-instruct-quantized-w8a8:1.5
  • FP8: registry.redhat.io/rhelai1/modelcar-granite-3-1-8b-instruct-fp8-dynamic:1.5

phi-4

INT4, INT8, FP8

  • Baseline: registry.redhat.io/rhelai1/modelcar-phi-4:1.5
  • INT4: registry.redhat.io/rhelai1/modelcar-phi-4-quantized-w4a16:1.5
  • INT8: registry.redhat.io/rhelai1/modelcar-phi-4-quantized-w8a8:1.5
  • FP8: registry.redhat.io/rhelai1/modelcar-phi-4-fp8-dynamic:1.5

qwen2-5-7b-instruct

INT4, INT8, FP8

  • Baseline: registry.redhat.io/rhelai1/modelcar-qwen2-5-7b-instruct:1.5
  • INT4: registry.redhat.io/rhelai1/modelcar-qwen2-5-7b-instruct-quantized-w4a16:1.5
  • INT8: registry.redhat.io/rhelai1/modelcar-qwen2-5-7b-instruct-quantized-w8a8:1.5
  • FP8: registry.redhat.io/rhelai1/modelcar-qwen2-5-7b-instruct-fp8-dynamic:1.5

mistral-small-24b-instruct-2501

INT4, INT8, FP8

  • Baseline: registry.redhat.io/rhelai1/modelcar-mistral-small-24b-instruct-2501:1.5
  • INT4: registry.redhat.io/rhelai1/modelcar-mistral-small-24b-instruct-2501-quantized-w4a16:1.5
  • INT8: registry.redhat.io/rhelai1/modelcar-mistral-small-24b-instruct-2501-quantized-w8a8:1.5
  • FP8: registry.redhat.io/rhelai1/modelcar-mistral-small-24b-instruct-2501-fp8-dynamic:1.5

mixtral-8x7b-instruct-v0-1

None

  • Baseline: registry.redhat.io/rhelai1/modelcar-mixtral-8x7b-instruct-v0-1:1.4

granite-3-1-8b-base

INT4 (baseline currently unavailable)

  • INT4: registry.redhat.io/rhelai1/modelcar-granite-3-1-8b-base-quantized-w4a16:1.5

granite-3-1-8b-starter-v2

None

  • Baseline: registry.redhat.io/rhelai1/modelcar-granite-3-1-8b-starter-v2:1.5

llama-3-1-nemotron-70b-instruct-hf

FP8

  • Baseline: registry.redhat.io/rhelai1/modelcar-llama-3-1-nemotron-70b-instruct-hf:1.5
  • FP8: registry.redhat.io/rhelai1/modelcar-llama-3-1-nemotron-70b-instruct-hf-fp8-dynamic:1.5

gemma-2-9b-it

FP8

  • Baseline: registry.redhat.io/rhelai1/modelcar-gemma-2-9b-it:1.5
  • FP8: registry.redhat.io/rhelai1/modelcar-gemma-2-9b-it-fp8:1.5

deepseek-r1-0528

INT4 (baseline currently unavailable)

  • INT4: registry.redhat.io/rhelai1/modelcar-deepseek-r1-0528-quantized-w4a16:1.5

qwen3-8b

FP8 (baseline currently unavailable)

  • FP8: registry.redhat.io/rhelai1/modelcar-qwen3-8b-fp8-dynamic:1.5

kimi-k2-instruct

INT4 (baseline currently unavailable)

  • INT4: registry.redhat.io/rhelai1/modelcar-kimi-k2-instruct-quantized-w4a16:1.5

gemma-3n-e4b-it

FP8

  • Baseline: registry.redhat.io/rhelai1/modelcar-gemma-3n-e4b-it:1.5
  • FP8: registry.redhat.io/rhelai1/modelcar-gemma-3n-e4b-it-fp8-dynamic:1.5

gpt-oss-120b

None

  • Baseline: registry.redhat.io/rhelai1/modelcar-gpt-oss-120b:1.5

gpt-oss-20b

None

  • Baseline: registry.redhat.io/rhelai1/modelcar-gpt-oss-20b:1.5

qwen3-coder-480b-a35b-instruct

FP8 (baseline currently unavailable)

  • FP8: registry.redhat.io/rhelai1/modelcar-qwen3-coder-480b-a35b-instruct-fp8:1.5

whisper-large-v3-turbo

INT4 (baseline currently unavailable)

  • INT4: registry.redhat.io/rhelai1/modelcar-whisper-large-v3-turbo-quantized-w4a16:1.5

voxtral-mini-3b-2507

FP8 (baseline currently unavailable)

  • FP8: registry.redhat.io/rhelai1/modelcar-voxtral-mini-3b-2507-fp8-dynamic:1.5

nvidia-nemotron-nano-9b-v2

FP8 (baseline currently unavailable)

  • FP8: registry.redhat.io/rhelai1/modelcar-nvidia-nemotron-nano-9b-v2-fp8-dynamic:1.5

phi-4-reasoning

FP8 (baseline currently unavailable)

  • FP8: registry.redhat.io/rhai/modelcar-phi-4-reasoning-fp8-dynamic:3.0

qwen3-vl-235b-a22b-instruct-nvfp4

None

  • Baseline: registry.redhat.io/rhai/modelcar-qwen3-vl-235b-a22b-instruct-nvfp4:3.0

qwen3-next-80b-a3b-instruct

INT4 (baseline currently unavailable)

  • INT4: registry.redhat.io/rhai/modelcar-qwen3-next-80b-a3b-instruct-quantized-w4a16:3.0

granite-4-0-h-tiny

FP8

  • Baseline: registry.redhat.io/rhai/modelcar-granite-4-0-h-tiny:3.0
  • FP8: registry.redhat.io/rhai/modelcar-granite-4-0-h-tiny-fp8-dynamic:3.0

granite-4-0-h-small

FP8

  • Baseline: registry.redhat.io/rhai/modelcar-granite-4-0-h-small:3.0
  • FP8: registry.redhat.io/rhai/modelcar-granite-4-0-h-small-fp8-dynamic:3.0

mistral-large-3-675b-instruct-2512

None

  • Baseline: registry.redhat.io/rhai/modelcar-mistral-large-3-675b-instruct-2512:3.0

mistral-large-3-675b-instruct-2512-nvfp4

None

  • Baseline: registry.redhat.io/rhai/modelcar-mistral-large-3-675b-instruct-2512-nvfp4:3.0

apertus-8b-instruct-2509

FP8 (baseline currently unavailable)

  • FP8: registry.redhat.io/rhai/modelcar-apertus-8b-instruct-2509-fp8-dynamic:3.0

nvidia-nemotron-3-nano-30b-a3b

FP8 (baseline currently unavailable)

  • FP8: registry.redhat.io/rhai/modelcar-nvidia-nemotron-3-nano-30b-a3b-fp8:3.0

ministral-3-14b-instruct-2512

None

  • Baseline: registry.redhat.io/rhai/modelcar-ministral-3-14b-instruct-2512:3.0
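The image names in Table 5.1 appear to follow a regular pattern: a modelcar- prefix, the model name, an optional quantization suffix, and a version tag. The following Python sketch splits a reference into those parts. The helper name parse_modelcar_ref and the suffix-to-variant mapping are assumptions inferred from the names listed above, not a documented naming contract, so treat the result as a convenience for inventory scripts rather than as an authoritative classification.

```python
from typing import NamedTuple

# Suffix-to-variant mapping inferred from the image names in Table 5.1.
# This is an observed pattern, not a documented contract.
_VARIANT_SUFFIXES = (
    ("-quantized-w4a16", "INT4"),
    ("-quantized-w8a8", "INT8"),
    ("-fp8-dynamic", "FP8"),
    ("-fp8", "FP8"),
)


class ModelCarRef(NamedTuple):
    registry: str
    namespace: str
    model: str
    variant: str
    tag: str


def parse_modelcar_ref(ref: str) -> ModelCarRef:
    """Split a ModelCar image reference into its parts (hypothetical helper)."""
    path, _, tag = ref.rpartition(":")
    registry, namespace, name = path.split("/")
    if not name.startswith("modelcar-"):
        raise ValueError(f"not a ModelCar image name: {name!r}")
    model = name[len("modelcar-"):]
    for suffix, variant in _VARIANT_SUFFIXES:
        if model.endswith(suffix):
            return ModelCarRef(registry, namespace, model[: -len(suffix)], variant, tag)
    # Names without a recognized suffix are listed as Baseline in the table.
    return ModelCarRef(registry, namespace, model, "Baseline", tag)
```

For example, parse_modelcar_ref("registry.redhat.io/rhelai1/modelcar-phi-4-quantized-w4a16:1.5") yields the model phi-4, the variant INT4, and the tag 1.5.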

Chapter 6. Validated models for x86_64 CPU inference serving

The following large language models have been validated for use with Red Hat AI Inference on x86_64 CPUs.

Table 6.1. Validated models for inferencing with x86_64 CPU

Important

Quantization formats that require GPU-specific kernels, such as Marlin format, are not supported for CPU inference. Use AWQ or GPTQ quantization formats that are compatible with CPU execution.

The following table provides general guidance for approximate system RAM requirements based on model size:

Table 6.2. Memory requirements for inference serving with x86_64 CPU

Model size | Minimum RAM | Recommended RAM
125M - 500M | 8 GB | 16 GB
500M - 1B | 16 GB | 32 GB
1B - 3B | 32 GB | 64 GB

Note

Actual memory usage depends on the model architecture, context length, and batch size. Increase the VLLM_CPU_KVCACHE_SPACE environment variable to allocate more memory for the key-value cache when using longer context lengths.
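The guidance in Table 6.2 can be expressed as a small lookup, shown here as a minimal Python sketch. The function names ram_guidance_gb and serving_env are hypothetical. The table ranges share their boundary values (500M and 1B), so the sketch assumes that a model exactly on a boundary belongs to the smaller band. The sketch also assumes that VLLM_CPU_KVCACHE_SPACE takes a whole number of GiB; confirm the unit and a sensible value against the vLLM documentation for your release.

```python
import os

# Rows of Table 6.2: (upper bound on parameter count, minimum RAM in GB, recommended RAM in GB).
_RAM_GUIDANCE = (
    (500_000_000, 8, 16),
    (1_000_000_000, 16, 32),
    (3_000_000_000, 32, 64),
)
_SMALLEST_LISTED = 125_000_000


def ram_guidance_gb(num_params):
    """Return (minimum, recommended) RAM in GB, or None outside the table's range."""
    if num_params < _SMALLEST_LISTED:
        return None
    for upper, minimum, recommended in _RAM_GUIDANCE:
        # Assumption: a size exactly on a shared boundary falls in the smaller band.
        if num_params <= upper:
            return minimum, recommended
    return None


def serving_env(kv_cache_gib):
    """Copy the current environment and set the KV cache size for a CPU server process."""
    env = dict(os.environ)
    env["VLLM_CPU_KVCACHE_SPACE"] = str(kv_cache_gib)  # assumed to be in GiB
    return env
```

The dictionary returned by serving_env can be passed as the env argument of subprocess.run when launching the server, which keeps the setting scoped to that process.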

Chapter 7. Validated models for use with IBM Power and IBM Spyre AI accelerators

The following large language models are supported for IBM Power systems with IBM Spyre AI accelerators.

Note

IBM Spyre AI accelerator cards support FP16 format model weights only. For compatible models, Red Hat AI Inference automatically converts weights to FP16 at startup. No additional configuration is needed.

Table 7.2. Reranker models for use with IBM Spyre AI accelerators

Model | Hugging Face model card
bge-reranker-v2-m3 | BAAI/bge-reranker-v2-m3

Important

Pre-built IBM Granite models run with the specific Python packages that are included in the Red Hat AI Inference Spyre container image. The models are tied to fixed configurations for Spyre card count, batch size, and input/output context sizes.

Updating or replacing Python packages in the Red Hat AI Inference Spyre container image is not supported.

Chapter 8. Validated models for use with IBM Z and IBM Spyre AI accelerators

The following large language models are supported for IBM Z systems with IBM Spyre AI accelerators.

Note

IBM Spyre AI accelerator cards support FP16 format model weights only. For compatible models, Red Hat AI Inference automatically converts weights to FP16 at startup. No additional configuration is needed.

Important

Pre-built IBM Granite models run with the specific Python packages that are included in the Red Hat AI Inference Spyre container image. The models are tied to fixed configurations for Spyre card count, batch size, and input/output context sizes.

Updating or replacing Python packages in the Red Hat AI Inference Spyre container image is not supported.

Chapter 9. Validated models for geospatial inference with TerraTorch

The following IBM and NASA Prithvi geospatial foundation models are validated for use with AI Inference and TerraTorch.

Note

Prithvi-EO-2.0 models use the Vision Transformer (ViT) architecture and require TerraTorch as the model implementation backend. These models accept GeoTIFF imagery as input and return segmentation predictions.

Table 9.1. Prithvi geospatial models for use with TerraTorch

Model | Use case | Hugging Face model card | Validated on
Prithvi-EO-2.0-300M-TL-Sen1Floods11 | Flood detection and mapping | Prithvi-EO-2.0-300M-TL-Sen1Floods11 | RHAIIS 3.3
Prithvi-EO-2.0-300M-BurnScars | Burn scar detection | Prithvi-EO-2.0-300M-BurnScars | RHAIIS 3.3

Explore the IBM and NASA geospatial models collection on Hugging Face.

Important

Prithvi geospatial models are validated for use with NVIDIA CUDA AI accelerators only.

These models require specific vLLM server arguments to function correctly. You must include --skip-tokenizer-init, --enforce-eager, and --enable-mm-embeds when starting the inference server.

For more information, see Serving TerraTorch Models with vLLM.
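One way to avoid omitting a required argument is to assemble the server command in code. The following Python sketch builds an argument list that always carries the three flags named above. It assumes that the vllm serve entry point is available in your environment. The function names are hypothetical, and "<prithvi-model>" is a placeholder for the model identifier or path that you deploy. Any further arguments that your deployment needs, such as the TerraTorch backend selection, go in extra_args and are not shown here.

```python
import shlex

# Flags that this chapter names as required for Prithvi geospatial models.
REQUIRED_PRITHVI_FLAGS = (
    "--skip-tokenizer-init",
    "--enforce-eager",
    "--enable-mm-embeds",
)


def prithvi_serve_command(model, extra_args=()):
    """Build a `vllm serve` argument list that always includes the required flags."""
    command = ["vllm", "serve", model]
    for flag in REQUIRED_PRITHVI_FLAGS:
        # Skip flags the caller already supplied so none appears twice.
        if flag not in extra_args:
            command.append(flag)
    command.extend(extra_args)
    return command


def missing_flags(command):
    """Return the required flags that are absent from an existing argument list."""
    return [flag for flag in REQUIRED_PRITHVI_FLAGS if flag not in command]


# "<prithvi-model>" is a placeholder, not a real model identifier.
print(shlex.join(prithvi_serve_command("<prithvi-model>")))
```

The missing_flags helper can also be used to check a command that was assembled elsewhere, for example in a deployment script, before the server starts.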

Legal Notice

Copyright © Red Hat.
Except as otherwise noted below, the text of and illustrations in this documentation are licensed by Red Hat under the Creative Commons Attribution–Share Alike 3.0 Unported license. If you distribute this document or an adaptation of it, you must provide the URL for the original version.
Red Hat, as the licensor of this document, waives the right to enforce, and agrees not to assert, Section 4d of CC-BY-SA to the fullest extent permitted by applicable law.
Red Hat, the Red Hat logo, JBoss, Hibernate, and RHCE are trademarks or registered trademarks of Red Hat, LLC. or its subsidiaries in the United States and other countries.
Linux® is the registered trademark of Linus Torvalds in the United States and other countries.
XFS is a trademark or registered trademark of Hewlett Packard Enterprise Development LP or its subsidiaries in the United States and other countries.
The OpenStack® Word Mark and OpenStack logo are trademarks or registered trademarks of the Linux Foundation, used under license.
All other trademarks are the property of their respective owners.