Choose a validated model for reliable serving
Red Hat AI validated models
Abstract
Preface
Red Hat AI validated and enabled models have been tested and verified to work with Red Hat AI Inference. You can deploy these models for inference serving on supported hardware configurations.
Chapter 1. Red Hat AI validated models
Red Hat AI validated models have been tested and verified to work correctly across supported hardware and product configurations. These models are available as Hugging Face downloads, as OCI artifact images, and as modelcar container images. Platform-specific validated models are also available for IBM Spyre on IBM Power and IBM Z systems.
In addition to validated models, Red Hat ships enabled models as modelcar container images. Enabled models are architecturally supported but have not completed the full validation pipeline. For details about the difference between validated and enabled models, see Model support levels.
If you are using AI Inference with Podman as part of a RHEL AI deployment, use ModelCar container images or Hugging Face models.
If you are using AI Inference as part of a Red Hat OpenShift AI deployment on OpenShift Container Platform, use OCI artifact images.
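The following Python sketch restates the artifact choice above as a lookup. It is illustrative only: the `Deployment` enum and the format labels are hypothetical names, not part of any Red Hat API.

```python
from enum import Enum


class Deployment(Enum):
    """Hypothetical labels for the two deployment types described above."""
    RHEL_AI_PODMAN = "rhel-ai-podman"
    OPENSHIFT_AI = "openshift-ai"


def recommended_formats(deployment: Deployment) -> list:
    """Return the model artifact formats recommended for a deployment type."""
    if deployment is Deployment.RHEL_AI_PODMAN:
        # AI Inference with Podman on RHEL AI: ModelCar images or Hugging Face models.
        return ["modelcar", "huggingface"]
    if deployment is Deployment.OPENSHIFT_AI:
        # AI Inference on Red Hat OpenShift AI: OCI artifact images.
        return ["oci-artifact"]
    raise ValueError(f"unknown deployment: {deployment}")


print(recommended_formats(Deployment.OPENSHIFT_AI))  # ['oci-artifact']
```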
Red Hat uses GuideLLM for performance benchmarking and Language Model Evaluation Harness for accuracy evaluations.
For a complete list of models with platform compatibility data, see Model support matrix.
On AMD GPUs, only FP8 and GGUF quantized model variants are supported. For more information, see Supported hardware in the vLLM documentation.
Chapter 2. Validated model support levels
Red Hat AI ships models at two support levels: validated and enabled. Understanding these support levels helps you make informed decisions about which models to deploy for your inference workloads.
- Validated models
Red Hat has tested validated models with GuideLLM performance benchmarking and Language Model Evaluation Harness accuracy evaluations across specific OpenShift Container Platform, Red Hat OpenShift AI, and Red Hat AI Inference version combinations.
Validated models are benchmarked for specific use cases, which can include inference performance, quality, and other measures. All third-party models are governed by the third-party license of the original model provider.
Validated models include general-purpose large language models such as Llama, Granite, Mistral, Qwen, and Phi model families, and quantized variants in FP8, INT4, INT8, NVFP4, and BF16 formats.
- Enabled models
Red Hat ships enabled models as modelcar container images with architecturally compatible configurations. Enabled models have not completed the full benchmarking and accuracy evaluation pipeline that validated models receive.
Enabled models include specialty categories such as:
  - Embedding models, for example granite-embedding-english-r2, all-MiniLM-L6-v2, nomic-embed-text-v1.5, and Qwen3-Embedding-8B
  - Safety and guard models, for example Llama-Guard-4-12B and granite-guardian-3.2-5b
  - Security models, for example Foundation-Sec-8B-Instruct
  - Reasoning models, for example Phi-4-reasoning
  - Additional general-purpose models that have not yet completed the full validation pipeline
Both support levels indicate that Red Hat ships the model and provides support. The key difference is the depth of testing: validated models have quantified performance and accuracy data for specific platform configurations, while Red Hat verifies that enabled models work with the inference server architecture.
To find the support level for a specific model, see Model support matrix.
Chapter 3. Validated model support matrix
You can use the model support matrix to verify that a model is compatible with your Red Hat AI Inference, Red Hat OpenShift AI, and vLLM version combination before deploying it for inference serving. The matrix lists all validated and enabled models with their minimum platform version requirements and modelcar container image paths.
For more information, see Red Hat AI models on Hugging Face.
Verify that your deployed Red Hat AI Inference, Red Hat OpenShift AI, and vLLM versions meet or exceed the minimum versions listed for your target model. For an explanation of the Validated and Enabled status values, see Model support levels.
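Version strings in the matrix mix several forms, such as `v0.10.1.1`, `3`, `2.21`, and `3.4.0-ea.1`. The following Python sketch shows one way to compare a deployed version against a listed minimum. It assumes that an early-access suffix such as `-ea.1` sorts before the corresponding release, so `3.4.0-ea.2` is treated as older than `3.4.0`. The function names are hypothetical, and cells that list more than one version, such as `3.2.3, 3.2.4`, are not handled by this sketch.

```python
import re


def version_key(version: str) -> tuple:
    """Build a sortable key for strings such as 'v0.10.1.1', '3', '2.21', or '3.4.0-ea.1'."""
    version = version.strip().lstrip("v")
    release, _, pre = version.partition("-")
    nums = [int(part) for part in release.split(".")]
    # Pad so that '3' compares equal to '3.0.0'.
    nums += [0] * (4 - len(nums))
    # Split a pre-release tag such as 'ea.1' into ('ea', 1) so numbers compare numerically.
    pre_parts = tuple(
        int(part) if part.isdigit() else part
        for part in re.split(r"[.\-]", pre)
    ) if pre else ()
    # A release (no suffix) gets 1, so it sorts after any pre-release of the same number.
    return (tuple(nums), 0 if pre else 1, pre_parts)


def meets_minimum(deployed: str, minimum: str) -> bool:
    """Return True if the deployed version is at or above the listed minimum."""
    return version_key(deployed) >= version_key(minimum)


print(meets_minimum("3.4.0", "3.4.0-ea.2"))   # True: the release sorts after early access
print(meets_minimum("3.3.0", "3.4.0-ea.1"))   # False
print(meets_minimum("v0.11.2", "v0.10.1.1"))  # True
```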
Hugging Face links require internet access. If you are working in a disconnected environment, use the modelcar container image paths with your mirrored registry. For more information, see Deploying the standalone Red Hat AI Inference container in a disconnected environment.
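If your mirror keeps the same repository paths as `registry.redhat.io` (an assumption that depends on how you configured mirroring), you can derive the mirrored reference by swapping only the registry host. The following Python sketch does that; the function name is hypothetical and `mirror.example.com:5000` is a placeholder host.

```python
SOURCE_REGISTRY = "registry.redhat.io"


def mirror_image_ref(image: str, mirror_registry: str,
                     source_registry: str = SOURCE_REGISTRY) -> str:
    """Replace the registry host of an image reference, keeping repository path and tag.

    Assumes the mirror preserves repository paths; adjust if your mirroring tool remaps them.
    """
    prefix = source_registry.rstrip("/") + "/"
    if not image.startswith(prefix):
        raise ValueError(f"{image!r} is not hosted on {source_registry}")
    return mirror_registry.rstrip("/") + "/" + image[len(prefix):]


# mirror.example.com:5000 is a placeholder for your own mirror registry.
print(mirror_image_ref(
    "registry.redhat.io/rhai/modelcar-granite-4-0-h-tiny-fp8-dynamic:3.0",
    "mirror.example.com:5000",
))
# prints: mirror.example.com:5000/rhai/modelcar-granite-4-0-h-tiny-fp8-dynamic:3.0
```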
Table 3.1. Red Hat AI Model support matrix
| Model | Modelcar | Status | Min. vLLM version | Min. RHAII version | Min. RHOAI version | Min. VRAM | Supported GPUs | Migration guidance |
|---|---|---|---|---|---|---|---|---|
| RedHatAI/granite-3.1-8b-instruct | registry.redhat.io/rhelai1/modelcar-granite-3-1-8b-instruct:1.5 | Validated | v0.8.4 | 3 | 2.21 | 19 GB | 1XA100-40, 1XA100-80, 1XH100, 1XH200, 1XL4, 2XA100-40, 2XA100-80, 2XH100, 2XL4, 4XA100-40, 4XA100-80, 4XH100, 4XL4, 8XA100-40, 8XA100-80 | n/a |
| RedHatAI/granite-4.0-h-tiny-FP8-dynamic | registry.redhat.io/rhai/modelcar-granite-4-0-h-tiny-fp8-dynamic:3.0 | Validated | v0.13.0 | 3.3.0 | 3.3.0 | 9 GB | 1XB200, 1XH100, 1XH200, 1XL4 | n/a |
| RedHatAI/Llama-3.1-8B-Instruct | registry.redhat.io/rhelai1/modelcar-llama-3-1-8b-instruct:1.5 | Validated | v0.8.4 | 3 | 2.21 | 19 GB | 1XA100-40, 1XA100-80, 1XH100, 1XH200, 1XL4, 2XA100-40, 2XA100-80, 2XH100, 2XL4, 4XA100-40, 4XA100-80, 4XH100, 4XL4, 8XA100-40, 8XA100-80 | n/a |
| RedHatAI/Llama-3.3-70B-Instruct | registry.redhat.io/rhelai1/modelcar-llama-3-3-70b-instruct:1.5 | Validated | v0.8.4 | 3 | 2.21 | 163 GB | 2XH200, 4XA100-80, 4XH100, 8XA100-40, 8XH100 | n/a |
| RedHatAI/Llama-4-Maverick-17B-128E-Instruct | registry.redhat.io/rhelai1/modelcar-llama-4-maverick-17b-128e-instruct:1.5 | Validated | v0.8.4 | 3 | 2.21 | 924 GB | 8XH200 | n/a |
| RedHatAI/Llama-4-Maverick-17B-128E-Instruct-FP8 | registry.redhat.io/rhelai1/modelcar-llama-4-maverick-17b-128e-instruct-fp8:1.5 | Validated | v0.8.4 | 3 | 2.21 | 480 GB | 4XH200, 8XH100 | n/a |
| RedHatAI/Llama-4-Scout-17B-16E-Instruct | registry.redhat.io/rhelai1/modelcar-llama-4-scout-17b-16e-instruct:1.5 | Validated | v0.8.4 | 3 | 2.21 | 250 GB | 4XA100-80, 4XH100, 4XH200, 8XH100 | n/a |
| phi-4 | registry.redhat.io/rhelai1/modelcar-phi-4:1.5 | Validated | v0.8.4 | 3 | 2.21 | 34 GB | 1XA100-40, 1XA100-80, 1XH100, 1XH200, 2XA100-40, 2XA100-80, 2XH100, 2XL4 | n/a |
| RedHatAI/Devstral-Small-2-24B-Instruct-2512 | registry.redhat.io/rhai/modelcar-devstral-small-2-24b-instruct-2512:3.0 | Validated | v0.14.1 | 3.4.0-ea.1 | 3.4.0-ea.1 | 30 GB | 1XA100-80, 1XB200, 1XH100, 1XH200, 2XA100-80, 2XB200, 2XH100, 2XH200, 4XA100-80, 4XB200, 4XH100, 4XH200, 4XL4, 8XA100-80, 8XB200, 8XH100, 8XH200, 8XL4 | n/a |
| RedHatAI/Ministral-3-3B-Instruct-2512 | registry.redhat.io/rhai/modelcar-ministral-3-3b-instruct-2512:3.0 | Validated | v0.14.1 | 3.4.0-ea.1 | 3.4.0-ea.1 | 6 GB | 1XA100-80, 1XB200, 1XH100, 1XH200, 1XL4, 2XA100-80, 2XB200, 2XH100, 2XH200, 2XL4, 4XA100-80, 4XB200, 4XH100, 4XH200, 4XL4, 8XA100-80, 8XB200, 8XH100, 8XH200, 8XL4 | n/a |
| RedHatAI/Mistral-Large-3-675B-Instruct-2512 | registry.redhat.io/rhai/modelcar-mistral-large-3-675b-instruct-2512:3.0 | Validated | v0.11.2 | 3.2.5 | 3.2 | 784 GB | 8XB200, 8XH200 | n/a |
| RedHatAI/Mistral-Large-3-675B-Instruct-2512-NVFP4 | registry.redhat.io/rhai/modelcar-mistral-large-3-675b-instruct-2512-nvfp4:3.0 | Validated | v0.11.2 | 3.2.5 | 3.2 | 464 GB | 4XB200, 4XH200, 8XA100-80, 8XB200, 8XH100, 8XH200 | n/a |
| RedHatAI/Mistral-Small-24B-Instruct-2501 | registry.redhat.io/rhelai1/modelcar-mistral-small-24b-instruct-2501:1.5 | Validated | v0.8.4 | 3 | 2.21 | 55 GB | 1XA100-80, 1XH100, 2XA100-40, 2XA100-80, 2XH100, 4XA100-40, 4XA100-80, 4XH100, 4XL4, 8XA100-40, 8XH100 | n/a |
| RedHatAI/Mistral-Small-3.1-24B-Instruct-2503 | registry.redhat.io/rhelai1/modelcar-mistral-small-3-1-24b-instruct-2503:1.5 | Validated | v0.8.4 | 3 | 2.21 | 56 GB | 1XA100-80, 1XH100, 1XH200, 2XA100-40, 2XA100-80, 2XH100, 4XA100-40, 4XA100-80, 4XH100, 4XL4, 8XA100-40, 8XH100 | n/a |
| RedHatAI/Mixtral-8x7B-Instruct-v0.1 | registry.redhat.io/rhelai1/modelcar-mixtral-8x7b-instruct-v0-1:1.4 | Validated | v0.8.4 | 3 | 2.21 | 108 GB | 1XH200, 2XA100-80, 2XH100, 4XA100-40, 4XA100-80, 4XH100, 8XA100-40, 8XH100, 8XL4 | n/a |
| RedHatAI/NVIDIA-Nemotron-Nano-9B-v2-quantized.w4a16 | registry.redhat.io/rhelai1/modelcar-nvidia-nemotron-nano-9b-v2-quantized-w4a16:1.5 | Validated | v0.11.0 | 3.2.3, 3.2.4 | 3 | 8 GB | 1XH100 | n/a |
| RedHatAI/Llama-3.1-Nemotron-70B-Instruct-HF | registry.redhat.io/rhelai1/modelcar-llama-3-1-nemotron-70b-instruct-hf:1.5 | Validated | v0.8.4 | 3 | 2.21 | 163 GB | 2XH200, 4XA100-80, 4XH100, 8XA100-40 | n/a |
| RedHatAI/NVIDIA-Nemotron-3-Nano-30B-A3B-FP8 | registry.redhat.io/rhai/modelcar-nvidia-nemotron-3-nano-30b-a3b-fp8:3.0 | Validated | v0.11.2 | 3.2.5 | 3.2 | 38 GB | 1XB200, 1XH100, 1XH200, 2XB200, 2XH100, 2XH200, 4XB200, 4XH100, 4XH200, 8XB200, 8XH100, 8XH200 | n/a |
| RedHatAI/NVIDIA-Nemotron-3-Super-120B-A12B-BF16 | registry.redhat.io/rhai/modelcar-nvidia-nemotron-3-super-120b-a12b-bf16:3.0 | Validated | v0.17.1 | 3.4.0-ea.2 | 3.4.0-ea.2 | 285 GB | 2XH200, 4XA100-80, 4XH100, 4XH200, 8XA100-80, 8XH100, 8XH200 | n/a |
| RedHatAI/NVIDIA-Nemotron-3-Super-120B-A12B-FP8 | registry.redhat.io/rhai/modelcar-nvidia-nemotron-3-super-120b-a12b-fp8:3.0 | Validated | v0.17.1 | 3.4.0-ea.2 | 3.4.0-ea.2 | 148 GB | 1XH200, 2XH100, 2XH200, 4XH100, 4XH200, 8XH100, 8XH200 | n/a |
| RedHatAI/NVIDIA-Nemotron-3-Super-120B-A12B-NVFP4 | registry.redhat.io/rhai/modelcar-nvidia-nemotron-3-super-120b-a12b-nvfp4:3.0 | Validated | v0.17.1 | 3.4.0-ea.2 | 3.4.0-ea.2 | 93 GB | 1XH200, 2XH100, 2XH200 | n/a |
| RedHatAI/gpt-oss-120b | registry.redhat.io/rhelai1/modelcar-gpt-oss-120b:1.5 | Validated | v0.10.1.1 | 3.2.2 | 2.25 | 76 GB | 1XB200, 1XH100, 1XH200, 2XB200, 2XH100, 2XH200, 4XA100-40, 4XB200, 4XH100, 4XH200, 4XL4, 8XA100-40, 8XB200, 8XH100, 8XH200, 8XL4 | n/a |
| RedHatAI/gpt-oss-20b | registry.redhat.io/rhelai1/modelcar-gpt-oss-20b:1.5 | Validated | v0.10.1.1 | 3.2.2 | 2.25 | 16 GB | 1XA100-40, 1XB200, 1XH100, 1XH200, 1XL4, 2XA100-40, 2XB200, 2XH100, 2XH200, 2XL4, 4XA100-40, 4XB200, 4XH100, 4XH200, 4XL4, 8XA100-40, 8XB200, 8XH100, 8XH200, 8XL4 | n/a |
| RedHatAI/Qwen2.5-7B-Instruct | registry.redhat.io/rhelai1/modelcar-qwen2-5-7b-instruct:1.5 | Validated | v0.8.4 | 3 | 2.21 | 18 GB | 1XA100-40, 1XA100-80, 1XH100, 1XH200, 1XL4, 2XA100-40, 2XA100-80, 2XH100, 2XL4, 4XA100-40, 4XA100-80, 4XH100 | n/a |
| Qwen/Qwen3-8B-FP8 | registry.redhat.io/rhelai1/modelcar-qwen3-8b-fp8:1.5 | Validated | v0.10.0 | 3.2.1 | 2.24 | 11 GB | 1XA100-40, 1XH100, 1XL4, 2XH100 | n/a |
| RedHatAI/Apertus-8B-Instruct-2509-FP8-dynamic | registry.redhat.io/rhai/modelcar-apertus-8b-instruct-2509-fp8-dynamic:3.0 | Validated | v0.11.2 | 3.2.5 | 3.2 | 11 GB | 1XA100-80, 1XB200, 1XH100, 1XH200, 2XA100-80, 2XB200, 2XH100, 2XH200, 4XA100-80, 4XB200, 4XH100, 4XH200, 8XA100-80, 8XB200, 8XH100, 8XH200 | n/a |
| RedHatAI/DeepSeek-R1-0528-quantized.w4a16 | registry.redhat.io/rhelai1/modelcar-deepseek-r1-0528-quantized-w4a16:1.5 | Validated | v0.10.0 | 3.2.1 | 2.24 | 428 GB | 4XB200, 4XH200, 8XB200, 8XH100, 8XH200 | n/a |
| RedHatAI/gemma-3n-E4B-it-FP8-dynamic | registry.redhat.io/rhelai1/modelcar-gemma-3n-e4b-it-fp8-dynamic:1.5 | Validated | v0.10.0 | 3.2.1 | 2.24 | 14 GB | 1XA100-40, 1XH100, 1XH200, 1XL4, 2XA100-40, 2XH100, 2XH200, 2XL4, 4XA100-40, 4XH100, 4XH200, 4XL4, 8XA100-40, 8XH100, 8XH200, 8XL4 | n/a |
| RedHatAI/granite-3.1-8b-instruct-fp8-dynamic | registry.redhat.io/rhelai1/modelcar-granite-3-1-8b-instruct-fp8-dynamic:1.5 | Validated | v0.8.4 | 3 | 2.21 | 11 GB | 1XH200 | n/a |
| RedHatAI/granite-3.1-8b-instruct-quantized.w4a16 | registry.redhat.io/rhelai1/modelcar-granite-3-1-8b-instruct-quantized-w4a16:1.5 | Validated | v0.8.4 | 3 | 2.21 | 6 GB | 1XH200 | n/a |
| RedHatAI/granite-4.0-h-small-FP8-dynamic | registry.redhat.io/rhai/modelcar-granite-4-0-h-small-fp8-dynamic:3.0 | Validated | v0.13.0 | 3.3.0 | 3.3.0 | 38 GB | 1XA100-80, 1XB200, 1XH100, 1XH200 | n/a |
| RedHatAI/Kimi-K2-Instruct-quantized.w4a16 | registry.redhat.io/rhelai1/modelcar-kimi-k2-instruct-quantized-w4a16:1.5 | Validated | v0.10.0 | 3.2.1 | 2.24 | 629 GB | 4XB200, 8XB200, 8XH200 | n/a |
| RedHatAI/Llama-3.1-Nemotron-70B-Instruct-HF-FP8-dynamic | registry.redhat.io/rhelai1/modelcar-llama-3-1-nemotron-70b-instruct-hf-fp8-dynamic:1.5 | Validated | v0.8.4 | 3 | 2.21 | 84 GB | 1XH200, 2XA100-80, 2XH100, 4XA100-40, 4XA100-80, 4XH100, 8XA100-40, 8XH100 | n/a |
| RedHatAI/Llama-3.3-70B-Instruct-FP8-dynamic | registry.redhat.io/rhelai1/modelcar-llama-3-3-70b-instruct-fp8-dynamic:1.5 | Validated | v0.8.4 | 3 | 2.21 | 84 GB | 1XH200, 2XH100, 4XA100-40, 4XA100-80, 4XH100, 8XA100-40, 8XA100-80, 8XH100 | n/a |
| RedHatAI/Llama-3.3-70B-Instruct-quantized.w4a16 | registry.redhat.io/rhelai1/modelcar-llama-3-3-70b-instruct-quantized-w4a16:1.5 | Validated | v0.8.4 | 3 | 2.21 | 46 GB | 1XH100, 2XH100, 4XA100-40, 4XH100, 8XA100-40, 8XH100 | n/a |
| RedHatAI/Llama-3.3-70B-Instruct-quantized.w8a8 | registry.redhat.io/rhelai1/modelcar-llama-3-3-70b-instruct-quantized-w8a8:1.5 | Validated | v0.8.4 | 3 | 2.21 | 84 GB | 2XH100, 4XA100-40, 4XA100-80, 4XH100, 8XA100-40, 8XH100 | n/a |
| RedHatAI/Llama-4-Scout-17B-16E-Instruct-FP8-dynamic | registry.redhat.io/rhelai1/modelcar-llama-4-scout-17b-16e-instruct-fp8-dynamic:1.5 | Validated | v0.8.4 | 3 | 2.21 | 132 GB | 2XH100, 2XH200, 4XH100, 8XH100, 8XL4 | n/a |
| RedHatAI/Llama-4-Scout-17B-16E-Instruct-quantized.w4a16 | registry.redhat.io/rhelai1/modelcar-llama-4-scout-17b-16e-instruct-quantized-w4a16:1.5 | Validated | v0.8.4 | 3 | 2.21 | 75 GB | 2XH100, 2XH200, 4XA100-40, 4XH100, 8XA100-40, 8XH100 | n/a |
| RedHatAI/Meta-Llama-3.1-8B-Instruct-FP8-dynamic | registry.redhat.io/rhelai1/modelcar-llama-3-1-8b-instruct-fp8-dynamic:1.5 | Validated | v0.8.4 | 3 | 2.21 | 11 GB | 1XH200 | n/a |
| RedHatAI/MiniMax-M2.5 | registry.redhat.io/rhai/modelcar-minimax-m2-5:3.0 | Validated | v0.14.1 | 3.4.0-ea.1 | 3.4.0-ea.1 | 265 GB | 2XB200, 4XA100-80, 4XB200, 4XH100, 4XH200 | n/a |
| RedHatAI/Ministral-3-14B-Instruct-2512 | registry.redhat.io/rhai/modelcar-ministral-3-14b-instruct-2512:3.0 | Validated | v0.13.0 | 3.3.0 | 3.3.0 | 19 GB | 1XA100-80, 1XB200, 1XH100, 1XH200, 1XL4, 2XA100-80, 2XB200, 2XH100, 2XH200, 2XL4, 4XA100-80, 4XB200, 4XH100, 4XH200, 4XL4, 8XA100-80, 8XB200, 8XH100, 8XH200, 8XL4 | n/a |
| RedHatAI/Mistral-Small-3.1-24B-Instruct-2503-FP8-dynamic | registry.redhat.io/rhelai1/modelcar-mistral-small-3-1-24b-instruct-2503-fp8-dynamic:1.5 | Validated | v0.8.4 | 3 | 2.21 | 30 GB | 1XA100-80, 1XH100, 1XH200, 2XA100-40, 2XA100-80, 2XH100, 2XL4, 4XA100-40, 4XA100-80, 4XH100, 4XL4, 8XA100-40, 8XH100 | n/a |
| Mistral-Small-3.1-24B-Instruct-2503-quantized.w4a16 | registry.redhat.io/rhelai1/modelcar-mistral-small-3-1-24b-instruct-2503-quantized-w4a16:1.5 | Validated | v0.8.4 | 3 | 2.21 | 18 GB | 1XA100-40, 1XA100-80, 1XH100, 1XH200, 2XA100-40, 2XA100-80, 2XH100, 4XA100-40, 4XA100-80, 4XH100, 8XA100-40, 8XH100 | n/a |
| Mistral-Small-3.1-24B-Instruct-2503-quantized.w8a8 | registry.redhat.io/rhelai1/modelcar-mistral-small-3-1-24b-instruct-2503-quantized-w8a8:1.5 | Validated | v0.8.4 | 3 | 2.21 | 30 GB | 1XA100-40, 1XA100-80, 1XH100, 1XH200, 2XA100-40, 2XA100-80, 2XH100, 2XL4, 4XA100-40, 4XA100-80, 4XH100, 4XL4, 8XA100-40, 8XA100-80, 8XH100 | n/a |
| RedHatAI/NVIDIA-Nemotron-Nano-9B-v2-FP8-dynamic | registry.redhat.io/rhelai1/modelcar-nvidia-nemotron-nano-9b-v2-fp8-dynamic:1.5 | Validated | v0.10.1.1 | 3.2.2 | 2.25 | 12 GB | 1XA100-40, 1XB200, 1XH100, 1XH200 | n/a |
| RedHatAI/phi-4-FP8-dynamic | registry.redhat.io/rhelai1/modelcar-phi-4-fp8-dynamic:1.5 | Validated | v0.8.4 | 3 | 2.21 | 19 GB | 1XA100-40, 1XA100-80, 1XH100, 1XH200, 2XA100-40, 2XA100-80, 2XH100, 2XL4 | n/a |
| RedHatAI/Phi-4-mini-instruct-FP8-dynamic | registry.redhat.io/rhelai1/modelcar-phi-4-mini-instruct-fp8-dynamic:1.5 | Validated | v0.14.1 | 3.4.0-ea.1 | 3.4.0-ea.1 | 7 GB | 1XA100-80, 1XB200, 1XH100, 1XH200, 1XL4, 2XA100-80, 2XB200, 2XH100, 2XH200, 2XL4, 4XA100-80, 4XB200, 4XH100, 4XH200, 4XL4, 8XA100-80, 8XH100, 8XH200, 8XL4 | n/a |
| RedHatAI/phi-4-quantized.w4a16 | registry.redhat.io/rhelai1/modelcar-phi-4-quantized-w4a16:1.5 | Validated | v0.8.4 | 3 | 2.21 | 11 GB | 1XA100-40, 1XA100-80, 1XH100, 2XA100-40, 2XA100-80, 2XH100, 2XL4 | n/a |
| RedHatAI/phi-4-quantized.w8a8 | registry.redhat.io/rhelai1/modelcar-phi-4-quantized-w8a8:1.5 | Validated | v0.8.4 | 3 | 2.21 | 19 GB | 1XA100-40, 1XA100-80, 1XH100, 2XA100-40, 2XA100-80, 2XH100 | n/a |
| RedHatAI/Phi-4-reasoning-FP8-dynamic | registry.redhat.io/rhai/modelcar-phi-4-reasoning-fp8-dynamic:3.0 | Validated | v0.13.0 | 3.3.0 | 3.3.0 | 19 GB | 1XA100-80, 1XB200, 1XH100, 1XH200, 1XL4, 2XA100-40, 2XA100-80, 2XB200, 2XH100, 2XH200, 2XL4 | n/a |
| RedHatAI/Qwen2.5-7B-Instruct-FP8-dynamic | registry.redhat.io/rhelai1/modelcar-qwen2-5-7b-instruct-fp8-dynamic:1.5 | Validated | v0.8.4 | 3 | 2.21 | 11 GB | 1XA100-40, 1XA100-80, 1XH100, 1XH200, 1XL4, 2XA100-40, 2XA100-80, 2XH100, 2XL4, 4XA100-40, 4XA100-80, 4XH100 | n/a |
| RedHatAI/Qwen2.5-7B-Instruct-quantized.w4a16 | registry.redhat.io/rhelai1/modelcar-qwen2-5-7b-instruct-quantized-w4a16:1.5 | Validated | v0.8.4 | 3 | 2.21 | 7 GB | 1XA100-40, 1XA100-80, 1XH100, 1XL4, 2XA100-40, 2XH100, 4XA100-40, 4XH100, 4XL4 | n/a |
| RedHatAI/Qwen2.5-7B-Instruct-quantized.w8a8 | registry.redhat.io/rhelai1/modelcar-qwen2-5-7b-instruct-quantized-w8a8:1.5 | Validated | v0.8.4 | 3 | 2.21 | 11 GB | 1XA100-40, 1XA100-80, 1XH100, 1XL4, 2XA100-40, 2XA100-80, 2XH100, 2XL4, 4XA100-40, 4XH100, 4XL4 | n/a |
| RedHatAI/Qwen3.5-122B-A10B-FP8-dynamic | registry.redhat.io/rhai/modelcar-qwen3-5-122b-a10b-fp8-dynamic:3.0 | Validated | v0.17.1 | 3.4.0-ea.2 | 3.4.0-ea.2 | 148 GB | 2XA100-80, 2XH100, 4XA100-80, 4XH200, 8XA100-80, 8XH100 | n/a |
| RedHatAI/Qwen3.5-35B-A3B-FP8-dynamic | registry.redhat.io/rhai/modelcar-qwen3-5-35b-a3b-fp8-dynamic:3.0 | Validated | v0.17.1 | 3.4.0-ea.2 | 3.4.0-ea.2 | 44 GB | 1XA100-80, 1XH100, 1XH200, 2XA100-80, 2XH100, 2XH200, 4XA100-80, 4XH100, 4XH200, 8XA100-80, 8XH100, 8XH200 | n/a |
| RedHatAI/Qwen3.5-397B-A17B-FP8-dynamic | registry.redhat.io/rhai/modelcar-qwen3-5-397b-a17b-fp8-dynamic:3.0 | Validated | v0.17.1 | 3.4.0-ea.2 | 3.4.0-ea.2 | 466 GB | 4XH200, 8XA100-80, 8XH100 | n/a |
| RedHatAI/Qwen3-8B-FP8-dynamic | registry.redhat.io/rhelai1/modelcar-qwen3-8b-fp8-dynamic:1.5 | Validated | v0.10.0 | 3.2.1 | 2.24 | 11 GB | 1XA100-40, 1XH100, 1XH200, 1XL4, 2XA100-40, 2XH100, 2XH200, 2XL4, 4XA100-40, 4XH100, 4XH200, 4XL4, 8XA100-40, 8XH100, 8XH200, 8XL4 | n/a |
| RedHatAI/Qwen3-Coder-480B-A35B-Instruct-FP8 | registry.redhat.io/rhelai1/modelcar-qwen3-coder-480b-a35b-instruct-fp8:1.5 | Validated | v0.10.1.1 | 3.2.2 | 2.25 | 555 GB | 4XB200, 4XH200 | n/a |
| RedHatAI/Qwen3-Coder-Next-NVFP4 | registry.redhat.io/rhai/modelcar-qwen3-coder-next-nvfp4:3.0 | Validated | v0.14.1 | 3.4.0-ea.1 | 3.4.0-ea.1 | 55 GB | 1XB200, 1XH100, 1XH200, 2XB200, 2XH100, 2XH200, 4XB200, 4XH100, 4XH200, 8XB200, 8XH100, 8XH200 | n/a |
| RedHatAI/Qwen3-Next-80B-A3B-Instruct-FP8 | registry.redhat.io/rhai/modelcar-qwen3-next-80b-a3b-instruct-fp8:3.0 | Validated | v0.13.0 | 3.3.0 | 3.3.0 | 95 GB | 1XB200, 2XA100-80, 2XB200, 2XH200, 4XA100-80, 4XH200 | n/a |
| RedHatAI/Qwen3-Next-80B-A3B-Instruct-quantized.w4a16 | registry.redhat.io/rhai/modelcar-qwen3-next-80b-a3b-instruct-quantized-w4a16:3.0 | Validated | v0.13.0 | 3.3.0 | 3.3.0 | 51 GB | 1XA100-80, 1XB200, 1XH100, 1XH200, 2XA100-80, 2XB200, 2XH100, 2XH200, 4XA100-80, 4XB200, 4XH100, 4XH200 | n/a |
| RedHatAI/Qwen3-VL-235B-A22B-Instruct-NVFP4 | registry.redhat.io/rhai/modelcar-qwen3-vl-235b-a22b-instruct-nvfp4:3.0 | Validated | v0.13.0 | 3.3.0 | 3.3.0 | 156 GB | 1XB200, 2XA100-80, 2XB200, 2XH100, 2XH200, 4XA100-80, 4XB200, 4XH100, 4XH200, 8XA100-80, 8XB200, 8XH100, 8XH200 | n/a |
| RedHatAI/sarvam-105b-FP8-Dynamic | registry.redhat.io/rhai/modelcar-sarvam-105b-fp8-dynamic:3.0 | Validated | v0.18.0 | 3.4.0 | 3.4.0 | 130 GB | 1XH200, 2XH200, 4XH100, 4XH200, 8XH100, 8XH200 | n/a |
| RedHatAI/sarvam-30b-FP8-Dynamic | registry.redhat.io/rhai/modelcar-sarvam-30b-fp8-dynamic:3.0 | Validated | v0.18.0 | 3.4.0 | 3.4.0 | 45 GB | 1XA100-80, 1XH100, 1XH200, 2XA100-80, 2XH100, 2XH200, 4XA100-80, 4XH100, 4XH200, 8XA100-80, 8XH100, 8XH200 | n/a |
| RedHatAI/SmolLM3-3B-FP8-dynamic | registry.redhat.io/rhelai1/modelcar-smollm3-3b-fp8-dynamic:1.5 | Validated | v0.10.1.1 | 3.2.2 | 2.25 | 5 GB | 1XA100-40, 1XB200, 1XH100, 1XH200, 1XL4, 2XA100-40, 2XB200, 2XH100, 2XH200, 2XL4, 4XA100-40, 4XB200, 4XH100, 4XH200, 4XL4 | n/a |
| RedHatAI/gpt-oss-120b-essential | registry.redhat.io/rhai/modelcar-gpt-oss-120b-essential:3.0 | Validated | v0.10.1.1 | 3.2.2 | 2.25 | 76 GB | 1XB200, 1XH100, 1XH200, 2XB200, 2XH100, 2XH200, 4XA100-40, 4XB200, 4XH100, 4XH200, 4XL4, 8XA100-40, 8XB200, 8XH100, 8XH200, 8XL4 | n/a |
| RedHatAI/gpt-oss-20b-essential | registry.redhat.io/rhai/modelcar-gpt-oss-20b-essential:3.0 | Validated | v0.10.1.1 | 3.2.2 | 2.25 | 16 GB | 1XA100-40, 1XB200, 1XH100, 1XH200, 1XL4, 2XA100-40, 2XB200, 2XH100, 2XH200, 2XL4, 4XA100-40, 4XB200, 4XH100, 4XH200, 4XL4, 8XA100-40, 8XB200, 8XH100, 8XH200, 8XL4 | n/a |
| google/gemma-3-1b-it | registry.redhat.io/rhai/modelcar-gemma-3-1b-it:3.0 | Enabled | v0.11.2 | 3.2.5 | 3.2 | 3 GB | 1XH200 | n/a |
| google/gemma-3-27b-it | registry.redhat.io/rhai/modelcar-gemma-3-27b-it:3.0 | Enabled | v0.11.2 | 3.2.5 | 3.2 | 64 GB | 1XH200 | n/a |
| ibm-granite/granite-guardian-3.2-5b | registry.redhat.io/rhai/modelcar-granite-guardian-3-2-5b:3.0 | Enabled | v0.11.2 | 3.2.5 | 3.2 | 14 GB | 1XH200 | n/a |
| meta-llama/Llama-2-7b-chat-hf | registry.redhat.io/rhai/modelcar-llama-2-7b-chat-hf:3.0 | Enabled | v0.11.2 | 3.2.5 | 3.2 | 16 GB | 1XH200 | n/a |
| RedHatAI/Foundation-Sec-8B-Instruct | registry.redhat.io/rhai/modelcar-foundation-sec-8b-instruct:3.0 | Enabled | v0.13.0 | 3.3.0 | 3.3.0 | 19 GB | 1XH200 | n/a |
| RedHatAI/gemma-3-12b-it | registry.redhat.io/rhai/modelcar-gemma-3-12b-it:3.0 | Enabled | v0.11.2 | 3.2.5 | 3.2 | 29 GB | 1XH200 | n/a |
| RedHatAI/gemma-4-26B-A4B-it-FP8-Dynamic | registry.redhat.io/rhai/modelcar-gemma-4-26b-a4b-it-fp8-dynamic:3.0 | Enabled | v0.18.0 | 3.4.0 | 3.4.0 | 33 GB | 1XH200 | n/a |
| RedHatAI/gemma-4-31B-it-FP8-Dynamic | registry.redhat.io/rhai/modelcar-gemma-4-31b-it-fp8-dynamic:3.0 | Enabled | v0.18.0 | 3.4.0 | 3.4.0 | 39 GB | 1XH200 | n/a |
| RedHatAI/Llama-Guard-4-12B | registry.redhat.io/rhai/modelcar-llama-guard-4-12b:3.0 | Enabled | v0.11.2 | 3.2.5 | 3.2 | 28 GB | 1XH200 | n/a |
| RedHatAI/Phi-4-reasoning | registry.redhat.io/rhai/modelcar-phi-4-reasoning:3.0 | Enabled | v0.11.2 | 3.2.5 | 3.2 | 34 GB | 1XH200 | n/a |
| RedHatAI/Qwen3-Embedding-8B | registry.redhat.io/rhelai1/modelcar-qwen3-embedding-8b:1.5 | Enabled | v0.11.2 | 3.2.5 | 3.2 | 17 GB | 1XL4 | n/a |
| RedHatAI/granite-embedding-english-r2 | registry.redhat.io/rhai/modelcar-granite-embedding-english-r2:3.0 | Enabled | v0.11.2 | 3.2.5 | 3.2 | 1 GB | 1XL4 | n/a |
| RedHatAI/all-MiniLM-L6-v2 | registry.redhat.io/rhai/modelcar-all-minilm-l6-v2:3.0 | Enabled | v0.11.2 | 3.2.5 | 3.2 | 1 GB | 1XL4 | n/a |
| RedHatAI/nomic-embed-text-v1.5 | registry.redhat.io/rhelai1/modelcar-nomic-embed-text-v1-5:1.5 | Enabled | v0.11.2 | 3.2.5 | 3.2 | 1 GB | 1XL4 | n/a |
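Because the Supported GPUs column uses a compact `<count>X<gpu>` notation, you can filter the matrix programmatically. The following Python sketch parses that column for three rows copied from Table 3.1 and returns the models listed for an exact GPU configuration. The `MatrixRow` type and the `parse_gpu_configs` and `models_for` helpers are hypothetical; the sketch matches only configurations that are literally listed and does not infer support for unlisted GPU counts.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class MatrixRow:
    """A subset of the columns in Table 3.1."""
    model: str
    modelcar: str
    status: str          # "Validated" or "Enabled"
    supported_gpus: str  # as written in the table, e.g. "1XH100, 2XL4"


def parse_gpu_configs(cell: str) -> set[tuple[int, str]]:
    """Parse '1XA100-80, 2XH100' into {(1, 'A100-80'), (2, 'H100')}."""
    configs = set()
    for entry in cell.split(","):
        # The first 'X' separates the GPU count from the GPU type.
        count, _, gpu = entry.strip().partition("X")
        configs.add((int(count), gpu))
    return configs


def models_for(rows: list[MatrixRow], gpu: str, count: int,
               status: Optional[str] = None) -> list[str]:
    """Return models whose Supported GPUs cell lists exactly `count` x `gpu`."""
    return [
        row.model
        for row in rows
        if (count, gpu) in parse_gpu_configs(row.supported_gpus)
        and (status is None or row.status == status)
    ]


# Three rows copied from Table 3.1.
ROWS = [
    MatrixRow("RedHatAI/granite-4.0-h-tiny-FP8-dynamic",
              "registry.redhat.io/rhai/modelcar-granite-4-0-h-tiny-fp8-dynamic:3.0",
              "Validated", "1XB200, 1XH100, 1XH200, 1XL4"),
    MatrixRow("RedHatAI/Llama-3.3-70B-Instruct",
              "registry.redhat.io/rhelai1/modelcar-llama-3-3-70b-instruct:1.5",
              "Validated", "2XH200, 4XA100-80, 4XH100, 8XA100-40, 8XH100"),
    MatrixRow("RedHatAI/Qwen3-Embedding-8B",
              "registry.redhat.io/rhelai1/modelcar-qwen3-embedding-8b:1.5",
              "Enabled", "1XL4"),
]

print(models_for(ROWS, "L4", 1, status="Validated"))
# prints: ['RedHatAI/granite-4.0-h-tiny-FP8-dynamic']
```

Keeping the Supported GPUs cell as the original string and parsing it on demand makes it easy to paste further rows from the table without reformatting them.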
Chapter 4. Validated OCI artifact model container images
The following table lists validated OCI artifact model container images available from the Red Hat container registry, including baseline and quantized variants for each supported model.
Table 4.1. Validated OCI artifact model container images
| Model | Quantized variants | OCI artifact images |
|---|---|---|
| llama-4-scout-17b-16e-instruct | INT4, FP8 | |
| llama-4-maverick-17b-128e-instruct | FP8 | |
| mistral-small-3-1-24b-instruct-2503 | INT4, INT8, FP8 | |
| llama-3-3-70b-instruct | INT4, INT8, FP8 | |
| llama-3-1-8b-instruct | INT4, INT8, FP8 | |
| granite-3-1-8b-instruct | INT4, INT8, FP8 | |
| phi-4 | INT4, INT8, FP8 | |
| qwen2-5-7b-instruct | INT4, INT8, FP8 | |
| mistral-small-24b-instruct-2501 | INT4, INT8, FP8 | |
| mixtral-8x7b-instruct-v0-1 | None | |
| granite-3-1-8b-base | INT4 (baseline currently unavailable) | |
| granite-3.1-8b-starter-v2 | None | |
| llama-3-1-nemotron-70b-instruct-hf | FP8 | |
| gemma-2-9b-it | FP8 | |
| deepseek-r1-0528 | INT4 (baseline currently unavailable) | |
| qwen3-8b | FP8 (baseline currently unavailable) | |
| kimi-k2-instruct | INT4 (baseline currently unavailable) | |
| gemma-3n-e4b-it | FP8 (baseline currently unavailable) | |
| gpt-oss-120b | None | |
| gpt-oss-20b | None | |
| qwen3-coder-480b-a35b-instruct | FP8 (baseline currently unavailable) | |
| whisper-large-v3-turbo | INT4 (baseline currently unavailable) | |
| voxtral-mini-3b-2507 | FP8 (baseline currently unavailable) | |
| nvidia-nemotron-nano-9b-v2 | FP8 (baseline currently unavailable) | |
Chapter 5. Validated Red Hat AI ModelCar container images
You can use ModelCar container images to deploy validated models with Red Hat AI Inference. The following table lists the available ModelCar container images and their quantized variants.
For minimum platform version requirements and validation status for each model, see Model support matrix.
Table 5.1. Validated Red Hat AI ModelCar container images
| Model | Quantized variants | ModelCar images |
|---|---|---|
| llama-4-scout-17b-16e-instruct | INT4, FP8 | |
| llama-4-maverick-17b-128e-instruct | FP8 | |
| mistral-small-3-1-24b-instruct-2503 | INT4, INT8, FP8 | |
| llama-3-3-70b-instruct | INT4, INT8, FP8 | |
| llama-3-1-8b-instruct | INT4, INT8, FP8 | |
| granite-3-1-8b-instruct | INT4, INT8, FP8 | |
| phi-4 | INT4, INT8, FP8 | |
| qwen2-5-7b-instruct | INT4, INT8, FP8 | |
| mistral-small-24b-instruct-2501 | INT4, INT8, FP8 | |
| mixtral-8x7b-instruct-v0-1 | None | |
| granite-3-1-8b-base | INT4 (baseline currently unavailable) | |
| granite-3-1-8b-starter-v2 | None | |
| llama-3-1-nemotron-70b-instruct-hf | FP8 | |
| gemma-2-9b-it | FP8 | |
| deepseek-r1-0528 | INT4 (baseline currently unavailable) | |
| qwen3-8b | FP8 (baseline currently unavailable) | |
| kimi-k2-instruct | INT4 (baseline currently unavailable) | |
| gemma-3n-e4b-it | FP8 | |
| gpt-oss-120b | None | |
| gpt-oss-20b | None | |
| qwen3-coder-480b-a35b-instruct | FP8 (baseline currently unavailable) | |
| whisper-large-v3-turbo | INT4 (baseline currently unavailable) | |
| voxtral-mini-3b-2507 | FP8 (baseline currently unavailable) | |
| nvidia-nemotron-nano-9b-v2 | FP8 (baseline currently unavailable) | |
| phi-4-reasoning | FP8 (baseline currently unavailable) | |
| qwen3-vl-235b-a22b-instruct-nvfp4 | None | |
| qwen3-next-80b-a3b-instruct | INT4 (baseline currently unavailable) | |
| granite-4-0-h-tiny | FP8 | |
| granite-4-0-h-small | FP8 | |
| mistral-large-3-675b-instruct-2512 | None | |
| mistral-large-3-675b-instruct-2512-nvfp4 | None | |
| apertus-8b-instruct-2509 | FP8 (baseline currently unavailable) | |
| nvidia-nemotron-3-nano-30b-a3b | FP8 (baseline currently unavailable) | |
| ministral-3-14b-instruct-2512 | None | |
Chapter 6. Validated models for x86_64 CPU inference serving
The following large language models have been validated for use with Red Hat AI Inference on x86_64 CPUs.
Table 6.1. Validated models for inferencing with x86_64 CPU
| Model | Hugging Face model card | Number of parameters |
|---|---|---|
| TinyLlama-1.1B-Chat-v1.0 | TinyLlama/TinyLlama-1.1B-Chat-v1.0 | 1.1B |
| Llama-3.2-1B-Instruct | meta-llama/Llama-3.2-1B-Instruct | 1B |
| granite-3.2-2b-instruct | ibm-granite/granite-3.2-2b-instruct | 2B |
| TinyLlama-1.1B-Chat-v1.0-pruned2.4 | RedHatAI/TinyLlama-1.1B-Chat-v1.0-pruned2.4 | 1.1B (pruned) |
Quantization formats that require GPU-specific kernels, such as the Marlin format, are not supported for CPU inference. Instead, use AWQ or GPTQ quantization formats that are compatible with CPU execution.
The following table provides general guidance for approximate system RAM requirements based on model size:
Table 6.2. Memory requirements for inference serving with x86_64 CPU
| Model size (parameters) | Minimum RAM | Recommended RAM |
|---|---|---|
| 125M - 500M | 8 GB | 16 GB |
| 500M - 1B | 16 GB | 32 GB |
| 1B - 3B | 32 GB | 64 GB |
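The guidance in Table 6.2 can be encoded as a simple lookup, for example in a capacity-planning script. In the following Python sketch, `ram_guidance` is a hypothetical helper. Because adjacent ranges in the table share endpoints, the sketch assumes each lower bound is inclusive and each upper bound exclusive, except for the final range, which also includes 3B. Models outside the table's coverage raise an error rather than returning a guess.

```python
from typing import Tuple

# Table 6.2 as (lower bound, upper bound, minimum RAM GB, recommended RAM GB).
# Bounds are parameter counts. Lower bounds are inclusive and upper bounds are
# exclusive, except that the final range also includes its upper bound.
RAM_GUIDANCE = [
    (125e6, 500e6, 8, 16),
    (500e6, 1e9, 16, 32),
    (1e9, 3e9, 32, 64),
]


def ram_guidance(num_params: float) -> Tuple[int, int]:
    """Return (minimum_gb, recommended_gb) for a model with num_params parameters."""
    last_index = len(RAM_GUIDANCE) - 1
    for index, (low, high, min_gb, rec_gb) in enumerate(RAM_GUIDANCE):
        in_range = low <= num_params < high
        at_final_upper_bound = index == last_index and num_params == high
        if in_range or at_final_upper_bound:
            return min_gb, rec_gb
    raise ValueError(f"{num_params:.3g} parameters is outside the range covered by Table 6.2")
```

For the 1.1B TinyLlama models in Table 6.1, `ram_guidance(1.1e9)` returns `(32, 64)`.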
Actual memory usage depends on the model architecture, context length, and batch size. When you use longer context lengths, increase the value of the VLLM_CPU_KVCACHE_SPACE environment variable, which sets the key-value (KV) cache size in GiB, to allocate more memory for the cache.
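The KV cache grows linearly with the number of tokens held in memory, so you can estimate a starting value for VLLM_CPU_KVCACHE_SPACE from the model's architecture. The following Python sketch uses the standard estimate: 2 (one key and one value vector) × layers × KV heads × head dimension × bytes per element × tokens. The architecture numbers in the example are illustrative values for a small Llama-style model, not values from this document; read `num_hidden_layers`, `num_key_value_heads`, and the head dimension from the model's `config.json`. The estimate assumes a 16-bit cache dtype and ignores model weights and runtime overhead.

```python
import math


def kv_cache_bytes_per_token(num_layers: int, num_kv_heads: int, head_dim: int,
                             dtype_bytes: int = 2) -> int:
    """Bytes of KV cache needed for one token across all layers.

    The leading factor of 2 accounts for storing both a key and a value vector
    per layer and per KV head. dtype_bytes=2 assumes a 16-bit cache dtype.
    """
    return 2 * num_layers * num_kv_heads * head_dim * dtype_bytes


def kv_cache_gib(num_layers: int, num_kv_heads: int, head_dim: int,
                 total_tokens: int, dtype_bytes: int = 2) -> float:
    """KV cache size in GiB for total_tokens (context length x concurrent sequences)."""
    per_token = kv_cache_bytes_per_token(num_layers, num_kv_heads, head_dim, dtype_bytes)
    return per_token * total_tokens / 2**30


def suggested_kvcache_space(gib: float) -> int:
    """Round an estimate up to a whole number of GiB, with a floor of 1."""
    return max(1, math.ceil(gib))


# Illustrative architecture values; four concurrent sequences of 8192 tokens.
size = kv_cache_gib(num_layers=16, num_kv_heads=8, head_dim=64, total_tokens=4 * 8192)
print(f"{size:.2f} GiB -> VLLM_CPU_KVCACHE_SPACE={suggested_kvcache_space(size)}")
# prints "1.00 GiB -> VLLM_CPU_KVCACHE_SPACE=1"
```

In practice you might set VLLM_CPU_KVCACHE_SPACE somewhat above the printed value to leave headroom, while keeping the total within the RAM guidance in Table 6.2.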
Chapter 7. Validated models for use with IBM Power and IBM Spyre AI accelerators
The following models, including decoder, embedding, and reranker models, are supported for IBM Power systems with IBM Spyre AI accelerators.
IBM Spyre AI accelerator cards support FP16 format model weights only. For compatible models, Red Hat AI Inference automatically converts the weights to FP16 at startup, so no additional configuration is needed.
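To see what FP16 conversion means numerically, the following Python sketch rounds values through IEEE 754 half precision using the `e` format code of the standard library `struct` module. It illustrates the number format only, not how Red Hat AI Inference performs the conversion internally. FP16 keeps roughly three significant decimal digits and has a largest finite value of 65504; `to_fp16` and `fits_fp16` are illustrative helper names.

```python
import struct

# Largest finite IEEE 754 binary16 (half-precision) value.
FP16_MAX = 65504.0


def to_fp16(value: float) -> float:
    """Round value to the nearest half-precision float and return it as a Python float."""
    # "<e" packs and unpacks a little-endian binary16 value.
    return struct.unpack("<e", struct.pack("<e", value))[0]


def fits_fp16(value: float) -> bool:
    """Conservative range check: True if abs(value) does not exceed FP16_MAX."""
    return abs(value) <= FP16_MAX


print(to_fp16(0.1))    # 0.0999755859375, the nearest FP16 value to 0.1
print(fits_fp16(1e6))  # False, because 1e6 is outside the FP16 range
```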
Table 7.1. IBM Granite models for use with IBM Spyre AI accelerators
| Model | Hugging Face model card |
|---|---|
| granite-3.3-8b-instruct | ibm-granite/granite-3.3-8b-instruct |
| granite-embedding-30m-english | ibm-granite/granite-embedding-30m-english |
| granite-embedding-107m-multilingual | ibm-granite/granite-embedding-107m-multilingual |
| granite-embedding-125m-english | ibm-granite/granite-embedding-125m-english |
| granite-embedding-278m-multilingual | ibm-granite/granite-embedding-278m-multilingual |
Table 7.2. Reranker models for use with IBM Spyre AI accelerators
| Model | Hugging Face model card |
|---|---|
| bge-reranker-v2-m3 | BAAI/bge-reranker-v2-m3 |
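The embedding and reranker models in Tables 7.1 and 7.2 are queried through dedicated endpoints rather than a chat endpoint. The following Python sketch builds request bodies for the OpenAI-compatible `/v1/embeddings` endpoint and the `/v1/rerank` endpoint of vLLM-based servers. It assumes that your Spyre deployment exposes both endpoints and listens on `http://localhost:8000`; check which endpoints your image enables before relying on it. The helper names are illustrative.

```python
import json
import urllib.request
from typing import List

# Assumption: the server runs locally on the default port 8000.
BASE_URL = "http://localhost:8000"


def embeddings_payload(model: str, texts: List[str]) -> dict:
    """Request body for the OpenAI-compatible /v1/embeddings endpoint."""
    return {"model": model, "input": texts}


def rerank_payload(model: str, query: str, documents: List[str]) -> dict:
    """Request body for /v1/rerank: score each document against the query."""
    return {"model": model, "query": query, "documents": documents}


def post_json(path: str, payload: dict, base_url: str = BASE_URL) -> dict:
    """POST a JSON payload to the server and return the decoded JSON response."""
    request = urllib.request.Request(
        base_url + path,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)
```

For example, `post_json("/v1/embeddings", embeddings_payload("ibm-granite/granite-embedding-125m-english", ["flood mapping"]))` sends one input for embedding. Each model normally runs in its own server instance, so point `base_url` at the instance that serves the model you query.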
Pre-built IBM Granite models run with the specific Python packages that are included in the Red Hat AI Inference Spyre container image. The models are tied to fixed configurations for Spyre card count, batch size, and input/output context sizes.
Updating or replacing Python packages in the Red Hat AI Inference Spyre container image is not supported.
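Because each pre-built model is tied to a fixed Spyre card count, batch size, and input/output context size, it can help to check a planned workload against those limits before deployment. The following Python sketch is purely hypothetical: `FixedServingConfig` and `config_violations` are illustrative names, and the numbers used with them below are placeholders, not the real settings of any shipped model. It also assumes a model needs at least its configured number of cards.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class FixedServingConfig:
    """Hypothetical record of the fixed settings a pre-built model was prepared for."""
    spyre_cards: int
    max_batch_size: int
    max_input_tokens: int
    max_output_tokens: int


def config_violations(cfg: FixedServingConfig, available_cards: int, batch_size: int,
                      input_tokens: int, output_tokens: int) -> List[str]:
    """Return a description of each way a planned workload exceeds the fixed config."""
    problems = []
    if available_cards < cfg.spyre_cards:
        problems.append(f"needs {cfg.spyre_cards} Spyre cards, only {available_cards} available")
    if batch_size > cfg.max_batch_size:
        problems.append(f"batch size {batch_size} exceeds {cfg.max_batch_size}")
    if input_tokens > cfg.max_input_tokens:
        problems.append(f"input of {input_tokens} tokens exceeds {cfg.max_input_tokens}")
    if output_tokens > cfg.max_output_tokens:
        problems.append(f"output of {output_tokens} tokens exceeds {cfg.max_output_tokens}")
    return problems
```

With placeholder values, `config_violations(FixedServingConfig(4, 16, 2048, 1024), available_cards=4, batch_size=32, input_tokens=1000, output_tokens=512)` returns `["batch size 32 exceeds 16"]`, signalling that the workload must be split into smaller batches rather than the configuration changed.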
Chapter 8. Validated models for use with IBM Z and IBM Spyre AI accelerators
The following large language models are supported for IBM Z systems with IBM Spyre AI accelerators.
IBM Spyre AI accelerator cards support FP16 format model weights only. For compatible models, Red Hat AI Inference automatically converts the weights to FP16 at startup, so no additional configuration is needed.
Table 8.1. Decoder models for use with IBM Spyre AI accelerators
| Model | Hugging Face model card |
|---|---|
| granite-3.3-8b-instruct | ibm-granite/granite-3.3-8b-instruct |
| granite-3.3-8b-instruct-FP8 | ibm-granite/granite-3.3-8b-instruct-FP8 |
| granite-4.1-8b | ibm-granite/granite-4.1-8b |
| granite-4.1-8b-fp8 | ibm-granite/granite-4.1-8b-fp8 |
| Ministral-3-14B-Instruct-2512-BF16 | mistralai/Ministral-3-14B-Instruct-2512-BF16 |
Pre-built IBM Granite models run with the specific Python packages that are included in the Red Hat AI Inference Spyre container image. The models are tied to fixed configurations for Spyre card count, batch size, and input/output context sizes.
Updating or replacing Python packages in the Red Hat AI Inference Spyre container image is not supported.
Chapter 9. Validated models for geospatial inference with TerraTorch
The following IBM and NASA Prithvi geospatial foundation models are validated for use with AI Inference and TerraTorch.
Prithvi-EO-2.0 models use the Vision Transformer (ViT) architecture and require TerraTorch as the model implementation backend. These models accept GeoTIFF imagery as input and return segmentation predictions.
Table 9.1. Prithvi geospatial models for use with TerraTorch
| Model | Use case | Hugging Face model card | Validated on |
|---|---|---|---|
| Prithvi-EO-2.0-300M-TL-Sen1Floods11 | Flood detection and mapping | Prithvi-EO-2.0-300M-TL-Sen1Floods11 | RHAIIS 3.3 |
| Prithvi-EO-2.0-300M-BurnScars | Burn scar detection | Prithvi-EO-2.0-300M-BurnScars | RHAIIS 3.3 |
Explore the IBM and NASA geospatial models collection on Hugging Face.
Prithvi geospatial models are validated for use with NVIDIA CUDA AI accelerators only.
These models require specific vLLM server arguments to function correctly. You must include --skip-tokenizer-init, --enforce-eager, and --enable-mm-embeds when starting the inference server.
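The required arguments can also be assembled programmatically, for example in a deployment script. The following Python sketch builds the `vllm serve` argument list with the three flags named above. The model ID `ibm-nasa-geospatial/Prithvi-EO-2.0-300M-TL-Sen1Floods11` is an assumed Hugging Face repository ID; substitute the model reference that you actually deploy. Any further arguments, such as those that select TerraTorch as the model implementation backend, are deliberately not shown; see the TerraTorch serving guide for the full command.

```python
import shlex
from typing import List, Optional

# Flags that Prithvi geospatial models require when starting the vLLM server.
PRITHVI_REQUIRED_ARGS = ["--skip-tokenizer-init", "--enforce-eager", "--enable-mm-embeds"]


def prithvi_serve_command(model: str, extra_args: Optional[List[str]] = None) -> List[str]:
    """Return the argv list for `vllm serve` with the required Prithvi flags appended."""
    command = ["vllm", "serve", model, *PRITHVI_REQUIRED_ARGS]
    if extra_args:
        command.extend(extra_args)
    return command


# Assumed Hugging Face repository ID; replace with your model reference.
cmd = prithvi_serve_command("ibm-nasa-geospatial/Prithvi-EO-2.0-300M-TL-Sen1Floods11")
print(shlex.join(cmd))
```

Keeping the command as a list lets you pass it directly to `subprocess.run(cmd, check=True)` without shell quoting issues, while `shlex.join` gives a copy-pasteable string for logs.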
For more information, see Serving TerraTorch Models with vLLM.