3.4 Early Access (EA1) Release Notes

Red Hat Enterprise Linux AI 3.4

Red Hat Enterprise Linux AI release notes

Red Hat RHEL AI Documentation Team

Abstract

Review new features, enhancements, resolved issues, and known issues associated with this release.

Preface

Red Hat Enterprise Linux AI provides developers and IT organizations with a scalable inference platform for deploying and customizing AI models on secure resources with minimal configuration and resource usage.

Chapter 1. Version 3.4 release notes

Red Hat Enterprise Linux AI is a generative AI inference platform for Linux environments that uses Red Hat AI Inference Server for running and optimizing models, and includes Red Hat AI Model Optimization Toolkit for model quantization, sparsity, and general compression for supported AI accelerators. Red Hat AI Model Optimization Toolkit has native Hugging Face and vLLM support. You can seamlessly integrate optimized models with deployment pipelines for faster, cost-saving inference at scale, powered by the compressed-tensors model format.

Important

Red Hat Enterprise Linux AI 3.4.0-ea.1 is an Early Access release. Early Access releases are not supported by Red Hat in any way and are not functionally complete or production-ready. Do not use Early Access releases for production or business-critical workloads. Use Early Access releases to test upcoming product features in advance of their possible inclusion in a Red Hat product offering, and to test functionality and provide feedback during the development process. These features might not have any documentation, are subject to change or removal at any time, and testing is limited. Red Hat might provide ways to submit feedback on Early Access features without an associated SLA.

Red Hat Enterprise Linux AI is packaged as a bootc container image for easy deployment on a Linux server appliance with NVIDIA CUDA or AMD ROCm AI accelerators installed. The following container images are available as early access releases from registry.redhat.io:

registry.redhat.io/rhelai-early-access/bootc-cuda-rhel9:3.4.0-ea.1
registry.redhat.io/rhelai-early-access/bootc-rocm-rhel9:3.4.0-ea.1

Important

There is no direct upgrade path from Red Hat Enterprise Linux AI 1.5 to Red Hat Enterprise Linux AI 3.0. You can upgrade from Red Hat Enterprise Linux AI 3.0 to 3.4, including all versions in between.
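The upgrade rule above can be sketched as a small version check. This is an illustrative sketch only, not a Red Hat tool: the names `parse_version` and `can_upgrade` are hypothetical, and the sketch assumes that the rule allows any upgrade from a release at or above 3.0 to a later release up to 3.4.

```python
# Hypothetical helper for illustration; not part of RHEL AI.
# Assumption: upgrades are allowed between any two releases in the
# 3.0 to 3.4 range, as long as the target is newer than the source.
MIN_SUPPORTED = (3, 0)
MAX_SUPPORTED = (3, 4)


def parse_version(text):
    """Return (major, minor) from a string such as '3.4' or '3.4.0-ea.1'."""
    major, minor = text.split(".")[:2]
    return int(major), int(minor)


def can_upgrade(source, target):
    """Return True if the source release can be upgraded to the target."""
    src, dst = parse_version(source), parse_version(target)
    return MIN_SUPPORTED <= src < dst <= MAX_SUPPORTED
```

Under these assumptions, `can_upgrade("3.0", "3.4")` is true and `can_upgrade("1.5", "3.0")` is false, which matches the two cases stated in the note.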

Important

The registry.redhat.io/rhelai-early-access/bootc-rocm-rhel9:3.4.0-ea.1 image does not include Red Hat AI Model Optimization Toolkit, which is not supported for AMD ROCm AI accelerators.
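As a rough illustration of how the two early access images relate to accelerator type, the following sketch maps an accelerator name to its image reference and records whether Red Hat AI Model Optimization Toolkit is included. The image references are taken from the list above. The names `IMAGES`, `image_reference`, and `includes_toolkit` are hypothetical, and the assumption that the CUDA image includes the toolkit is inferred from this document rather than stated outright.

```python
# Hypothetical lookup for illustration; not part of RHEL AI.
REGISTRY_PATH = "registry.redhat.io/rhelai-early-access"
TAG = "3.4.0-ea.1"

IMAGES = {
    # Assumption: the CUDA image ships the Model Optimization Toolkit.
    "cuda": {"repo": "bootc-cuda-rhel9", "toolkit": True},
    # Stated in the notes: the ROCm image does not include the toolkit.
    "rocm": {"repo": "bootc-rocm-rhel9", "toolkit": False},
}


def image_reference(accelerator):
    """Return the full image reference for 'cuda' or 'rocm'."""
    key = accelerator.lower()
    if key not in IMAGES:
        raise ValueError(f"unknown accelerator type: {accelerator!r}")
    return f"{REGISTRY_PATH}/{IMAGES[key]['repo']}:{TAG}"


def includes_toolkit(accelerator):
    """Return True if the image for this accelerator includes the toolkit."""
    key = accelerator.lower()
    if key not in IMAGES:
        raise ValueError(f"unknown accelerator type: {accelerator!r}")
    return IMAGES[key]["toolkit"]
```

For example, `image_reference("rocm")` yields the ROCm image listed above, and `includes_toolkit("rocm")` is false.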

Additional resources

Updating RHEL AI

1.1. New features

Red Hat Enterprise Linux AI 3.4 packages Red Hat AI Inference Server 3.4, which includes the following highlights:

Upgraded vLLM to v0.14.1: Red Hat AI Inference Server 3.4 packages the upstream vLLM v0.14.1 release with asynchronous scheduling enabled by default, a new gRPC server entrypoint, auto-context length fitting, and security fixes including token leak prevention in crash logs.
New model support: Red Hat AI Inference Server 3.4 adds support for Grok-2, Mistral 3, MiMo-V2-Flash, Nemotron Parse 1.1, and various other model architectures. LoRA multimodal support has been expanded for LLaVA, BLIP2, PaliGemma, Pixtral, and GLM4-V models. Tool calling enhancements include FunctionGemma and GLM-4.7 parsers.
Performance improvements: Asynchronous scheduling now overlaps engine core scheduling with GPU execution, improving throughput without manual configuration. CUTLASS MoE optimizations deliver up to 5.3% throughput gain and up to 10.8% time to first token improvement. Fused RoPE and MLA KV-cache write optimization improves DeepSeek-style model performance.
New AI accelerator support: Red Hat AI Inference Server 3.4 adds RTX PRO 4500 Blackwell Server Edition GPU support for NVIDIA, AITER RMSNorm fusion for AMD, and chunked prefill and prefix caching for IBM Spyre accelerators. The CPU backend adds support for head sizes 80 and 112.
Quantization advances: Marlin support extends to the Turing (sm75) architecture. Quark int4-fp8 w4a8 MoE support, MXFP4 W4A16 support for dense models, and ModelOpt FP8 variants are now available.
Large-scale serving updates: Extended Dual-Batch Overlap (XBO) implementation, NVIDIA Inference Xfer Library (NIXL) asymmetric tensor parallelism, and LMCache KV cache registration improve large-scale serving capabilities.
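To put the "up to" performance figures for the CUTLASS MoE optimizations in concrete terms, the following sketch applies them to baseline measurements. It is a worked arithmetic example only: the function names and any baseline values are hypothetical, the percentages are upper bounds from the list above, and the sketch assumes that the time to first token improvement means a reduction in latency.

```python
# Upper bounds quoted in the release notes for CUTLASS MoE optimizations.
MAX_THROUGHPUT_GAIN = 0.053    # up to 5.3% more throughput
MAX_TTFT_IMPROVEMENT = 0.108   # up to 10.8% lower time to first token (assumed)


def best_case_throughput(baseline_tokens_per_s):
    """Best-case throughput after applying the maximum quoted gain."""
    return baseline_tokens_per_s * (1 + MAX_THROUGHPUT_GAIN)


def best_case_ttft(baseline_ms):
    """Best-case time to first token after the maximum quoted reduction."""
    return baseline_ms * (1 - MAX_TTFT_IMPROVEMENT)
```

With a hypothetical baseline of 1000 tokens per second and a 200 ms time to first token, the best case would be about 1053 tokens per second and about 178.4 ms. Actual gains depend on the model, hardware, and workload.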

1.2. Known issues

There are no known issues for Red Hat Enterprise Linux AI 3.4.

Legal Notice

Except as otherwise noted below, the text of and illustrations in this documentation are licensed by Red Hat under the Creative Commons Attribution–Share Alike 3.0 Unported license. If you distribute this document or an adaptation of it, you must provide the URL for the original version.

Red Hat, as the licensor of this document, waives the right to enforce, and agrees not to assert, Section 4d of CC-BY-SA to the fullest extent permitted by applicable law.

Red Hat, the Red Hat logo, JBoss, Hibernate, and RHCE are trademarks or registered trademarks of Red Hat, LLC. or its subsidiaries in the United States and other countries.

Linux® is the registered trademark of Linus Torvalds in the United States and other countries.

XFS is a trademark or registered trademark of Hewlett Packard Enterprise Development LP or its subsidiaries in the United States and other countries.

The OpenStack® Word Mark and OpenStack logo are trademarks or registered trademarks of the Linux Foundation, used under license.

All other trademarks are the property of their respective owners.