Obtaining Support from NVIDIA

Solution Verified - Updated

Environment

  • NVIDIA DGX Servers (DGX A100, DGX H100, DGX H200, DGX B200)
    • Red Hat Enterprise Linux
    • Red Hat OpenShift
  • NVIDIA MGX (NVIDIA Grace CPU Superchip and NVIDIA Grace Hopper Superchip)
    • Red Hat Enterprise Linux
    • Red Hat OpenShift
  • NVIDIA Data Center or NVIDIA RTX GPUs
    • Red Hat Enterprise Linux
    • Red Hat OpenShift
    • Red Hat OpenShift Virtualization
    • Red Hat OpenStack Platform
    • Red Hat Virtualization (RHV) with vGPU/GRID technology
  • NVIDIA A100X Converged Card

Issue

NVIDIA has several product lines that interact with different Red Hat products and platforms. Each combination has specific NVIDIA support entitlement requirements, summarized below.

OpenShift Platform | GPU Configuration | Need Subscription for NVAIE support to download the NVIDIA Software | Need Subscription for NVAIE or vGPU SKUs to be supported
------------------ | ----------------- | ------------------------------------------------------------------- | --------------------------------------------------------
OpenShift on bare metal | Physical GPU | No | Yes
OpenShift on bare metal | MIG | No | Yes
OpenShift Virtualization | Passthrough | No | Yes
OpenShift Virtualization | vGPU | Yes | Yes
OpenShift on VMware vSphere Virtual Machines | Passthrough | No | Yes
OpenShift on Red Hat OpenStack Platform or RHV Virtual Machine | Passthrough | No | Yes
OpenShift on Red Hat OpenStack Platform or RHV Virtual Machine | vGPU | Yes | Yes
OpenShift on AWS (ROSA or self-managed) | - | No | Yes
OpenShift on Azure (ARO or self-managed) | - | No | Yes
OpenShift on GCP (self-managed) | - | No | Yes
OpenShift on OCI (VM or bare-metal shapes) | - | No | Tech Preview
Red Hat Device Edge | - | No | Tech Preview
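
For automation such as a pre-deployment check, the table above could be encoded as a simple lookup. The following is only a minimal sketch: the names `Entitlement`, `SUPPORT_MATRIX`, `lookup`, and `configs_needing_download_subscription` are hypothetical and not part of any Red Hat or NVIDIA tooling, and the values are copied from the table as published here, so they should be re-checked against the current article before being relied on.

```python
from typing import NamedTuple


class Entitlement(NamedTuple):
    """The two subscription columns of one table row."""
    subscription_to_download: str  # "Yes" or "No"
    subscription_for_support: str  # "Yes" or "Tech Preview"


# Hypothetical encoding of the table. Keys are
# (OpenShift platform, GPU configuration); "-" means the table
# lists no specific GPU configuration for that platform.
OSP_OR_RHV = "OpenShift on Red Hat OpenStack Platform or RHV Virtual Machine"
SUPPORT_MATRIX = {
    ("OpenShift on bare metal", "Physical GPU"): Entitlement("No", "Yes"),
    ("OpenShift on bare metal", "MIG"): Entitlement("No", "Yes"),
    ("OpenShift Virtualization", "Passthrough"): Entitlement("No", "Yes"),
    ("OpenShift Virtualization", "vGPU"): Entitlement("Yes", "Yes"),
    ("OpenShift on VMware vSphere Virtual Machines", "Passthrough"): Entitlement("No", "Yes"),
    (OSP_OR_RHV, "Passthrough"): Entitlement("No", "Yes"),
    (OSP_OR_RHV, "vGPU"): Entitlement("Yes", "Yes"),
    ("OpenShift on AWS (ROSA or self-managed)", "-"): Entitlement("No", "Yes"),
    ("OpenShift on Azure (ARO or self-managed)", "-"): Entitlement("No", "Yes"),
    ("OpenShift on GCP (self-managed)", "-"): Entitlement("No", "Yes"),
    ("OpenShift on OCI (VM or bare-metal shapes)", "-"): Entitlement("No", "Tech Preview"),
    ("Red Hat Device Edge", "-"): Entitlement("No", "Tech Preview"),
}


def lookup(platform: str, gpu_config: str = "-") -> Entitlement:
    """Return the table row for a platform and GPU configuration."""
    return SUPPORT_MATRIX[(platform, gpu_config)]


def configs_needing_download_subscription() -> list:
    """List the (platform, GPU configuration) pairs marked "Yes" in the download column."""
    return [key for key, row in SUPPORT_MATRIX.items()
            if row.subscription_to_download == "Yes"]
```

In this encoding, only the vGPU rows require a subscription to download the NVIDIA software, while every row requires an entitlement (or is Tech Preview) in the support column.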

Resolution

Disclaimer: Links contained herein to external website(s) are provided for convenience only. Red Hat has not reviewed the links and is not responsible for the content or its availability. The inclusion of any link to an external website does not imply endorsement by Red Hat of the website or their entities, products or services. You agree that Red Hat is not responsible or liable for any loss or expenses that may result due to your use of (or reliance on) the external site or content.

OEM Hardware Support
For support on OEM-provided hardware platforms, please be sure to contact your OEM hardware and service provider.

NVIDIA Enterprise Support
Please be advised that an active NVIDIA Enterprise Support contract is required.

In order to receive Enterprise Support from NVIDIA on the NVIDIA GPU Operator, you need either an NVIDIA vGPU or NVIDIA AI Enterprise Support entitlement.

NVIDIA Network Operator customers must purchase NVIDIA AI Enterprise Support from NVIDIA. A list of devices supported by the NVIDIA GPU Operator can be found here:

https://docs.nvidia.com/datacenter/cloud-native/gpu-operator/platform-support.html

OpenShift with NVIDIA GPU configuration documentation is here:
https://docs.nvidia.com/datacenter/cloud-native/openshift/introduction.html

If this is of interest or you have additional questions, please contact NVIDIA Enterprise Support.

To get support for NVIDIA GPUs, the customer should use the NVIDIA GPU on a:

No NVIDIA licenses are required to install and use the NVIDIA GPU Operator on AWS instances, bare-metal servers, or OpenShift on VMware vSphere with NVIDIA GPUs in Passthrough mode.

NVIDIA GPU Operator Components

Support Flow
In case of an incident, the customer opens a ticket with Red Hat Support:

  • If the root cause is within the OpenShift perimeter, Red Hat investigates and fixes the problem.
  • If the root cause is outside OpenShift, the customer must contact NVIDIA support, open a case, and share the NVIDIA case ID with Red Hat. NVIDIA and Red Hat can then initiate collaboration through TSANet.
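
The flow above can be sketched as a small decision function, for example to drive a runbook or ticket template. This is an illustrative model only: the `Incident` class, its fields, and `next_step` are hypothetical names, and the returned strings paraphrase the two bullets rather than describe any actual Red Hat or NVIDIA ticketing API.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Incident:
    """Hypothetical record of a ticket opened with Red Hat Support."""
    root_cause_in_openshift: bool
    nvidia_case_id: Optional[str] = None  # set once an NVIDIA case exists


def next_step(incident: Incident) -> str:
    """Return the next action according to the support flow."""
    if incident.root_cause_in_openshift:
        return "Red Hat investigates and fixes the problem"
    if incident.nvidia_case_id is None:
        # Root cause is outside OpenShift and no NVIDIA case exists yet.
        return "Customer opens an NVIDIA support case and shares the case ID with Red Hat"
    # An NVIDIA case ID has been shared with Red Hat.
    return "NVIDIA and Red Hat can collaborate through TSANet"
```

The sketch makes one point explicit: collaboration through TSANet becomes possible only after the customer has opened an NVIDIA case and shared its ID.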

FAQ
1. As an OpenShift customer, how can I get an NVAIE license?
The following links describe ways to obtain an NVAIE license:

For OpenShift deployments on VMware vSphere, the NVIDIA GPU Operator allows you to utilize PCIe Passthrough, enabling GPU resources for your workloads. However, it's essential to be aware that this configuration falls under the category of "Community supported." This means that while it is technically feasible, it may not receive the same level of support and troubleshooting capabilities as a licensed solution.

While you can employ PCIe Passthrough without it, obtaining an NVAIE license is highly recommended for several reasons, including support for the NVIDIA GPU Operator and assistance with NVIDIA CUDA driver build failures, NVIDIA device plugin bugs, and DCGM exporter problems.

To receive NVIDIA Enterprise Support, an NVAIE (NVIDIA AI Enterprise) or vGPU entitlement must be purchased and in place.

For Red Hat customers using OpenShift on VMware vSphere, it's important to understand that Red Hat's support capabilities are closely tied to NVIDIA's NVAIE licensing. Red Hat can engage with NVIDIA support on behalf of a customer only when the customer has purchased an NVAIE license.

In summary, PCIe Passthrough can be used without an NVAIE license in OpenShift on VMware vSphere, but the decision to purchase an NVAIE license should be based on the level of support and assistance required. To gain access to NVIDIA's support resources and to enable Red Hat's support team to collaborate effectively with NVIDIA, customers are encouraged to order an NVAIE license when working with the NVIDIA GPU Operator and related technologies. This allows smoother, more efficient resolution of GPU-related issues in your OpenShift deployment.

3. As an OpenShift customer, how are NVIDIA A100X drivers supported?
The NVIDIA A100X is a converged accelerator card consisting of a BlueField-2 DPU and an A100 GPU. When the A100X is deployed as two discrete cards (GPU and NIC), the RHEL-supported driver, MLX5, is used for the NIC, and NVIDIA provides the GPU drivers. Drivers are available from NVIDIA (network.nvidia.com).

This solution is part of Red Hat’s fast-track publication program, providing a huge library of solutions that Red Hat engineers have created while supporting our customers. To give you the knowledge you need the instant it becomes available, these articles may be presented in a raw and unedited form.