NVIDIA GPUDirect

NVIDIA GPUDirect is a family of technologies that enables direct communication and data transfer between GPUs and other devices like network adapters, storage drives, and video I/O devices.

It is designed to reduce latency, increase bandwidth, and decrease CPU overhead in high-performance computing (HPC), data analytics, and AI workloads.

GPUDirect History and Architecture

GPUDirect was first introduced in 2010 with GPUDirect 1.0, which let third-party devices such as network adapters share pinned system-memory buffers with the GPU, removing a redundant copy in host memory.

As it evolved, it added peer-to-peer (P2P) support, allowing GPUs in the same system to read and write each other's memory directly across PCIe.
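
As a rough illustration, the sketch below uses the CUDA runtime's peer-access API to copy a buffer directly between two GPUs. The device pair and buffer size are placeholders; with peer access enabled, the copy is a device-to-device DMA that never stages through host memory.

```cpp
#include <cuda_runtime.h>
#include <cstdio>

int main() {
    int canAccess = 0;
    // Ask the driver whether device 0 can map device 1's memory over PCIe/NVLink.
    cudaDeviceCanAccessPeer(&canAccess, 0, 1);
    if (!canAccess) {
        printf("P2P not supported between devices 0 and 1\n");
        return 1;
    }

    const size_t bytes = 1 << 20;  // 1 MiB placeholder buffer
    float *src = nullptr, *dst = nullptr;

    cudaSetDevice(0);
    cudaDeviceEnablePeerAccess(1, 0);  // let device 0 access device 1's memory
    cudaMalloc(&src, bytes);

    cudaSetDevice(1);
    cudaMalloc(&dst, bytes);

    // With peer access enabled this is a direct GPU-to-GPU DMA:
    // no bounce buffer in system memory, no CPU copy loop.
    cudaMemcpyPeer(dst, 1, src, 0, bytes);
    cudaDeviceSynchronize();

    cudaFree(dst);
    cudaSetDevice(0);
    cudaFree(src);
    return 0;
}
```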

GPUDirect RDMA then enabled direct data transfer between GPUs and third-party devices such as network adapters, eliminating intermediate copies in system memory.

The most recent iteration, GPUDirect Storage, extended direct data transfer to storage devices, enabling a direct path between local or remote storage and GPU memory.

How GPUDirect Works

GPUDirect technologies leverage DMA (Direct Memory Access) engines in devices like NICs, storage controllers, and GPUs to move data directly to/from GPU memory.

NICs, or Network Interface Cards, are hardware components used to connect a computer or other device to a network. They enable devices to communicate over a network by providing a physical interface for transmitting and receiving data.

In the context of GPUDirect, NICs with DMA capabilities can transfer data directly to and from GPU memory, bypassing the CPU and reducing latency and CPU overhead.

This capability is particularly important in high-performance computing environments where maximising data transfer speeds and minimising latency are critical.

GPUDirect exposes GPU memory addresses to the PCI Express (PCIe) address space, allowing devices to access GPU memory directly without involving the CPU.

By eliminating intermediate data copies and reducing CPU involvement, GPUDirect reduces latency, increases bandwidth, and frees up CPU resources.

GPUDirect Storage enables a direct data path between local or remote storage, such as NVMe or NVMe over Fabrics (NVMe-oF), and GPU memory. It avoids extra copies through a bounce buffer in the CPU's memory: a DMA engine near the NIC or storage controller moves data along a direct path into or out of GPU memory, all without burdening the CPU.
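
To make that path concrete, here is a minimal sketch using the cuFile API from libcufile, the GPUDirect Storage user library. The file path and transfer size are placeholders and error handling is elided; cuFileRead issues a DMA from the file directly into the registered GPU buffer.

```cpp
#define _GNU_SOURCE  // for O_DIRECT
#include <cufile.h>
#include <cuda_runtime.h>
#include <fcntl.h>
#include <unistd.h>

int main() {
    const size_t bytes = 1 << 20;  // placeholder transfer size
    void* devPtr = nullptr;

    cuFileDriverOpen();                   // initialise the GDS driver
    cudaMalloc(&devPtr, bytes);
    cuFileBufRegister(devPtr, bytes, 0);  // register the GPU buffer for DMA

    // O_DIRECT bypasses the page cache, so no CPU bounce buffer is used.
    int fd = open("/path/to/data.bin", O_RDONLY | O_DIRECT);

    CUfileDescr_t descr = {};
    descr.handle.fd = fd;
    descr.type = CU_FILE_HANDLE_TYPE_OPAQUE_FD;
    CUfileHandle_t handle;
    cuFileHandleRegister(&handle, &descr);

    // DMA straight from storage into GPU memory: no copy through host RAM.
    ssize_t n = cuFileRead(handle, devPtr, bytes,
                           /*file_offset=*/0, /*devPtr_offset=*/0);

    cuFileHandleDeregister(handle);
    close(fd);
    cuFileBufDeregister(devPtr);
    cudaFree(devPtr);
    cuFileDriverClose();
    return n < 0;
}
```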

Integration with NVIDIA Quantum InfiniBand

NVIDIA Quantum InfiniBand is a high-performance, low-latency interconnect designed for AI and HPC workloads.

GPUDirect RDMA is a key technology that enables efficient data transfer between GPUs across InfiniBand networks.

With GPUDirect RDMA, data can be directly transferred between GPU memory of different nodes without involving the CPU or system memory.

This direct data path significantly reduces latency and increases bandwidth, enabling scalable multi-GPU and multi-node performance.
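
A hedged sketch of what this looks like at the API level: with GPUDirect RDMA enabled (for example, the nvidia-peermem kernel module loaded), an InfiniBand memory region can be registered directly on a CUDA device pointer, so the HCA DMAs into GPU memory with no host staging. register_gpu_buffer below is an illustrative helper, not part of any NVIDIA API.

```cpp
#include <cuda_runtime.h>
#include <infiniband/verbs.h>

// Illustrative helper (not an NVIDIA API): allocates a GPU buffer and
// registers it with an InfiniBand protection domain. With GPUDirect RDMA,
// ibv_reg_mr accepts a device pointer, because the driver exposes the GPU
// pages on the PCIe bus for the NIC to target directly.
struct ibv_mr* register_gpu_buffer(struct ibv_pd* pd, size_t bytes,
                                   void** devPtr) {
    cudaMalloc(devPtr, bytes);  // ordinary CUDA device allocation
    return ibv_reg_mr(pd, *devPtr, bytes,
                      IBV_ACCESS_LOCAL_WRITE |
                      IBV_ACCESS_REMOTE_READ |
                      IBV_ACCESS_REMOTE_WRITE);
}
```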

[Figure: Direct communication between NVIDIA GPUs]

Integration with Other NVIDIA Systems

GPUDirect technologies work with NVIDIA's accelerated computing platforms, including DGX systems and HGX servers.

GPUDirect Storage enables fast data transfer between storage devices (local NVMe or remote storage over NVMe-oF) and GPU memory in these systems.

It leverages the high-bandwidth, low-latency PCIe topology in NVIDIA's systems to optimise data paths and maximise performance.

GPUDirect technologies also integrate with NVIDIA's software stack, including CUDA, NCCL, and Magnum IO, enabling developers to take advantage of direct data paths in their applications.
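
For example, NCCL sits on top of these transports and picks the fastest available path automatically, GPUDirect P2P within a node or GPUDirect RDMA across nodes, while the application code stays the same. A minimal single-process, two-GPU all-reduce sketch (device list and buffer size are placeholders, error handling elided):

```cpp
#include <nccl.h>
#include <cuda_runtime.h>

int main() {
    const int nDev = 2;  // placeholder: two local GPUs
    int devs[nDev] = {0, 1};
    ncclComm_t comms[nDev];
    ncclCommInitAll(comms, nDev, devs);  // one communicator per GPU

    const size_t count = 1 << 20;
    float* sendbuf[nDev];
    float* recvbuf[nDev];
    cudaStream_t streams[nDev];
    for (int i = 0; i < nDev; ++i) {
        cudaSetDevice(devs[i]);
        cudaMalloc(&sendbuf[i], count * sizeof(float));
        cudaMalloc(&recvbuf[i], count * sizeof(float));
        cudaStreamCreate(&streams[i]);
    }

    // NCCL routes each transfer over the fastest path it detects:
    // GPUDirect P2P inside a node, GPUDirect RDMA across nodes.
    ncclGroupStart();
    for (int i = 0; i < nDev; ++i)
        ncclAllReduce(sendbuf[i], recvbuf[i], count, ncclFloat, ncclSum,
                      comms[i], streams[i]);
    ncclGroupEnd();

    for (int i = 0; i < nDev; ++i) {
        cudaSetDevice(devs[i]);
        cudaStreamSynchronize(streams[i]);
    }
    for (int i = 0; i < nDev; ++i) ncclCommDestroy(comms[i]);
    return 0;
}
```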

Summary

In summary, NVIDIA GPUDirect is a suite of technologies that optimise data movement and access for GPUs, reducing latency, increasing bandwidth, and offloading CPU overhead.

It is a critical component in NVIDIA's accelerated computing stack, enabling high-performance AI, HPC, and data analytics workloads.

GPUDirect RDMA, in particular, works closely with NVIDIA Quantum InfiniBand to provide fast, direct data transfer between GPUs across network nodes, enabling scalable multi-GPU and multi-node performance.

As GPU computing power continues to grow, GPUDirect technologies play an increasingly important role in relieving I/O bottlenecks and enabling efficient data movement in GPU-accelerated systems.
