NVIDIA GPUDirect
NVIDIA GPUDirect is a family of technologies that enables direct communication and data transfer between GPUs and other devices like network adapters, storage drives, and video I/O devices.
It is designed to reduce latency, increase bandwidth, and decrease CPU overhead in high-performance computing (HPC), data analytics, and AI workloads.
GPUDirect was first introduced in 2010 with GPUDirect 1.0, allowing direct memory access (DMA) transfers between GPUs within the same system.
Subsequent versions added support for peer-to-peer (P2P) communication between GPUs over PCIe.
It then moved to enable direct data transfer between GPUs and third-party devices like network adapters, eliminating the need for intermediate copies in system memory.
The most recent iteration extended direct data transfer capabilities to storage devices, enabling a direct path between local/remote storage and GPU memory.
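To make the P2P path mentioned above concrete, the sketch below shows how an application enables it through the CUDA runtime API. It is a minimal illustration, assuming a machine with two P2P-capable GPUs; error checking is elided for brevity.

    #include <cuda_runtime.h>
    #include <cstdio>

    int main() {
        int canAccess = 0;
        // Ask the driver whether device 0 can address device 1's memory directly.
        cudaDeviceCanAccessPeer(&canAccess, 0, 1);
        if (!canAccess) { printf("P2P not available between devices 0 and 1\n"); return 1; }

        size_t bytes = 1 << 20;
        float *src = nullptr, *dst = nullptr;

        cudaSetDevice(0);
        cudaDeviceEnablePeerAccess(1, 0);   // flags must be 0
        cudaMalloc(&src, bytes);

        cudaSetDevice(1);
        cudaDeviceEnablePeerAccess(0, 0);
        cudaMalloc(&dst, bytes);

        // With peer access enabled, this copy moves data GPU-to-GPU over
        // PCIe (or NVLink) without staging through system memory.
        cudaMemcpyPeer(dst, 1, src, 0, bytes);
        cudaDeviceSynchronize();

        cudaFree(dst);
        cudaSetDevice(0);
        cudaFree(src);
        return 0;
    }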
GPUDirect technologies leverage the DMA engines in devices such as NICs, storage controllers, and GPUs to move data directly to and from GPU memory.
NICs, or Network Interface Cards, are the hardware components that connect a computer or other device to a network, providing the physical interface for transmitting and receiving data.
In the context of GPUDirect, NICs with DMA capabilities can transfer data directly to and from GPU memory, bypassing the CPU and reducing latency and CPU overhead.
This capability is particularly important in high-performance computing environments where maximising data transfer speeds and minimising latency are critical.
GPUDirect exposes GPU memory addresses to the PCI Express (PCIe) address space, allowing devices to access GPU memory directly without involving the CPU.
By eliminating intermediate data copies and reducing CPU involvement, GPUDirect reduces latency, increases bandwidth, and frees up CPU resources.
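As an illustration of that address-space exposure, the sketch below registers a CUDA device buffer with an RDMA NIC through the standard verbs API. This is a minimal sketch, assuming the nvidia-peermem (formerly nv_peer_mem) kernel module is loaded and an RDMA-capable NIC is present; the helper name register_gpu_buffer is ours, not part of any library.

    #include <infiniband/verbs.h>
    #include <cuda_runtime.h>

    // Register a GPU buffer for RDMA. With the peer-memory module loaded,
    // ibv_reg_mr accepts a cudaMalloc'd pointer just like host memory and
    // maps the GPU pages so the NIC can DMA to them directly.
    struct ibv_mr* register_gpu_buffer(struct ibv_pd* pd, size_t bytes, void** devPtr) {
        cudaMalloc(devPtr, bytes);
        return ibv_reg_mr(pd, *devPtr, bytes,
                          IBV_ACCESS_LOCAL_WRITE |
                          IBV_ACCESS_REMOTE_READ |
                          IBV_ACCESS_REMOTE_WRITE);
    }

Once registered, the memory region's keys can be used in RDMA work requests exactly as with host buffers; the NIC's DMA engine then reads and writes GPU memory over PCIe.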
NVIDIA Quantum InfiniBand is a high-performance, low-latency interconnect designed for AI and HPC workloads.
GPUDirect RDMA is a key technology that enables efficient data transfer between GPUs across InfiniBand networks.
With GPUDirect RDMA, data can be transferred directly between the GPU memories of different nodes without involving the CPU or staging in system memory.
This direct data path significantly reduces latency and increases bandwidth, enabling scalable multi-GPU and multi-node performance.
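In practice, most applications reach this path through a CUDA-aware communication library rather than raw verbs. The sketch below assumes a CUDA-aware MPI build on an InfiniBand cluster: device pointers are passed straight to MPI calls, and the library uses GPUDirect RDMA underneath.

    #include <mpi.h>
    #include <cuda_runtime.h>

    int main(int argc, char** argv) {
        MPI_Init(&argc, &argv);
        int rank;
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);

        int count = 1 << 20;
        float* devBuf = nullptr;
        cudaMalloc(&devBuf, count * sizeof(float));

        // A CUDA-aware MPI accepts device pointers directly; over InfiniBand
        // the transfer uses GPUDirect RDMA, so the NIC moves data to and from
        // GPU memory without intermediate host copies.
        if (rank == 0)
            MPI_Send(devBuf, count, MPI_FLOAT, 1, 0, MPI_COMM_WORLD);
        else if (rank == 1)
            MPI_Recv(devBuf, count, MPI_FLOAT, 0, 0, MPI_COMM_WORLD, MPI_STATUS_IGNORE);

        cudaFree(devBuf);
        MPI_Finalize();
        return 0;
    }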
GPUDirect technologies work with NVIDIA's accelerated computing platforms, including DGX systems and HGX servers.
GPUDirect Storage enables fast data transfer between storage devices (local NVMe or remote storage over NVMe-oF) and GPU memory in these systems.
It leverages the high-bandwidth, low-latency PCIe topology in NVIDIA's systems to optimise data paths and maximise performance.
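A minimal sketch of that storage path using the cuFile API, which GPUDirect Storage exposes, is shown below. The file path is hypothetical, the file must be opened with O_DIRECT, and error handling is elided.

    #include <fcntl.h>
    #include <unistd.h>
    #include <cuda_runtime.h>
    #include <cufile.h>

    int main() {
        cuFileDriverOpen();                       // initialise GPUDirect Storage

        // O_DIRECT bypasses the page cache so the DMA goes straight to the GPU.
        int fd = open("/data/sample.bin", O_RDONLY | O_DIRECT);

        CUfileDescr_t descr = {};
        descr.handle.fd = fd;
        descr.type = CU_FILE_HANDLE_TYPE_OPAQUE_FD;

        CUfileHandle_t handle;
        cuFileHandleRegister(&handle, &descr);

        size_t bytes = 1 << 20;
        void* devPtr = nullptr;
        cudaMalloc(&devPtr, bytes);
        cuFileBufRegister(devPtr, bytes, 0);      // optional: pre-pin the GPU buffer

        // Read from storage directly into GPU memory, with no bounce buffer in host RAM.
        ssize_t n = cuFileRead(handle, devPtr, bytes, 0 /*file offset*/, 0 /*buffer offset*/);

        cuFileBufDeregister(devPtr);
        cuFileHandleDeregister(handle);
        cudaFree(devPtr);
        close(fd);
        cuFileDriverClose();
        return n < 0;
    }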
GPUDirect technologies also integrate with NVIDIA's broader software stack, enabling developers to take advantage of direct data paths in their applications.
In summary, NVIDIA GPUDirect is a suite of technologies that optimise data movement and access for GPUs, reducing latency, increasing bandwidth, and offloading CPU overhead.
It is a critical component in NVIDIA's accelerated computing stack, enabling high-performance AI, HPC, and data analytics workloads.
GPUDirect RDMA, in particular, works closely with NVIDIA Quantum InfiniBand to provide fast, direct data transfer between GPUs across network nodes, enabling scalable multi-GPU and multi-node performance.
As GPU computing power continues to grow, GPUDirect technologies play an increasingly important role in relieving I/O bottlenecks and enabling efficient data movement in GPU-accelerated systems.