NVIDIA Magnum IO GPUDirect Storage (GDS)
NVIDIA Magnum IO GPUDirect Storage (GDS) is a technology designed to accelerate data transfers between GPU memory and remote or local storage by avoiding CPU bottlenecks.
Here are the key technical details:
Data Path
GDS creates a direct data path between local NVMe or remote storage and GPU memory.
This is enabled via a direct memory access (DMA) engine near the network adapter or storage that moves data into or out of GPU memory, bypassing a bounce buffer in CPU system memory.
Traditional reads and writes into GPU memory use POSIX APIs that stage data in system memory as an intermediate bounce buffer, which can create an IO bottleneck.
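For contrast, here is a minimal sketch of that traditional two-hop path (the function name, file path, and error handling are illustrative): a POSIX read() stages the data in a host bounce buffer, and a separate cudaMemcpy() copies it to the GPU.

```c
#include <fcntl.h>
#include <stdlib.h>
#include <unistd.h>
#include <cuda_runtime.h>

/* Traditional path: storage -> host bounce buffer -> GPU memory.
 * Two hops, with the data landing in system memory in between. */
int read_via_bounce_buffer(const char *path, void *dev_ptr, size_t size) {
    int fd = open(path, O_RDONLY);
    if (fd < 0) return -1;

    void *host_buf = malloc(size);                          /* CPU-side bounce buffer */
    ssize_t n = host_buf ? read(fd, host_buf, size) : -1;   /* hop 1: storage -> host */
    if (n == (ssize_t)size)
        cudaMemcpy(dev_ptr, host_buf, size, cudaMemcpyHostToDevice); /* hop 2: host -> GPU */

    free(host_buf);
    close(fd);
    return n == (ssize_t)size ? 0 : -1;
}
```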
Components and Integration
GDS is exposed within CUDA via the cuFile API; a minimal usage sketch follows the notes below.
The cuFile API is integrated into the CUDA Toolkit (version 11.4 and later) and can also be delivered as a separate package containing a user-level library (libcufile) and a kernel module (nvidia-fs).
The user-level library ships with the CUDA Toolkit runtime, and the kernel module is installed with the NVIDIA driver.
NVIDIA Mellanox OFED (MLNX_OFED) is required and must be installed prior to GDS installation.
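As a hedged sketch of the core cuFile read path (the function name, file path, and error handling are illustrative, and dev_ptr is assumed to be a cudaMalloc'd buffer on a GDS-capable filesystem): the application opens the driver, registers a file descriptor opened with O_DIRECT, optionally registers the destination device buffer, and reads straight into GPU memory.

```c
#define _GNU_SOURCE            /* for O_DIRECT */
#include <fcntl.h>
#include <unistd.h>
#include <cufile.h>

/* GDS path: storage -> GPU memory directly, via the cuFile API.
 * Link with -lcufile. */
int read_via_gds(const char *path, void *dev_ptr, size_t size) {
    CUfileError_t status = cuFileDriverOpen();
    if (status.err != CU_FILE_SUCCESS) return -1;

    int fd = open(path, O_RDONLY | O_DIRECT);   /* O_DIRECT is required */
    if (fd < 0) { cuFileDriverClose(); return -1; }

    CUfileDescr_t descr = {0};
    descr.handle.fd = fd;
    descr.type = CU_FILE_HANDLE_TYPE_OPAQUE_FD;

    CUfileHandle_t handle;
    status = cuFileHandleRegister(&handle, &descr);
    if (status.err != CU_FILE_SUCCESS) { close(fd); cuFileDriverClose(); return -1; }

    /* Optional: register the device buffer so the driver can pin it
     * once and reuse the mapping across many reads. */
    cuFileBufRegister(dev_ptr, size, 0);

    /* Read `size` bytes from file offset 0 straight into GPU memory. */
    ssize_t n = cuFileRead(handle, dev_ptr, size, 0, 0);

    cuFileBufDeregister(dev_ptr);
    cuFileHandleDeregister(handle);
    close(fd);
    cuFileDriverClose();
    return n == (ssize_t)size ? 0 : -1;
}
```

Registering the buffer with cuFileBufRegister is optional for a single read, but it amortises the pinning cost when the same device buffer is reused across many IOs.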
Supported Technologies
GDS supports RDMA over both InfiniBand and RoCE (RDMA over Converged Ethernet).
It supports distributed file systems such as NFS, DDN EXAScaler, WekaIO, and IBM Spectrum Scale.
GDS supports the NVMe and NVMe-oF (NVMe over Fabrics) storage protocols.
It provides a compatibility mode that transparently falls back to a CPU bounce-buffer path on platforms that are not GDS-ready.
GDS is enabled on NVIDIA DGX Base OS and supports Ubuntu and RHEL operating systems.
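Compatibility mode is largely transparent to applications: the same cuFile calls succeed, but IO is internally staged through a bounce buffer instead of the direct DMA path. As a minimal, hedged sketch (the fallback is typically governed by the allow_compat_mode setting in /etc/cufile.json; exact behaviour varies by cuFile version), an application can at least verify that the cuFile driver itself is usable:

```c
#include <stdio.h>
#include <cufile.h>

int main(void) {
    /* Opening the cuFile driver succeeds both in true GDS mode and in
     * compatibility mode; a failure usually means cuFile is unusable
     * on this platform altogether. */
    CUfileError_t status = cuFileDriverOpen();
    if (status.err != CU_FILE_SUCCESS) {
        fprintf(stderr, "cuFile driver unavailable (err=%d)\n", (int)status.err);
        return 1;
    }
    /* Whether a given IO takes the direct DMA path or the compat
     * bounce-buffer path is decided per filesystem/mount, not here. */
    puts("cuFile driver opened: GDS or compatibility mode is active");
    cuFileDriverClose();
    return 0;
}
```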
Integration with Libraries, APIs, and Frameworks
GDS can be used with multiple libraries, APIs, and frameworks, including DALI (Data Loading Library), RAPIDS cuDF, PyTorch, and MXNet.
Performance Benefits
Higher Bandwidth: GDS makes up to 2X more bandwidth available to the GPU compared with the standard path through a CPU bounce buffer.
Lower Latency: by avoiding extra copies through host system memory and routing IO dynamically, GDS selects the best available path, buffering, and transfer mechanism, resulting in lower latency.
Reduced CPU Utilisation: DMA engines near the storage offload work from the CPU and do not interfere with GPU load. At larger transfer sizes, the ratio of bandwidth to fractional CPU utilisation is much higher with GDS.
Benchmarking Results
GDSIO Benchmark: Up to 1.5X improvement in bandwidth available to the GPU and up to 2.8X improvement in CPU utilisation compared to traditional data paths through the CPU bounce buffer.
DeepCAM Benchmark: When optimised with GDS and NVIDIA DALI, DeepCAM (a deep learning model for climate simulations) can achieve up to a 6.6X speedup compared to out-of-the-box NumPy.
In summary, NVIDIA Magnum IO GPUDirect Storage enables direct data transfer between GPU memory and storage, bypassing the CPU bounce buffer.
This results in higher bandwidth, lower latency, and reduced CPU utilisation, leading to improved performance for GPU-accelerated workflows in HPC, AI, and data analytics.