NVIDIA Quantum InfiniBand

Networking Solution

NVIDIA Quantum InfiniBand is a high-performance networking solution designed for AI and high-performance computing (HPC) workloads in data centres.

It enables fast, low-latency communication between servers, storage systems, and NVIDIA GPUs.

InfiniBand Architecture

NVIDIA Quantum InfiniBand is based on the InfiniBand networking standard, which provides high bandwidth and low latency.
It uses a switched fabric topology, in which devices connect through switches rather than a shared medium, so many pairs of devices can communicate at the same time.
The latest generation, NVIDIA Quantum-2, offers speeds up to 400 Gb/s per port.

To give a sense of scale, 400 Gb/s is 50 GB/s. At that rate, ignoring protocol overhead, you could transfer a 100 GB dataset in about 2 seconds. In one minute, you could transfer 3 TB of data, roughly the storage capacity of a high-end consumer desktop computer.
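
The arithmetic behind these figures can be checked with a short sketch. The function names are illustrative, and the calculation assumes an ideal link running at full line rate with no protocol overhead.

```python
def gbps_to_gigabytes_per_s(gbps: float) -> float:
    """Convert gigabits per second to gigabytes per second (8 bits per byte)."""
    return gbps / 8


def transfer_seconds(size_gb: float, link_gbps: float) -> float:
    """Ideal time, in seconds, to move size_gb gigabytes over a link_gbps link."""
    return size_gb / gbps_to_gigabytes_per_s(link_gbps)


LINK_GBPS = 400  # per-port speed quoted for NVIDIA Quantum-2

print(gbps_to_gigabytes_per_s(LINK_GBPS))       # 50.0 (GB/s)
print(transfer_seconds(100, LINK_GBPS))         # 2.0 (seconds for 100 GB)
print(gbps_to_gigabytes_per_s(LINK_GBPS) * 60)  # 3000.0 (GB per minute, i.e. 3 TB)
```

Real transfers will be somewhat slower, because headers, encoding, and storage throughput all reduce the usable rate.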

To put this speed into context, let's compare it with some common networking standards:

Gigabit Ethernet (GbE)

Gigabit Ethernet offers a maximum speed of 1 Gb/s per port. At 400 Gb/s, NVIDIA Quantum InfiniBand is 400 times as fast as GbE.

10 Gigabit Ethernet (10GbE)

10 Gigabit Ethernet provides speeds up to 10 Gb/s per port. At 400 Gb/s, NVIDIA Quantum InfiniBand is 40 times as fast as 10GbE.

PCI Express (PCIe) Gen 4

PCIe Gen 4 provides a bandwidth of up to 16 GT/s (gigatransfers per second) per lane, with an x16 link offering a maximum theoretical bandwidth of about 32 GB/s in each direction.

NVIDIA Quantum InfiniBand's 400 Gb/s is equivalent to 50 GB/s, which exceeds the bandwidth of a PCIe Gen 4 x16 link.
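
The comparisons above can be reproduced with a few lines of Python. This is a rough sketch: the helper name is illustrative, and the 128b/130b line-encoding factor is a detail of PCIe Gen 3 and later that the text above does not mention. With it, the usable x16 figure comes out slightly below the rounded 32 GB/s.

```python
QUANTUM2_GBPS = 400  # per-port speed quoted in the text

# Speed ratios against the Ethernet standards mentioned above.
for name, gbps in [("GbE", 1), ("10GbE", 10)]:
    print(f"{name}: {QUANTUM2_GBPS // gbps}x slower than Quantum-2")


def pcie_gigabytes_per_s(gt_per_s: float, lanes: int, encoding: float = 128 / 130) -> float:
    """Approximate one-direction PCIe bandwidth in GB/s.

    gt_per_s: transfer rate per lane (16 for Gen 4)
    encoding: fraction of transferred bits that carry data (128b/130b)
    """
    return gt_per_s * encoding * lanes / 8


pcie4_x16 = pcie_gigabytes_per_s(16, 16)
infiniband = QUANTUM2_GBPS / 8

print(f"PCIe Gen 4 x16: {pcie4_x16:.1f} GB/s")
print(f"400 Gb/s port:  {infiniband:.1f} GB/s")
print("Port exceeds PCIe Gen 4 x16:", infiniband > pcie4_x16)
```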

Definition - Switched Fabric Topology

Fabric topology in networking refers to the layout or structure of interconnected nodes, including switches, servers, and storage devices, within a network.

It is designed to support high levels of data transmission and communication efficiency. The term "fabric" comes from the idea of interweaving threads, symbolising the complex and interconnected nature of the network paths.

NVIDIA Quantum InfiniBand uses a switched fabric topology, which means it can be scaled up by adding more switches or nodes. Because a fabric can provide multiple paths between nodes, the network can also keep operating if a component fails, which is essential for mission-critical applications in data centres.
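
To illustrate how a switched fabric scales, the sketch below sizes a two-tier, non-blocking fat tree, a topology commonly used for InfiniBand clusters. The choice of a fat tree, the function name, and the 64-port radix are assumptions made for illustration, not a description of any particular deployment.

```python
def two_tier_fat_tree(radix: int) -> dict:
    """Size a non-blocking two-tier (leaf/spine) fat tree built from identical switches.

    Each leaf switch uses half its ports for hosts and half for uplinks,
    and each spine switch connects once to every leaf.
    """
    if radix < 2 or radix % 2:
        raise ValueError("radix must be a positive even number")
    hosts_per_leaf = radix // 2   # downlinks per leaf
    spines = radix // 2           # one uplink from each leaf to each spine
    leaves = radix                # each spine port reaches a different leaf
    return {
        "leaves": leaves,
        "spines": spines,
        "switches": leaves + spines,
        "hosts": leaves * hosts_per_leaf,
    }


layout = two_tier_fat_tree(64)  # hypothetical 64-port switches
print(layout)  # {'leaves': 64, 'spines': 32, 'switches': 96, 'hosts': 2048}
```

Adding a third tier raises the host count further, which is the sense in which a fabric grows by adding switches rather than by replacing them.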

Interconnect for GPUs

NVIDIA Quantum InfiniBand is designed to work with NVIDIA GPUs.

It supports NVIDIA GPUDirect, a technology that allows GPUs to directly access the memory of other GPUs or network adapters, reducing latency and improving performance.

With GPUDirect RDMA (Remote Direct Memory Access), GPUs can bypass the CPU and directly access data from remote servers or storage systems over the InfiniBand network.

In-Network Computing

NVIDIA Quantum InfiniBand supports In-Network Computing, which offloads certain computations to the network fabric itself.

It includes preconfigured and programmable compute engines, such as NVIDIA Scalable Hierarchical Aggregation and Reduction Protocol (SHARPv3), Message Passing Interface (MPI) Tag Matching, and MPI All-to-All.

These engines accelerate collective operations, reduce network traffic, and improve overall application performance.
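
The idea behind in-network aggregation can be sketched conceptually. The code below is a plain-Python model, not the SHARP API: the function names, the two-level switch tree, and the comparison with a host-based ring all-reduce are all assumptions made for illustration.

```python
def switch_reduce(vectors):
    """Element-wise sum, standing in for an aggregation engine inside a switch."""
    return [sum(values) for values in zip(*vectors)]


def in_network_allreduce(host_vectors, hosts_per_switch):
    """Model an all-reduce in which switches, not hosts, perform the summation."""
    # Level 1: each leaf switch reduces the vectors of its attached hosts.
    partials = [
        switch_reduce(host_vectors[i:i + hosts_per_switch])
        for i in range(0, len(host_vectors), hosts_per_switch)
    ]
    # Level 2: a root switch reduces the partial sums from the leaf switches.
    total = switch_reduce(partials)
    # The result is sent back down so that every host holds the same total.
    return [list(total) for _ in host_vectors]


def ring_bytes_sent_per_host(n_hosts, size_bytes):
    """Bytes each host sends in a ring all-reduce (reduce-scatter plus all-gather)."""
    return 2 * (n_hosts - 1) / n_hosts * size_bytes


hosts = [[1, 2], [3, 4], [5, 6], [7, 8]]
print(in_network_allreduce(hosts, hosts_per_switch=2)[0])  # [16, 20]

# With in-network reduction each host sends its buffer once (size_bytes);
# a ring all-reduce approaches twice that as the host count grows.
print(ring_bytes_sent_per_host(n_hosts=64, size_bytes=1_000_000))
```

In this model the hosts send less data and do none of the summation themselves, which is the source of the reduced network traffic described above.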

Performance Isolation

NVIDIA Quantum InfiniBand provides proactive monitoring and congestion management to ensure performance isolation.

It minimises performance jitter and guarantees predictable performance for applications, as if they were running on dedicated systems.

This is particularly important in multi-tenant environments where multiple users or applications share the same infrastructure.

Performance jitter refers to the variability in computational performance or network latency over time. Factors contributing to performance jitter may include fluctuating network traffic, shared system resources, or varying workloads.
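
One simple way to quantify jitter is to summarise the spread of repeated latency measurements. The sketch below is illustrative only: the sample values are invented, and the percentile is a crude approximation suitable for small samples.

```python
import statistics


def jitter_summary(latencies_us):
    """Summarise the variability of a list of latency samples (microseconds)."""
    ordered = sorted(latencies_us)
    # Crude high percentile: take the sample at the 99% position, capped at the last one.
    p99 = ordered[min(len(ordered) - 1, int(0.99 * len(ordered)))]
    p50 = statistics.median(ordered)
    return {
        "mean": statistics.fmean(ordered),
        "stdev": statistics.pstdev(ordered),  # spread around the mean
        "tail_gap": p99 - p50,                # how far the slow tail sits from typical
    }


# Hypothetical measurements: an isolated workload and one sharing a congested fabric.
isolated = [2.0, 2.1, 2.0, 1.9, 2.0, 2.1, 2.0, 2.0]
congested = [2.0, 2.1, 6.5, 1.9, 2.0, 9.0, 2.0, 2.3]

print(jitter_summary(isolated))
print(jitter_summary(congested))
```

A larger standard deviation or tail gap for the same workload indicates more jitter, even when the median latency is almost unchanged.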

Cloud-Native Supercomputing

NVIDIA Quantum InfiniBand, combined with NVIDIA BlueField Data Processing Units (DPUs), enables cloud-native supercomputing.

It provides bare-metal performance, user management, data protection, and on-demand provisioning of HPC and AI services in a cloud environment.

This allows organisations to leverage the flexibility and scalability of the cloud while maintaining the performance characteristics of dedicated supercomputing systems.

Adapters and Switches

NVIDIA ConnectX-7 InfiniBand adapters, available in various form factors, provide single or dual network ports at 400 Gb/s.
These adapters include advanced In-Network Computing capabilities and programmable engines for data preprocessing and offloading application control paths to the network.
NVIDIA Quantum-2 switches offer high-density, high-bandwidth switching, with up to 64 ports at 400 Gb/s or 128 ports at 200 Gb/s in a compact 1U form factor.
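
As a quick check on the switch figures, the sketch below computes the aggregate throughput of both port configurations. The function name is illustrative, and the result counts one direction only, so the bidirectional figure is double.

```python
def aggregate_tbps(ports: int, gbps_per_port: int) -> float:
    """Total one-direction switch throughput in Tb/s."""
    return ports * gbps_per_port / 1000


print(aggregate_tbps(64, 400))       # 25.6
print(aggregate_tbps(128, 200))      # 25.6
print(aggregate_tbps(64, 400) * 2)   # 51.2 (bidirectional)
```

Both configurations expose the same total capacity. The 200 Gb/s option trades per-port speed for twice as many ports.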

Cables and Transceivers

NVIDIA Quantum InfiniBand supports a variety of cable and transceiver options.
These options provide flexibility in building network topologies and enable backward compatibility with existing 200 Gb/s or 100 Gb/s infrastructures.

Summary

NVIDIA Quantum InfiniBand is a networking solution that offers extreme performance and efficiency for modern data centres.

As data centres continue to evolve and adopt GPU-accelerated computing and cloud-native architectures, NVIDIA Quantum InfiniBand will play a role in ensuring optimal system performance, scalability, and flexibility.

By investing in this technology, organisations can future-proof their data centre infrastructure and unlock new possibilities for innovation and discovery.

Three applications for NVIDIA Quantum InfiniBand

Real-time, high-resolution video processing in media and entertainment

NVIDIA Quantum InfiniBand could enable distributed, GPU-accelerated processing of high-resolution video content (e.g., 8K or higher) in real time.

This would allow media and entertainment companies to collaborate on complex video editing, visual effects, and animation projects across multiple locations, with minimal latency and maximum performance.
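
A rough estimate shows why link speed matters for this workload. The sketch below assumes uncompressed 8K video at 7680 x 4320 pixels, 24 bits per pixel, and 60 frames per second. These parameters are illustrative assumptions, and production formats often use higher bit depths or compression.

```python
def uncompressed_gbps(width: int, height: int, bits_per_pixel: int, fps: int) -> float:
    """Raw video data rate in Gb/s, with no compression or blanking overhead."""
    return width * height * bits_per_pixel * fps / 1e9


stream_gbps = uncompressed_gbps(7680, 4320, 24, 60)
streams_per_port = int(400 // stream_gbps)  # whole streams that fit in one 400 Gb/s port

print(f"One 8K stream: {stream_gbps:.1f} Gb/s")
print(f"Streams per 400 Gb/s port: {streams_per_port}")
```

Under these assumptions a single stream needs just under 48 Gb/s, more than a 10GbE or 40 Gb/s link can carry, while one 400 Gb/s port has room for eight.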

Federated learning for healthcare and medical research

Quantum InfiniBand could facilitate secure, high-speed data sharing and model training across multiple healthcare institutions or research centres.

This would enable federated learning, where AI models are trained on decentralised data without compromising patient privacy. The low latency and high bandwidth of Quantum InfiniBand would ensure rapid model updates and faster discovery of new medical insights.
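
The model update step in federated learning is often a weighted average of the parameters trained at each site, known as federated averaging. The sketch below is a minimal pure-Python version. The function name, site sizes, and weight values are hypothetical, and real systems add secure aggregation and many training rounds.

```python
def federated_average(updates):
    """Combine per-site model weights into a global model.

    updates: list of (num_samples, weights) pairs, where weights is a list
    of floats of equal length at every site. Sites with more training
    samples contribute proportionally more to the average.
    """
    total_samples = sum(n for n, _ in updates)
    dim = len(updates[0][1])
    return [
        sum(n * weights[i] for n, weights in updates) / total_samples
        for i in range(dim)
    ]


# Hypothetical updates from two institutions; only weights leave each site,
# never the underlying patient records.
site_updates = [
    (100, [0.2, 1.0]),
    (300, [0.6, 2.0]),
]
global_weights = federated_average(site_updates)
print(global_weights)
```

Each round moves a full set of model weights between sites, so for large models the time per round depends heavily on the bandwidth and latency of the network linking them.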

Real-time, GPU-accelerated intrusion detection and cybersecurity

NVIDIA Quantum InfiniBand could power distributed, GPU-accelerated intrusion detection systems (IDS) for large-scale networks.

By combining GPUs with high-speed, low-latency networking, these systems could analyse massive amounts of network traffic in real time, detecting and responding to potential security threats quickly and accurately.

This would help organisations to better protect their critical assets and data from increasingly sophisticated cyber attacks.
