NVIDIA DGX-2
At the time of its 2018 release, traditional data centre architectures were increasingly unable to cope with the demands of modern AI workloads, which require immense computational power and high-speed interconnects to train ever more complex models.
This challenge demanded a shift towards more scalable, tightly integrated systems.
NVIDIA's response was the DGX-2, a system designed to offer unprecedented compute performance and interconnect bandwidth, enabling the training of models that were previously impractical due to hardware limitations.
NVIDIA's DGX-2 stood as a major leap forward. At launch, it claimed the title of "the world's most powerful AI system for the most complex AI challenges."
The system came with a price tag of around US$400,000.
The Evolution from DGX-1 to DGX-2
The DGX-2 expanded dramatically on the DGX-1's foundation.
Instead of eight GPUs, it packed 16, and it replaced the DGX-1's point-to-point NVLink topology with NVIDIA's more scalable NVSwitch fabric.
This change allowed the DGX-2 to tackle deep learning and other demanding AI and HPC workloads up to 10 times faster than the DGX-1.
The system was a behemoth, both in terms of size and capability.
It weighed in at 163.3kg (360lbs) and took up 10 rack units, compared to the 3 rack units of the DGX-1.
It required up to 10kW of power, a figure that rose with the introduction of the DGX-2H model, which demanded up to 12kW.
A Closer Look at the DGX-2
Here’s what made the DGX-2 stand out:
GPUs: The DGX-2 featured 16 NVIDIA Tesla V100 GPUs, each with 32 GB of HBM2 memory. This doubling of GPU count over the DGX-1 allowed for unprecedented computational power.
Memory and Storage: It came with 1.5 TB of system RAM and 30 TB of high-performance NVMe storage, expandable to 60 TB.
Networking: The server was equipped with high-bandwidth network interfaces, including dual 10/25/40/50/100GbE options and up to 8 x 100 Gb/sec InfiniBand connectivity.
CPU: At its core, the DGX-2 had two 24-core Intel Xeon Platinum 8168 processors, providing robust support for the GPUs.
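Those headline numbers are easy to verify from software on the machine itself. Below is a minimal sketch, assuming a Python environment with PyTorch and CUDA support (an assumption; any CUDA-capable stack would do), that enumerates the visible GPUs. On a DGX-2 it should list 16 Tesla V100 devices with roughly 32 GB of memory each:

```python
# Minimal sketch: enumerate visible GPUs and their memory.
# Assumes PyTorch with CUDA support is installed; on a DGX-2 this
# should report 16 Tesla V100 devices with ~32 GB of HBM2 each.
import torch

for i in range(torch.cuda.device_count()):
    props = torch.cuda.get_device_properties(i)
    mem_gib = props.total_memory / 2**30
    print(f"GPU {i}: {props.name}, {mem_gib:.0f} GiB")
```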
Performance and Impact
The DGX-2’s performance was groundbreaking, delivering 2 petaFLOPS of mixed-precision (tensor) processing power.
This level of performance meant that the DGX-2 could match the output of 300 dual-socket Xeon servers, which would cost around $2.7 million and occupy significantly more space.
Thus, despite its high upfront cost, the DGX-2 presented a cost-effective solution for intensive AI and HPC workloads.
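The headline arithmetic is straightforward to reproduce. The sketch below assumes the commonly cited figure of roughly 125 TFLOPS of mixed-precision tensor throughput per V100; that per-GPU figure and the rounded prices are the only assumptions, and the prices match the numbers quoted above:

```python
# Back-of-the-envelope check on the DGX-2's headline figures.
# Assumes ~125 TFLOPS of mixed-precision tensor throughput per V100,
# the figure commonly cited behind the 16-GPU "2 petaFLOPS" claim.
V100_TENSOR_TFLOPS = 125
NUM_GPUS = 16

total_tflops = V100_TENSOR_TFLOPS * NUM_GPUS
print(f"Aggregate tensor throughput: {total_tflops} TFLOPS "
      f"(~{total_tflops / 1000:.0f} petaFLOPS)")  # -> ~2 petaFLOPS

# Rough price comparison quoted above: one DGX-2 versus ~300
# dual-socket Xeon servers at a combined cost of ~$2.7 million.
dgx2_cost = 400_000            # USD, approximate list price
xeon_cluster_cost = 2_700_000  # USD, approximate
print(f"Price ratio: {xeon_cluster_cost / dgx2_cost:.1f}x "
      "in favour of the DGX-2")
```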
Legacy and Conclusion
Though alternatives have since emerged, the DGX-2 represented, at the time, a pinnacle of AI-focused servers.
It addressed the needs of the most complex AI tasks by dramatically reducing the time and infrastructure required to train deep learning models. With the DGX-2, NVIDIA sold not just a server but a comprehensive ecosystem that supported the most advanced AI research and applications.
NVIDIA NVSwitch: Revolutionising AI Network Fabric
The introduction of the NVIDIA NVSwitch represented a leap in networking technology, akin to the evolution from dial-up to broadband.
NVSwitch enabled a level of model parallelism previously unattainable, providing 2.4 TB/s of bisection bandwidth, a 24-fold increase over the previous generation.
This high-performance interconnect fabric allowed for unprecedented scaling, making it possible to train complex models across all 16 GPUs efficiently and effectively.
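In practice, applications rarely program NVSwitch directly; they reach it through collective-communication libraries such as NCCL, which detect the NVLink/NVSwitch topology automatically. The following is a minimal sketch, assuming PyTorch with its NCCL backend, that all-reduces a 1 GiB half-precision buffer across every GPU of a single node; the script name and payload size are illustrative choices, not anything NVIDIA prescribes:

```python
# Minimal sketch: NCCL all-reduce across all GPUs of one node.
# Launch with: torchrun --nproc_per_node=16 allreduce_sketch.py
# (16 matches the DGX-2's GPU count; the script name is illustrative.)
import os
import torch
import torch.distributed as dist

def main():
    local_rank = int(os.environ["LOCAL_RANK"])  # set by torchrun
    torch.cuda.set_device(local_rank)
    # NCCL routes GPU-to-GPU traffic over NVLink/NVSwitch when present.
    dist.init_process_group(backend="nccl")

    # A 1 GiB half-precision buffer, a plausible gradient payload.
    payload = torch.ones(512 * 1024 * 1024,
                         dtype=torch.float16, device="cuda")
    dist.all_reduce(payload, op=dist.ReduceOp.SUM)
    torch.cuda.synchronize()
    if dist.get_rank() == 0:
        print(f"all-reduce complete across {dist.get_world_size()} GPUs")
    dist.destroy_process_group()

if __name__ == "__main__":
    main()
```

Because NCCL discovers the interconnect at start-up, the same script runs unchanged on PCIe-only machines; the NVSwitch fabric simply lets the collective proceed at far higher bandwidth.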
A comparison between the DGX-2 and the DGX-1
Specification | NVIDIA DGX-2 | NVIDIA DGX-1 |
---|---|---|
CPUs | 2 x Intel Xeon Platinum 8168 | 2 x Intel Xeon E5-2698 v4 |
GPUs | 16 x NVIDIA Tesla V100, 32 GB HBM2 each | 8 x NVIDIA Tesla V100, 16 GB HBM2 each |
System Memory | Up to 1.5 TB DDR4 | Up to 0.5 TB DDR4 |
GPU Memory | 512 GB HBM2 (16 x 32 GB) | 128 GB HBM2 (8 x 16 GB) |
Storage | 30 TB NVMe, expandable up to 60 TB | 4 x 1.92 TB NVMe |
Networking | 8 x InfiniBand or 8 x 100 GbE | 4 x InfiniBand + 2 x 10 GbE |
Power | 10 kW | 3.5 kW |
Weight | 360 lbs | 134 lbs |
GPU Throughput | Tensor: 1920 TFLOPS, FP16: 480 TFLOPS, FP32: 240 TFLOPS, FP64: 120 TFLOPS | Tensor: 960 TFLOPS, FP16: 240 TFLOPS, FP32: 120 TFLOPS, FP64: 60 TFLOPS |
Cost | $399,000 | $149,000 |
System Specifications
Component | Specification |
---|---|
GPUs | 16 x NVIDIA® Tesla® V100 |
GPU Memory | 512 GB total |
Performance | 2 petaFLOPS |
NVIDIA CUDA® Cores | 81,920 |
NVIDIA Tensor Cores | 10,240 |
NVSwitches | 12 |
Maximum Power Usage | 10 kW |
CPU | Dual Intel Xeon Platinum 8168, 2.7 GHz, 24 cores |
System Memory | 1.5 TB |
Network | 8 x 100 Gb/sec InfiniBand/100 GigE, Dual 10/25/40/50/100 GbE |
Storage | OS: 2 x 960 GB NVMe SSDs; Internal: 30 TB (8 x 3.84 TB) NVMe SSDs |
Software | Ubuntu Linux OS, Red Hat Enterprise Linux OS |
System Weight | 360 lbs (163.3 kg) |
Packaged System Weight | 400 lbs (181.4 kg) |
System Dimensions | Height: 17.3 in; Width: 19.0 in; Length: 31.3 in (no bezel), 32.8 in (with bezel) |
Operating Temperature Range | 5°C to 35°C (41°F to 95°F) |