
NVIDIA DGX-2

At the time of the DGX-2's 2018 release, traditional data centre architectures were increasingly unable to cope with the demands of modern AI workloads, which require immense computational power and high-speed interconnects to train increasingly complex models.

This challenge necessitated a paradigm shift towards more scalable and integrated systems.

NVIDIA's response to this challenge was the DGX-2, a system designed to offer unprecedented levels of compute performance and interconnect bandwidth, enabling the training of models that were previously untrainable due to hardware limitations.

NVIDIA's DGX-2 was a major leap forward. At launch, NVIDIA billed it as "the world's most powerful AI system for the most complex AI challenges."

The system carried a price tag of around US$400,000.

The Evolution from DGX-1 to DGX-2

The DGX-2 expanded dramatically on the DGX-1's foundation.

Instead of eight GPUs, it packed 16, and it replaced the NVLink bus with NVIDIA's more scalable NVSwitch technology.

This change allowed the DGX-2 to tackle deep learning and other demanding AI and HPC workloads up to 10 times faster than the DGX-1.

The system was a behemoth, both in terms of size and capability.

It weighed in at 163.3 kg (360 lbs) and took up 10 rack units, compared to the 3 rack units of the DGX-1.

It required up to 10kW of power, a figure that rose with the introduction of the DGX-2H model, which demanded up to 12kW.

A Closer Look at the DGX-2

Here’s what made the DGX-2 stand out:

  • GPUs: The DGX-2 featured 16 NVIDIA Tesla V100 GPUs. This doubling of GPU capacity, compared to the DGX-1, allowed for unprecedented computational power.

  • Memory and Storage: It came with 1.5 TB of system RAM and 30 TB of high-performance NVMe storage, expandable to 60 TB.

  • Networking: The server was equipped with high-bandwidth network interfaces, including dual 10/25/40/50/100 GbE options and up to eight 100 Gb/sec InfiniBand connections.

  • CPU: At its core, the DGX-2 had two 24-core Intel Xeon Platinum 8168 processors, providing robust support for the GPUs.

Performance and Impact

The DGX-2’s performance was groundbreaking, delivering 2 petaFLOPS of processing power.

This level of performance meant that the DGX-2 could match the output of 300 dual-socket Xeon servers, which would cost around $2.7 million and occupy significantly more space.

Thus, despite its high upfront cost, the DGX-2 presented a cost-effective solution for intensive AI and HPC workloads.
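As a rough sanity check, the 2 petaFLOPS headline follows directly from the per-GPU tensor throughput of the Tesla V100. The ~125 TFLOPS per-GPU figure below is an assumption taken from NVIDIA's V100 marketing peak, not a number stated in this article:

```python
# Back-of-envelope check of the DGX-2's 2 petaFLOPS headline figure.
# Assumes ~125 TFLOPS of peak FP16 tensor throughput per Tesla V100
# (NVIDIA's quoted peak for the part; an assumption, not from this article).
NUM_GPUS = 16
TENSOR_TFLOPS_PER_V100 = 125

total_tflops = NUM_GPUS * TENSOR_TFLOPS_PER_V100  # 2000 TFLOPS
total_petaflops = total_tflops / 1000
print(f"{total_petaflops} PFLOPS")  # prints: 2.0 PFLOPS
```

The same arithmetic underlies the cost comparison: one $399,000 DGX-2 replacing roughly 300 commodity dual-socket servers is what makes the system cost-effective despite its sticker price.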

Legacy and Conclusion

Though alternatives have since emerged, at the time, the DGX-2 represented a pinnacle in AI-focused servers.

It addressed the needs of the most complex AI tasks by dramatically reducing the time and infrastructure required to train deep learning models. NVIDIA not only sold a server but also delivered a comprehensive ecosystem that supported the most advanced AI research and applications.

NVIDIA NVSwitch—Revolutionising AI Network Fabric

The introduction of the NVIDIA NVSwitch represented a leap in networking technology, akin to the evolution from dial-up to broadband.

NVSwitch enables a level of model parallelism previously unattainable, providing 2.4 TB/s of bisection bandwidth, a 24-fold increase over the previous generation.

This high-performance interconnect fabric allows for unprecedented scaling capabilities, making it possible to train complex models across 16 GPUs efficiently and effectively.

The NVSwitch fabric also means that the CPUs' PCIe lanes can be redirected elsewhere, most notably towards storage and networking connectivity.
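The 2.4 TB/s bisection figure can be reconstructed from per-link NVLink bandwidth. The per-link and links-per-GPU numbers below are assumptions drawn from the V100's NVLink 2.0 specification, not figures stated in this article:

```python
# Back-of-envelope derivation of the DGX-2's 2.4 TB/s bisection bandwidth.
# Assumes 6 NVLink 2.0 links per V100 at 50 GB/s bidirectional each
# (V100 spec-sheet figures; assumptions, not from this article).
LINKS_PER_GPU = 6
GBPS_PER_LINK = 50      # bidirectional GB/s per NVLink 2.0 link
GPUS_PER_HALF = 8       # 16 GPUs split across two 8-GPU baseboards

per_gpu_bw = LINKS_PER_GPU * GBPS_PER_LINK    # 300 GB/s per GPU
bisection_gbps = GPUS_PER_HALF * per_gpu_bw   # traffic crossing the midpoint
print(f"{bisection_gbps / 1000} TB/s")  # prints: 2.4 TB/s
```

Because NVSwitch is non-blocking, each of the eight GPUs on one baseboard can drive its full 300 GB/s towards the other half simultaneously, which is what makes the bisection figure simply 8 × 300 GB/s.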

A comparison between the DGX-2 and the DGX-1

| Specification | NVIDIA DGX-2 | NVIDIA DGX-1 |
| --- | --- | --- |
| CPUs | 2 x Intel Xeon Platinum | 2 x Intel Xeon E5-2600 v4 |
| GPUs | 16 x NVIDIA Tesla V100, 32 GB HBM2 each | 8 x NVIDIA Tesla V100, 16 GB HBM2 each |
| System Memory | Up to 1.5 TB DDR4 | Up to 0.5 TB DDR4 |
| GPU Memory | 512 GB HBM2 (16 x 32 GB) | 128 GB HBM2 (8 x 16 GB) |
| Storage | 30 TB NVMe, expandable up to 60 TB | 4 x 1.92 TB NVMe |
| Networking | 8 x InfiniBand or 8 x 100 GbE | 4 x InfiniBand + 2 x 10 GbE |
| Power | 10 kW | 3.5 kW |
| Weight | 360 lbs (163.3 kg) | 134 lbs (60.8 kg) |
| GPU Throughput | Tensor: 1920 TFLOPS, FP16: 480 TFLOPS, FP32: 240 TFLOPS, FP64: 120 TFLOPS | Tensor: 960 TFLOPS, FP16: 240 TFLOPS, FP32: 120 TFLOPS, FP64: 60 TFLOPS |
| Cost | $399,000 | $149,000 |
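The throughput rows scale linearly with GPU count. The per-GPU peak figures below are assumptions taken from the Tesla V100's spec sheet (rounded), not numbers stated in this article, but they reproduce the system totals:

```python
# The system throughput figures are just per-GPU peaks times GPU count.
# Per-V100 peaks assumed from the spec sheet: 120 tensor, 30 FP16,
# 15 FP32, 7.5 FP64 TFLOPS (assumptions, not from this article).
PER_GPU_TFLOPS = {"Tensor": 120, "FP16": 30, "FP32": 15, "FP64": 7.5}

dgx2 = {k: 16 * v for k, v in PER_GPU_TFLOPS.items()}  # DGX-2: 16 GPUs
dgx1 = {k: 8 * v for k, v in PER_GPU_TFLOPS.items()}   # DGX-1: 8 GPUs
print(dgx2)  # prints: {'Tensor': 1920, 'FP16': 480, 'FP32': 240, 'FP64': 120.0}
```

Note that the 1920 tensor TFLOPS here uses a rounded 120 TFLOPS per GPU, whereas NVIDIA's "2 petaFLOPS" headline rounds up from the 125 TFLOPS peak; both describe the same hardware.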

System Specifications

| Component | Specification |
| --- | --- |
| GPUs | 16x NVIDIA® Tesla® V100 |
| GPU Memory | 512 GB total |
| Performance | 2 petaFLOPS |
| NVIDIA CUDA® Cores | 81,920 |
| NVIDIA Tensor Cores | 10,240 |
| NVSwitches | 12 |
| Maximum Power Usage | 10 kW |
| CPU | Dual Intel Xeon Platinum 8168, 2.7 GHz, 24 cores |
| System Memory | 1.5 TB |
| Network | 8x 100 Gb/sec InfiniBand/100 GigE; dual 10/25/40/50/100 GbE |
| Storage | OS: 2x 960 GB NVMe SSDs; internal: 30 TB (8x 3.84 TB) NVMe SSDs |
| Software | Ubuntu Linux OS, Red Hat Enterprise Linux OS |
| System Weight | 360 lbs (163.29 kg) |
| Packaged System Weight | 400 lbs (181.44 kg) |
| System Dimensions | Height: 17.3 in; Width: 19.0 in; Length: 31.3 in (no bezel), 32.8 in (with bezel) |
| Operating Temperature Range | 5°C to 35°C (41°F to 95°F) |
