# NVIDIA GB200 NVL72

The NVIDIA <mark style="color:blue;">**DGX GB200 NVL72**</mark> is a powerful <mark style="color:yellow;">**rack-scale system**</mark> built for demanding AI and high-performance computing (HPC) workloads. It carries a price tag of roughly US$3 million.

At the heart of the DGX GB200 NVL72 are <mark style="color:yellow;">**18 compute nodes**</mark>, each housing two <mark style="color:blue;">**Grace-Blackwell Superchips (GB200)**</mark>.&#x20;

The GB200 Superchip is a marvel of engineering, combining a <mark style="color:blue;">**72-core Grace CPU**</mark> with two high-end <mark style="color:blue;">**Blackwell GPUs**</mark> using NVIDIA's ultra-fast <mark style="color:yellow;">**900 GBps**</mark> <mark style="color:blue;">**NVLink-C2C interconnect**</mark>.&#x20;

This tight integration allows for seamless communication between the CPU and GPUs, minimising latency and maximising performance.

The DGX GB200 NVL72 represents a significant advancement in AI and HPC computing, offering unprecedented performance and scalability in a <mark style="color:yellow;">**single rack-scale system**</mark>.&#x20;

However, its high power consumption and cooling requirements may pose challenges for some data centres, potentially limiting its adoption to facilities capable of handling such high-density deployments.

### <mark style="color:purple;">The GPU</mark>

<figure><img src="https://1839612753-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2FpV8SlQaC976K9PPsjApL%2Fuploads%2Fl9VogsInp2By6qEDBZL3%2Fimage.png?alt=media&#x26;token=7388f3aa-561c-401e-a795-07a5a4afc614" alt="" width="563"><figcaption><p>The Nvidia Blackwell GPU powering the B100, B200, and GB200 accelerators features a pair of reticle limited compute dies which communicate with each other via a 10TB/sec NVLink-HBI interconnect</p></figcaption></figure>

### <mark style="color:purple;">GPU Communication</mark>

The <mark style="color:blue;">**DGX GB200 NVL72**</mark> employs <mark style="color:yellow;">**nine**</mark> [<mark style="color:blue;">**NVLink switch**</mark>](https://training.continuumlabs.ai/infrastructure/servers-and-chips/nvlink-switch) appliances, strategically placed in the middle of the rack.&#x20;

Each switch appliance contains two NVIDIA NVLink 7.2T ASICs[^1], providing a total of <mark style="color:yellow;">144</mark> NVLink ports at 100 GBps each.

This configuration gives each of the 72 GPUs in the rack <mark style="color:yellow;">**1.8 TBps**</mark> of bidirectional bandwidth (18 links at 100 GBps each), enabling lightning-fast data transfer and synchronisation between GPUs.

The NVLink switches and compute nodes are connected via a blind-mate backplane with more than 2 miles (3.2 km) of copper cabling, chosen over optical connections to reduce power consumption by 20 kW.
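
To check that these figures add up, here is a small back-of-the-envelope sketch in Python. The per-ASIC port count is an assumption derived from the 7.2 TB/s switch ASIC figure divided by 100 GBps per port; the remaining numbers come from the description above.

```python
# Back-of-the-envelope check of the NVL72 NVLink fabric, using the
# figures quoted above rather than official NVIDIA documentation.

GPUS_PER_RACK = 72
LINKS_PER_GPU = 18            # NVLink links per Blackwell GPU
LINK_BW_GBPS = 100            # bidirectional bandwidth per link

SWITCH_APPLIANCES = 9
ASICS_PER_APPLIANCE = 2
PORTS_PER_ASIC = 72           # assumption: 7.2 TB/s ASIC / 100 GBps per port

gpu_side_links = GPUS_PER_RACK * LINKS_PER_GPU                                 # 1,296
switch_side_ports = SWITCH_APPLIANCES * ASICS_PER_APPLIANCE * PORTS_PER_ASIC   # 1,296
per_gpu_bw_tbps = LINKS_PER_GPU * LINK_BW_GBPS / 1000                          # 1.8

print(f"GPU-side links:    {gpu_side_links}")
print(f"Switch-side ports: {switch_side_ports}")
print(f"Per-GPU bandwidth: {per_gpu_bw_tbps} TBps")
```

The GPU-side link count and the switch-side port count match exactly, which is what allows the nine switch appliances to stitch all 72 GPUs into a single NVLink domain.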

### <mark style="color:purple;">NVIDIA GB200 Grace Blackwell Superchip</mark>

Each <mark style="color:blue;">**GB200 Superchip**</mark> is equipped with an impressive <mark style="color:yellow;">**864 GB**</mark> of memory, consisting of <mark style="color:yellow;">**480 GB**</mark> <mark style="color:blue;">**LPDDR5x**</mark> for the <mark style="color:blue;">**CPU**</mark> and <mark style="color:yellow;">**384 GB**</mark> <mark style="color:blue;">**HBM3e**</mark> for the <mark style="color:blue;">**GPUs**</mark>.&#x20;

This memory capacity, coupled with the architecture of the Blackwell GPUs, enables each Superchip to deliver an astonishing <mark style="color:yellow;">**40 petaFLOPS**</mark> of sparse <mark style="color:blue;">**FP4**</mark> performance.&#x20;

When all <mark style="color:yellow;">**18**</mark> compute nodes work together, the entire DGX GB200 NVL72 rack can achieve a staggering <mark style="color:yellow;">**1.44 exaFLOPS**</mark> of super-low-precision floating-point performance, making it an ideal platform for AI and HPC workloads.
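
The rack-level totals follow directly from the per-Superchip figures. A minimal sketch of the arithmetic, using only the numbers quoted above (sparse FP4 marketing figures, not measured throughput):

```python
# Rack-level memory and compute totals implied by the per-Superchip figures.

NODES = 18
SUPERCHIPS_PER_NODE = 2
LPDDR5X_GB = 480              # CPU-attached memory per Superchip
HBM3E_GB = 384                # GPU-attached memory per Superchip (2 x 192 GB)
FP4_PFLOPS_SPARSE = 40        # sparse FP4 per Superchip

superchips = NODES * SUPERCHIPS_PER_NODE                                            # 36
print(f"Memory per Superchip: {LPDDR5X_GB + HBM3E_GB} GB")                          # 864 GB
print(f"Compute per node:     {SUPERCHIPS_PER_NODE * FP4_PFLOPS_SPARSE} petaFLOPS") # 80
print(f"Rack total:           {superchips * FP4_PFLOPS_SPARSE / 1000} exaFLOPS")    # 1.44
```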

<figure><img src="https://1839612753-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2FpV8SlQaC976K9PPsjApL%2Fuploads%2FmFhSR3izTKR5CgfjgUkp%2Fimage.png?alt=media&#x26;token=f4b169d3-7608-481c-b0d7-ae0b435eb2c9" alt="" width="512"><figcaption><p>NVIDIA-GB200-Grace-Blackwell-Superchip</p></figcaption></figure>

<figure><img src="https://1839612753-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2FpV8SlQaC976K9PPsjApL%2Fuploads%2Fg7J2WWqzbunLtlOeR2qt%2Fimage.png?alt=media&#x26;token=c6c8d4c8-41a0-4616-bca9-ef20e72ff7ca" alt=""><figcaption><p>In total, each Superchip comes equipped with 864GB of memory — 480GB of LPDDR5x and 384GB of HBM3e — and according to Nvidia, can push 40 petaFLOPS of sparse FP4 performance. This means each compute node is capable of producing 80 petaFLOPS of AI compute and the entire rack can do 1.44 exaFLOPS of super-low-precision floating point mathematics.</p></figcaption></figure>

### <mark style="color:purple;">NVIDIA GB200 NVL72</mark>

Here is the <mark style="color:yellow;">**120 kW**</mark> flagship system, packed into a single rack.

The <mark style="color:blue;">**DGX GB200 NVL72**</mark> weighs <mark style="color:yellow;">**1.36 metric tons**</mark> (3,000 lbs) and consumes <mark style="color:yellow;">**120 kW**</mark>, a power load that not all data centres will be able to handle.

Since many data centres can only support racks of up to 60 kW, a future half-stack system seems a possibility.

The rack uses 2 miles (3.2 km) of copper cabling instead of optics to lower the system's power draw by 20 kW.

<figure><img src="https://1839612753-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2FpV8SlQaC976K9PPsjApL%2Fuploads%2FfuoDMhSjF9qpJgY8sUGk%2Fimage.png?alt=media&#x26;token=3fde57fc-bfb2-4ae2-94e1-e5740710d4b2" alt=""><figcaption></figcaption></figure>

<mark style="color:green;">**Some statistics:**</mark>

* It's a rack-scale solution that connects <mark style="color:yellow;">**36 Grace CPUs and 72 Blackwell GPUs**</mark>.
* <mark style="color:yellow;">**Liquid-cooled design**</mark> with a 72-GPU [<mark style="color:blue;">**NVLink**</mark>](https://training.continuumlabs.ai/infrastructure/servers-and-chips/nvlink-switch) domain acting as a single massive GPU.
* Delivers <mark style="color:yellow;">30x faster real-time performance for trillion-parameter LLM inference</mark> compared to the NVIDIA H100 Tensor Core GPU.
* Enables 4x faster training for large language models at scale compared to the H100.
* Provides 2x the energy efficiency of H100 air-cooled infrastructure.
* Speeds up key database queries by 18x compared to CPUs, delivering 5x better total cost of ownership.

<figure><img src="https://1839612753-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2FpV8SlQaC976K9PPsjApL%2Fuploads%2Fj3Mz8jscA2cEkJddJAtR%2Fimage.png?alt=media&#x26;token=594eb7fd-8973-4428-855b-2e415ceddfb2" alt="" width="375"><figcaption><p>While the 1.36 metric ton (3,000 lb) rack system is marketed as one big GPU, it's assembled from 18 1U compute nodes, each of which is equipped with two of Nvidia's 2,700W Grace-Blackwell Superchips (GB200)</p></figcaption></figure>

### <mark style="color:purple;">Power and Cooling</mark>

In terms of power consumption, the DGX GB200 NVL72 <mark style="color:blue;">**rack**</mark> consumes <mark style="color:yellow;">120</mark> kW.&#x20;

Each compute node is estimated to consume between 5.4 kW and 5.7 kW, considering the two GB200 Superchips and five NICs[^2].&#x20;

The rack is equipped with six power shelves, three at the top and three at the bottom, to supply the necessary <mark style="color:yellow;">**120 kW**</mark> of power.&#x20;

The power shelves likely use 415 V, 60 A PSUs[^3], with some level of redundancy built into the design.
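
A rough power budget shows how the per-node estimate rolls up into the rack figure. The 2,700 W per-Superchip number comes from the figure caption above; the per-node allowance for NICs, storage, and fans is purely an assumption chosen for illustration.

```python
# Rough rack power budget based on the figures quoted in this section.

NODES = 18
SUPERCHIPS_PER_NODE = 2
SUPERCHIP_W = 2_700           # from the figure caption above
NIC_STORAGE_FANS_W = 300      # assumption, not a published figure
RACK_BUDGET_KW = 120

node_kw = (SUPERCHIPS_PER_NODE * SUPERCHIP_W + NIC_STORAGE_FANS_W) / 1000
compute_kw = NODES * node_kw
switch_and_overhead_kw = RACK_BUDGET_KW - compute_kw   # NVLink switch trays, losses

print(f"Per compute node:    {node_kw:.1f} kW")
print(f"18 compute nodes:    {compute_kw:.0f} kW")
print(f"Switches + overhead: {switch_and_overhead_kw:.0f} kW of the {RACK_BUDGET_KW} kW budget")
```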

The decision to use copper cabling instead of optical connections reduces the power draw by roughly 20 kW, as the retimers and transceivers required for optics would have added to the already substantial power consumption.

Powering and cooling the DGX GB200 NVL72 is no small feat, given its impressive performance.&#x20;

A hyperscale-style DC bus bar runs down the back of the rack, efficiently distributing power to all components.

To keep the system running at optimal temperatures, the compute nodes and NVLink switches are <mark style="color:blue;">**liquid-cooled**</mark>, with coolant entering the rack at 25°C and leaving roughly 20 degrees warmer, at around 45°C. Lower-power peripherals, such as NICs and storage, are cooled by conventional 40mm fans.
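
To give a sense of what that liquid loop has to move, here is an illustrative estimate of the required coolant flow, assuming a water-like coolant and that essentially all 120 kW ends up in the liquid (in practice the 40mm fans remove some of it). These are not vendor figures.

```python
# Illustrative coolant flow estimate for ~120 kW with a 20 degree C rise.

HEAT_LOAD_W = 120_000
DELTA_T_C = 20                # 25 C in, roughly 45 C out
CP_J_PER_KG_K = 4186          # specific heat of water
DENSITY_KG_PER_L = 1.0

mass_flow_kg_s = HEAT_LOAD_W / (CP_J_PER_KG_K * DELTA_T_C)
litres_per_minute = mass_flow_kg_s / DENSITY_KG_PER_L * 60

print(f"Required coolant flow: ~{mass_flow_kg_s:.1f} kg/s (~{litres_per_minute:.0f} L/min)")
```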

### <mark style="color:purple;">Scalability</mark>

The DGX GB200 NVL72 is designed to scale, allowing organisations to expand their AI and HPC capabilities as needed.&#x20;

<mark style="color:yellow;">**Eight**</mark> DGX GB200 NVL72 racks can be networked together to form a <mark style="color:blue;">**DGX Superpod**</mark>, housing an impressive <mark style="color:yellow;">**576**</mark> GPUs for tackling even larger training workloads.&#x20;

If more compute is required, additional SuperPODs can be added to the system, providing virtually limitless scalability.
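
The scale-out arithmetic is straightforward. A short sketch using the per-rack figures above; note that the power line simply multiplies the rack figure and ignores SuperPOD-level networking, storage, and cooling overhead.

```python
# Scale-out arithmetic for a DGX SuperPOD built from NVL72 racks.

RACKS_PER_SUPERPOD = 8
GPUS_PER_RACK = 72
RACK_EXAFLOPS_FP4_SPARSE = 1.44
RACK_KW = 120

print(f"GPUs per SuperPOD:   {RACKS_PER_SUPERPOD * GPUS_PER_RACK}")                         # 576
print(f"Sparse FP4 compute:  {RACKS_PER_SUPERPOD * RACK_EXAFLOPS_FP4_SPARSE:.2f} exaFLOPS")
print(f"Compute racks alone: {RACKS_PER_SUPERPOD * RACK_KW} kW")
```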

### <mark style="color:purple;">Networking</mark>

Networking and storage are key components of the DGX GB200 NVL72.&#x20;

Each compute node features <mark style="color:yellow;">four</mark> InfiniBand NICs (QSFP-DD) for high-speed, low-latency communication within the compute network.

Additionally, a [<mark style="color:blue;">BlueField-3 DPU</mark>](https://training.continuumlabs.ai/infrastructure/data-and-memory/nvidia-bluefield-data-processing-units-dpus) is included in each node to handle storage network communications efficiently.&#x20;

For local storage, each node is equipped with four small form-factor NVMe storage caddies, providing fast access to data.

### <mark style="color:purple;">Widespread Adoption</mark>

Major organisations across various sectors are expected to adopt Blackwell, including Amazon Web Services, Dell Technologies, Google, Meta, Microsoft, OpenAI, Oracle, Tesla, and xAI.&#x20;

Cloud service providers like AWS, Google Cloud, Microsoft Azure, and Oracle Cloud Infrastructure will offer Blackwell-powered instances, while server makers such as Cisco, Dell, Hewlett Packard Enterprise, Lenovo, and Supermicro are expected to deliver servers based on Blackwell products.&#x20;

Software makers in engineering simulation, such as Ansys, Cadence, and Synopsys, will also leverage Blackwell-based processors to accelerate their software.

[^1]: ASICs, or Application-Specific Integrated Circuits, are specialised chips designed for a particular purpose rather than for general-purpose computing. In this context, they are likely used to manage data transfer efficiently across NVLink.

[^2]: **NICs**: Network Interface Cards, which are hardware devices that allow computers to communicate over a computer network.

[^3]: **PSUs**: Power Supply Units, which convert mains AC to low-voltage regulated DC power for the internal components of a computer.
