NVIDIA DGX-2

At the time of its release in 2018, traditional data centre architectures were increasingly unable to cope with the demands of modern AI workloads, which require immense computational power and high-speed interconnects to train increasingly complex models.

This challenge necessitated a paradigm shift towards more scalable and integrated systems.

NVIDIA's response to this challenge was the DGX-2, a system designed to offer unprecedented levels of compute performance and interconnect bandwidth, enabling the training of models that were previously untrainable due to hardware limitations.

NVIDIA's DGX-2 stood as a major leap forward. When it was released, it claimed the title of "the world's most powerful AI system for the most complex AI challenges."

The system came with a price tag of around US$400,000.

The Evolution from DGX-1 to DGX-2

The DGX-2 expanded dramatically on the DGX-1's foundation.

Instead of the DGX-1's eight GPUs, it packed 16, and it replaced the point-to-point NVLink topology with NVIDIA's more scalable NVSwitch fabric.

This change allowed the DGX-2 to tackle deep learning and other demanding AI and HPC workloads up to 10 times faster than the DGX-1.

The system was a behemoth, both in terms of size and capability.

It weighed in at 154.2 kg (340 lbs) and took up 10 rack units, compared with the 3 rack units of the DGX-1.

It required up to 10 kW of power, a figure that rose with the introduction of the DGX-2H model, which demanded up to 12 kW.

A Closer Look at the DGX-2

Here’s what made the DGX-2 stand out:

  • GPUs: The DGX-2 featured 16 NVIDIA Tesla V100 GPUs. This doubling of GPU capacity, compared to the DGX-1, allowed for unprecedented computational power.

  • Memory and Storage: It came with 1.5 TB of system RAM and 30 TB of high-performance NVMe storage, expandable to 60 TB (a rough sizing sketch follows this list).

  • Networking: The server was equipped with high-bandwidth network interfaces, including dual 10/25/40/50/100GbE options and up to 8 x 100Gb/sec Infiniband connectivity.

  • CPU: At its core, the DGX-2 had two 24-core Intel Xeon Platinum 8168 processors, providing robust support for the GPUs.
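
To put these numbers in context, below is a minimal sizing sketch. It assumes the 32 GB V100 variant (as listed in the comparison table further down), roughly 2 bytes per parameter for FP16 inference, roughly 16 bytes per parameter for mixed-precision training with an Adam-style optimiser, and an 80% usable-memory fraction; all of these are rules of thumb rather than figures from this page.

```python
# Back-of-envelope capacity check for a 16-GPU DGX-2.
# The per-parameter byte counts and the usable-memory fraction are
# rules of thumb (assumptions), not vendor figures.

GPUS = 16
HBM_PER_GPU_GB = 32                    # 32 GB V100 variant (assumed)
total_hbm_gb = GPUS * HBM_PER_GPU_GB   # 512 GB aggregate HBM2

BYTES_PER_PARAM_INFERENCE = 2          # FP16 weights only
BYTES_PER_PARAM_TRAINING = 16          # weights + gradients + Adam states (mixed precision)

def max_params_billions(bytes_per_param: float, usable_fraction: float = 0.8) -> float:
    """Largest parameter count (in billions) that fits in aggregate HBM,
    leaving headroom for activations and buffers."""
    usable_bytes = total_hbm_gb * 1e9 * usable_fraction
    return usable_bytes / bytes_per_param / 1e9

print(f"Aggregate GPU memory: {total_hbm_gb} GB")
print(f"~{max_params_billions(BYTES_PER_PARAM_INFERENCE):.0f}B parameters for FP16 inference")
print(f"~{max_params_billions(BYTES_PER_PARAM_TRAINING):.0f}B parameters for mixed-precision training")
```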

Performance and Impact

The DGX-2’s performance was groundbreaking, delivering 2 petaFLOPS of deep learning (FP16 Tensor Core) compute.

This level of performance meant that the DGX-2 could match the output of 300 dual-socket Xeon servers, which would cost around $2.7 million and occupy significantly more space.

Thus, despite its high upfront cost, the DGX-2 presented a cost-effective solution for intensive AI and HPC workloads.
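
As a rough check on that claim, the sketch below simply divides the quoted prices by the quoted throughput; the assumption that each Xeon server occupies 1U of rack space is illustrative and not a figure from this page.

```python
# Back-of-envelope cost comparison using the figures quoted on this page.
DGX2_PRICE_USD = 399_000
DGX2_PFLOPS = 2.0                       # deep learning throughput quoted above

XEON_CLUSTER_PRICE_USD = 2_700_000      # ~300 dual-socket Xeon servers of comparable output
XEON_SERVERS = 300
RACK_UNITS_PER_XEON = 1                 # assumed 1U per server
DGX2_RACK_UNITS = 10

print(f"DGX-2 cost per petaFLOP:        ${DGX2_PRICE_USD / DGX2_PFLOPS:,.0f}")
print(f"Xeon cluster cost per petaFLOP: ${XEON_CLUSTER_PRICE_USD / DGX2_PFLOPS:,.0f}")
print(f"Price ratio: {XEON_CLUSTER_PRICE_USD / DGX2_PRICE_USD:.1f}x in favour of the DGX-2")
print(f"Rack footprint ratio: {XEON_SERVERS * RACK_UNITS_PER_XEON / DGX2_RACK_UNITS:.0f}x")
```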

Legacy and Conclusion

Though alternatives have since emerged, the DGX-2 represented, at the time, the pinnacle of AI-focused servers.

It addressed the needs of the most complex AI tasks by dramatically reducing the time and infrastructure required to train deep learning models. NVIDIA did not just sell a server; it delivered a comprehensive ecosystem that supported the most advanced AI research and applications.

NVIDIA NVSwitch—Revolutionising AI Network Fabric

The introduction of the NVIDIA NVSwitch represented a leap in networking technology, akin to the evolution from dial-up to broadband.

NVSwitch enabled a level of model parallelism previously unattainable, providing 2.4 TB/s of bisection bandwidth, a 24-fold increase over the previous generation.

This high-performance interconnect fabric allowed for unprecedented scaling, making it possible to train complex models across all 16 GPUs efficiently and effectively.
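
The 2.4 TB/s figure can be reproduced from the per-GPU link budget. The sketch below assumes NVLink 2.0 links running at 25 GB/s per direction and six links per V100; these are standard V100 specifications but should be read as assumptions here.

```python
# Reproducing the quoted 2.4 TB/s bisection bandwidth (assumed NVLink 2.0 figures).
NVLINK_GB_S_PER_DIRECTION = 25     # per link, per direction (assumed)
LINKS_PER_GPU = 6                  # NVLink ports per V100 (assumed)
GPUS_PER_HALF = 8                  # 16 GPUs cut into two halves of 8

per_gpu_gb_s = LINKS_PER_GPU * NVLINK_GB_S_PER_DIRECTION * 2   # 300 GB/s bidirectional per GPU
bisection_tb_s = GPUS_PER_HALF * per_gpu_gb_s / 1000           # 2.4 TB/s across the cut

print(f"Per-GPU NVLink bandwidth: {per_gpu_gb_s} GB/s (bidirectional)")
print(f"Bisection bandwidth of the NVSwitch fabric: {bisection_tb_s} TB/s")
```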

A comparison between the DGX-2 and the DGX-1

Specification | NVIDIA DGX-2 | NVIDIA DGX-1
CPUs | 2 x Intel Xeon Platinum | 2 x Intel Xeon E5-2600 v4
GPUs | 16 x NVIDIA Tesla V100, 32 GB HBM2 each | 8 x NVIDIA Tesla V100, 16 GB HBM2 each
System Memory | Up to 1.5 TB DDR4 | Up to 0.5 TB DDR4
GPU Memory | 512 GB HBM2 (16 x 32 GB) | 256 GB HBM2 (8 x 32 GB)
Storage | 30 TB NVMe, expandable up to 60 TB | 4 x 1.92 TB NVMe
Networking | 8 x Infiniband or 8 x 100 GbE | 4 x Infiniband + 2 x 10 GbE
Power | 10 kW | 3.5 kW
Weight | 350 lbs | 134 lbs
GPU Throughput | Tensor: 1920 TFLOPS, FP16: 480 TFLOPS, FP32: 240 TFLOPS, FP64: 120 TFLOPS | Tensor: 960 TFLOPS, FP16: 240 TFLOPS, FP32: 120 TFLOPS, FP64: 60 TFLOPS
Cost | $399,000 | $149,000
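
The throughput row is simply the per-GPU peak scaled by GPU count. The quick check below uses commonly quoted V100 peak figures (120 Tensor, 30 FP16, 15 FP32 and 7.5 FP64 TFLOPS per GPU); these per-GPU values are assumptions rather than numbers taken from the table.

```python
# Sanity check: system throughput = per-GPU peak x number of GPUs.
V100_PEAK_TFLOPS = {"Tensor": 120, "FP16": 30, "FP32": 15, "FP64": 7.5}   # assumed per-GPU peaks

for system, gpus in (("DGX-2", 16), ("DGX-1", 8)):
    totals = {mode: peak * gpus for mode, peak in V100_PEAK_TFLOPS.items()}
    print(system, totals)   # DGX-2: 1920/480/240/120, DGX-1: 960/240/120/60
```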

System Specifications

Component | Specification
GPUs | 16x NVIDIA Tesla V100
GPU Memory | 512 GB total
Performance | 2 petaFLOPS
NVIDIA CUDA Cores | 81,920
NVIDIA Tensor Cores | 10,240
NVSwitches | 12
Maximum Power Usage | 10 kW
CPU | Dual Intel Xeon Platinum 8168, 2.7 GHz, 24 cores
System Memory | 1.5 TB
Network | 8x 100 Gb/sec Infiniband/100GigE, Dual 10/25/40/50/100GbE
Storage | OS: 2x 960 GB NVMe SSDs; Internal: 30 TB (8x 3.84 TB) NVMe SSDs
Software | Ubuntu Linux OS, Red Hat Enterprise Linux OS
System Weight | 360 lbs (163.29 kg)
Packaged System Weight | 400 lbs (181.44 kg)
System Dimensions | Height: 17.3 in, Width: 19.0 in, Length: 31.3 in (no bezel), 32.8 in (with bezel)
Operating Temperature Range | 5°C to 35°C (41°F to 95°F)


A further architectural benefit of the NVSwitch fabric is that the CPUs' PCIe lanes can be redirected elsewhere, most notably towards storage and networking connectivity.