Continuum - Accelerated Artificial Intelligence


Copyright Continuum Labs - 2023

Is PUE a useful measure of data centre performance?

Power Usage Effectiveness (PUE) is a critical metric for measuring the energy efficiency of data centres.

Introduced in 2007 by The Green Grid, PUE has become a global standard for assessing and improving data centre energy consumption.

Calculating PUE

To calculate PUE, you need two key pieces of information:

  1. IT Load: The energy consumed by IT equipment, typically measured from power distribution units (PDUs).

  2. Total Facility Energy Consumption: This includes energy used by network equipment, cooling systems, lighting, and uninterruptible power supplies (UPS), usually measured from the utility meter.

The formula for PUE is:

PUE = Total Facility Energy Consumption / IT Load

For example, if a data centre uses 50,000 kWh of total energy and 40,000 kWh is consumed by IT equipment, the PUE would be:

PUE = 50,000 kWh / 40,000 kWh = 1.25

Importance of PUE

PUE helps data centres benchmark their energy use over time, enabling them to track improvements and identify areas for further optimisation.

A lower PUE indicates higher energy efficiency, with a PUE of 1 being ideal.

Data Centre Infrastructure Efficiency (DCiE)

DCiE is another metric that uses the same data as PUE but expresses it as a percentage. The formula for DCiE is:

DCiE = (IT Load / Total Facility Energy Consumption) × 100

Using the previous example:

DCiE = (40,000 kWh / 50,000 kWh) × 100 = 80%
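Both metrics come from the same two measurements, so they are easy to compute together. A minimal sketch in Python, using the worked example above:

```python
def pue(total_facility_kwh: float, it_load_kwh: float) -> float:
    """Power Usage Effectiveness: total facility energy over IT load (ideal = 1)."""
    return total_facility_kwh / it_load_kwh


def dcie(total_facility_kwh: float, it_load_kwh: float) -> float:
    """Data Centre infrastructure Efficiency: the inverse of PUE, as a percentage."""
    return it_load_kwh / total_facility_kwh * 100


# Worked example from the text: 50,000 kWh total facility, 40,000 kWh IT load
print(pue(50_000, 40_000))   # 1.25
print(dcie(50_000, 40_000))  # 80.0
```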

Managing Costs and Reducing PUE

By regularly measuring PUE, data centres can identify inefficiencies and track their progress in reducing energy consumption.

Strategies to lower PUE include:

  • Cold Aisle Containment: Improves cooling efficiency.

  • Enhanced Cooling Technology: Optimises airflow and cooling systems.

  • Small Improvements: Use advanced power supplies, automatic lighting, and eliminate waste.

Why Reducing PUE Matters

Reducing PUE is crucial for making data centres more economical and environmentally friendly.

Efficient energy use reduces costs, lowers emissions, and enhances overall performance, offering a competitive advantage over less efficient data centres.

Carbon Usage Effectiveness (CUE)

Carbon Usage Effectiveness (CUE) is a metric that quantifies the carbon footprint of a data centre by measuring the amount of carbon dioxide (CO2) emissions generated per unit of IT energy consumed.

It provides a clear picture of the environmental impact of data centre operations and complements the Power Usage Effectiveness (PUE) metric, which focuses on energy efficiency.

Calculating CUE: The CUE is calculated using the following formula:

CUE = Total CO2 emissions (kg) / Total IT Energy (kWh)

To determine the total CO2 emissions, data centres need to consider the carbon emission factors of their energy sources.

These factors indicate the amount of CO2 emitted per unit of energy produced and vary depending on the type of energy source (e.g., coal, gas, oil, or renewable). Data centres can obtain these factors from public databases or their utility companies.

The total IT energy represents the energy consumed by the IT equipment within the data centre, such as servers, storage devices, and network equipment. This information can be obtained from power distribution units (PDUs) or other energy monitoring systems.
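Putting the pieces together, total CO2 can be accumulated from per-source energy use and emission factors before dividing by IT energy. A sketch; the emission factors and source names below are illustrative placeholders, not figures from any utility:

```python
def cue(energy_by_source_kwh: dict, factor_kg_per_kwh: dict, it_energy_kwh: float) -> float:
    """Carbon Usage Effectiveness: kg of CO2 emitted per kWh of IT energy."""
    total_co2_kg = sum(
        kwh * factor_kg_per_kwh[source]
        for source, kwh in energy_by_source_kwh.items()
    )
    return total_co2_kg / it_energy_kwh


# Illustrative emission factors (kg CO2/kWh) -- obtain real ones from your utility
factors = {"grid_gas": 0.4, "solar_ppa": 0.0}
energy = {"grid_gas": 30_000, "solar_ppa": 20_000}
print(cue(energy, factors, it_energy_kwh=40_000))  # ~0.3 kg CO2 per IT kWh
```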

Reducing CUE through Renewable Energy Purchase

One effective way for data centres to reduce their CUE is by purchasing renewable energy.

Renewable energy sources, such as solar, wind, or hydro power, have significantly lower carbon emission factors compared to fossil fuels.

By sourcing a portion or all of their energy from renewable sources, data centres can dramatically decrease their CO2 emissions and, consequently, their CUE.

There are several ways data centres can acquire renewable energy:

  1. Power Purchase Agreements (PPAs): Data centres can enter into long-term contracts with renewable energy developers to purchase a specific amount of energy at a fixed price. This approach provides a stable and predictable energy supply while supporting the development of new renewable energy projects.

  2. Renewable Energy Certificates (RECs): Data centres can purchase RECs, which represent the environmental attributes of one megawatt-hour (MWh) of renewable energy generation. By buying RECs, data centres can claim the use of renewable energy and offset their carbon emissions, even if they don't have direct access to renewable energy sources.

  3. On-site Renewable Energy Generation: Data centres can install their own renewable energy systems, such as solar panels or wind turbines, to generate clean energy on-site. This approach reduces reliance on the grid and can provide long-term cost savings.

Measuring and Reporting CUE

To accurately measure CUE, data centres need to have energy monitoring and carbon accounting systems in place.

These systems should track energy consumption at the IT equipment level and monitor the carbon emission factors of the energy sources used.

Data centres should regularly report their CUE to stakeholders, including customers, investors, and regulators. Transparent reporting of CUE helps demonstrate a data centre's commitment to sustainability and allows for benchmarking against industry peers.

In addition to CUE, data centres should also consider reporting other sustainability metrics, such as their renewable energy usage, carbon emissions reduction targets, and progress towards those targets.

By adopting the CUE metric and actively working to reduce it through renewable energy procurement and other sustainability initiatives, data centres can play a crucial role in mitigating climate change and contributing to a more sustainable future.

Floating-Point Operations Per Second (FLOPS)

FLOPS is a unit of measurement used to quantify the computing power of a computer or a processor. It measures the number of floating-point calculations that can be performed in one second.

Importance of FLOPS in Technology

FLOPS helps determine a system's computational performance. It allows for comparing the speed and efficiency of different computers and processors when handling complex mathematical calculations, simulations, graphics rendering, and machine learning algorithms.

Floating-Point Operations

Floating-point operations refer to mathematical calculations involving decimal numbers with a fractional part.

These operations include addition, subtraction, multiplication, and division of floating-point numbers. They are commonly used in scientific computing, simulations, and other applications that require precise numerical calculations.

Calculation of FLOPS

FLOPS is calculated by dividing the total number of floating-point operations a workload performs by its execution time in seconds. This calculation gives an idea of how fast a computer or processor can perform these operations.

Types of FLOPS

There are two types of FLOPS: theoretical FLOPS and measured FLOPS. Theoretical FLOPS refers to the maximum number of FLOPS a computer or processor can achieve based on its architecture and specifications. Measured FLOPS represents the actual computational performance observed during real-world applications.

Measuring FLOPS

FLOPS are typically measured using benchmarking software. These programs run a series of standardised mathematical simulations and record the time taken to complete them.

By comparing the execution time with the number of floating-point operations performed, the FLOPS value can be calculated.
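That procedure can be illustrated with a deliberately crude sketch: time a known number of floating-point operations and divide by the elapsed time. Pure Python runs orders of magnitude below hardware peak, so this only demonstrates the calculation, not a real benchmark:

```python
import time


def measure_flops(n: int = 1_000_000) -> float:
    """Time n multiply-add iterations and return floating-point operations per second."""
    x, acc = 1.0000001, 0.0
    start = time.perf_counter()
    for _ in range(n):
        acc = acc * x + 1.0  # one multiply + one add = 2 floating-point operations
    elapsed = time.perf_counter() - start
    return 2 * n / elapsed


print(f"{measure_flops():.3e} measured FLOPS (interpreter overhead dominates)")
```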

Difference Between FLOPS and MIPS

FLOPS measures the computational performance of a computer or processor in terms of floating-point operations, while millions of instructions per second (MIPS) measures the processing speed in terms of the number of instructions executed per second.

FLOPS focuses on numerical calculations, while MIPS covers a broader range of instructions, including both arithmetic and logical operations.

Relationship Between FLOPS and CPU Clock Speed

The relationship between FLOPS and CPU clock speed is not direct.

While a higher CPU clock speed can potentially lead to more FLOPS, it is not the sole determining factor. Other factors such as the architecture, instruction set, and efficiency of the processor also play a significant role in determining its FLOPS capability.

FLOPS and Gaming

FLOPS has a direct impact on gaming performance, especially in rendering realistic graphics and physics simulations. Games that require complex visual effects and physics calculations rely on the FLOPS capability of the graphics processing unit (GPU) to deliver smooth and immersive gameplay.

Calculating CPU Performance

To determine the potential performance of a CPU-based system, you can consider several factors and benchmarks. Here are some key aspects to evaluate:

  1. Clock speed: The clock speed, measured in GHz, represents the number of cycles the CPU can execute per second. A higher clock speed generally indicates faster performance, but it's not the only factor to consider.

  2. Number of cores and threads: Modern CPUs have multiple cores, allowing them to execute multiple tasks simultaneously. Some CPUs also support hyperthreading, which allows each core to handle two threads concurrently. More cores and threads can lead to better performance, especially in multi-threaded applications.

  3. Instructions per clock (IPC): IPC represents the average number of instructions a CPU can execute per clock cycle. A higher IPC indicates better performance, as the CPU can do more work in each cycle.

  4. Cache size and hierarchy: CPUs have various levels of cache (L1, L2, L3) that store frequently accessed data. Larger cache sizes and more efficient cache hierarchies can improve performance by reducing the time spent accessing main memory.

  5. Memory bandwidth: The speed and bandwidth of the system's memory can significantly impact performance, especially for memory-intensive workloads.

  6. Application-specific performance: The performance of a CPU can vary depending on the specific application or workload. It's important to consider the performance of the CPU in the context of the intended use case.

  7. Power consumption and thermal efficiency: The power consumption and thermal efficiency of a CPU can impact its performance, especially in systems with limited cooling or power budgets.
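The first three factors combine into a rough theoretical peak: cores × clock × FLOPs per cycle, where the last term depends on SIMD width and FMA support. A sketch with illustrative, not vendor-quoted, figures:

```python
def theoretical_peak_gflops(cores: int, clock_ghz: float, flops_per_cycle: int) -> float:
    """Upper bound on floating-point throughput; real workloads achieve far less."""
    return cores * clock_ghz * flops_per_cycle


# Illustrative: 32 cores at 3.0 GHz, 32 FLOPs per cycle (wide SIMD with FMA)
print(theoretical_peak_gflops(32, 3.0, 32))  # 3072.0 GFLOPS
```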

Application-specific benchmarks

In addition to the component-level metrics, it's important to consider application-specific benchmarks that represent the typical workloads run on the HPC system.

These benchmarks can provide a more realistic assessment of the system's performance for its intended use cases.

Scalability: When evaluating an HPC system, it's crucial to consider its scalability, i.e., how well the performance scales as the problem size or the number of nodes increases. Metrics like parallel efficiency and speedup can help assess the system's scalability.

I/O performance: In addition to storage access speed, it's important to consider the I/O performance of the system as a whole, including the file system and any parallel I/O libraries used. Metrics like I/O bandwidth and I/O operations per second (IOPS) can help assess the I/O performance.

Compiler optimisations: The performance of the CPU and GPU can be significantly influenced by the compiler optimisations used. It's important to consider the available compilers and their optimisation capabilities when assessing the system's performance.

Interconnect topology: The topology of the interconnect network, such as fat-tree, torus, or dragonfly, can have a significant impact on the communication performance of the system, especially for larger-scale systems. It's important to consider the topology and its suitability for the intended workloads.

Cooling and power efficiency: Efficient power supply and cooling are crucial for maintaining high performance and reliability. Metrics like power usage effectiveness (PUE) and cooling efficiency can help assess the system's energy efficiency.

Reliability and availability: In addition to performance, it's important to consider the reliability and availability of the system, especially for long-running or mission-critical workloads. Metrics like mean time between failures (MTBF) and system uptime can help assess the system's reliability and availability.

Comprehensive Resource Efficiency Framework (CREF)

To create a new standard for assessing the resource usage and efficiency of data centres, we can develop a multi-factor model that goes beyond the traditional Power Usage Effectiveness (PUE) metric.

This new framework considers various aspects of data centre operations, including power consumption, carbon intensity, water usage, and waste generation.

The CREF model consists of the following components:

Power Consumption Efficiency (PCE)

  • PCE = Total Facility Power / Total IT Equipment Power

  • This metric measures the efficiency of power distribution within the data centre, similar to the traditional PUE.

  • A lower PCE value indicates better efficiency, with a theoretical ideal of 1.

Carbon Intensity Factor (CIF)

  • CIF = (Carbon Emissions from Power Consumption) / (Total IT Equipment Power)

  • The CIF measures the carbon footprint of the data centre based on the source of its power consumption.

  • It takes into account the carbon emissions associated with the generation of the electricity used by the data centre.

  • A lower CIF value indicates a more environmentally friendly data centre, with a theoretical ideal of 0.

Water Usage Effectiveness (WUE)

  • WUE = (Total Water Consumption) / (Total IT Equipment Power)

  • The WUE metric quantifies the water consumed by the data centre for cooling and other purposes, relative to the power consumed by the IT equipment.

  • It is expressed in litres per kilowatt-hour (L/kWh) of IT equipment power.

  • A lower WUE value indicates more efficient water usage, with a theoretical ideal of 0.

Waste Recycling Ratio (WRR)

  • WRR = (Amount of Waste Recycled) / (Total Waste Generated)

  • The WRR measures the proportion of waste generated by the data centre that is recycled or reused.

  • It includes electronic waste, packaging materials, and other waste streams.

  • A higher WRR value indicates better waste management practices, with a theoretical ideal of 1.

Renewable Energy Utilisation (REU)

  • REU = (Renewable Energy Consumed) / (Total Energy Consumed)

  • The REU metric represents the proportion of the data centre's total energy consumption that comes from renewable sources.

  • It encourages the adoption of clean energy and reduces the carbon footprint of the data centre.

  • A higher REU value indicates a more sustainable data centre, with a theoretical ideal of 1.

The CREF model combines these metrics to provide a comprehensive assessment of a data centre's resource efficiency and environmental impact. The overall CREF score can be calculated as follows:

CREF Score = (PCE * Wp) + (CIF * Wc) + (WUE * Ww) + (WRR * Wr) + (REU * We)

Where:

  • Wp, Wc, Ww, Wr, and We are weighting factors that can be adjusted based on the relative importance of each component.

  • The sum of the weighting factors should equal 1.
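The weighted sum can be sketched directly. One caveat worth encoding: PCE, CIF, and WUE improve as they fall, while WRR and REU improve as they rise, so a raw weighted sum mixes the two directions; the weights (or a normalisation step) would have to account for that. Input values below are illustrative:

```python
def cref_score(pce: float, cif: float, wue: float, wrr: float, reu: float,
               weights: tuple) -> float:
    """Weighted CREF combination; the five weights must sum to 1."""
    wp, wc, ww, wr, we = weights
    assert abs(wp + wc + ww + wr + we - 1.0) < 1e-9, "weights must sum to 1"
    return pce * wp + cif * wc + wue * ww + wrr * wr + reu * we


# Illustrative metric values with equal weights
score = cref_score(pce=1.25, cif=0.3, wue=1.8, wrr=0.6, reu=0.4, weights=(0.2,) * 5)
print(round(score, 3))  # 0.87
```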

By using the CREF model, data centre operators, policymakers, and stakeholders can assess the resource efficiency of data centres more holistically.

This framework encourages data centre operators to optimise their facilities across multiple dimensions, leading to more sustainable and environmentally friendly practices.

To implement the CREF model effectively, data centre operators would need to regularly monitor and report on these metrics, and there should be industry-wide standards and guidelines for measuring and verifying the data.

Additionally, policymakers and industry organisations can use the CREF model to set benchmarks, establish best practices, and create incentives for data centres to improve their resource efficiency and reduce their environmental impact.

By regularly monitoring and tracking these metrics over time, data centre operators can assess the performance of their racks, identify bottlenecks or inefficiencies, and make informed decisions about optimisations or upgrades.

Data Centre Output Efficiency (DCOE)

CPU Performance

  • Measure the number of instructions per second (IPS) executed by the server's CPUs.

  • This can be obtained using performance monitoring tools or by running standardised CPU benchmarks.

  • Higher IPS indicates that the server is processing more instructions and doing more computational work.

GPU Performance (for servers with GPUs)

  • Measure the number of floating-point operations per second (FLOPS) performed by the server's GPUs.

  • This can be obtained using GPU-specific benchmarks or performance monitoring tools.

  • Higher FLOPS indicates that the server is performing more complex mathematical operations, which is particularly relevant for AI, scientific simulations, and other GPU-accelerated workloads.

Memory Throughput

  • Measure the amount of data transferred between the CPU/GPU and memory per second (in bytes/second).

  • This can be obtained using memory bandwidth benchmarks or performance monitoring tools.

  • Higher memory throughput suggests that the server is efficiently moving data to and from memory, which is essential for data-intensive workloads.

Network Throughput

  • Measure the amount of data transmitted and received by the server's network interfaces per second (in bits/second or bytes/second).

  • This can be obtained using network monitoring tools or by measuring the throughput of network-intensive workloads.

  • Higher network throughput indicates that the server is efficiently communicating with other servers or clients, which is important for distributed computing and data-intensive applications.

Storage Throughput (for servers with local storage)

  • Measure the amount of data read from and written to the server's local storage per second (in bytes/second).

  • This can be obtained using storage benchmarks or by measuring the throughput of I/O-intensive workloads.

  • Higher storage throughput suggests that the server is efficiently accessing and manipulating data on its local storage, which is relevant for data processing and storage-intensive applications.
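Each of the throughput metrics above (memory, network, storage) can be derived the same way: take two snapshots of a cumulative byte counter, as exposed by OS counters or monitoring tools, and divide the delta by the sampling interval. A minimal sketch, with hypothetical counter values:

```python
def throughput_bytes_per_sec(counter_start: int, counter_end: int,
                             interval_sec: float) -> float:
    """Average throughput from two snapshots of a cumulative byte
    counter (applicable to network, storage, or memory counters)."""
    if interval_sec <= 0:
        raise ValueError("interval must be positive")
    return (counter_end - counter_start) / interval_sec

# Hypothetical example: a NIC counter advances by 1.2 GB over 10 s,
# an average network throughput of 120 MB/s.
net = throughput_bytes_per_sec(5_000_000_000, 6_200_000_000, 10.0)
print(f"network: {net / 1e6:.0f} MB/s")
```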

To create a simplified server output metric, we can combine these individual performance metrics into a single, normalised score.
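One plausible form for such a score, shown here purely as an illustrative sketch: normalise each metric against a baseline (reference) value and combine the results as a weighted sum. The weights and baseline figures below are assumptions for illustration, not part of any standard.

```python
def server_output_score(metrics: dict, baselines: dict, weights: dict) -> float:
    """Weighted sum of metrics normalised against baseline values.

    Assumed form: score = sum_i w_i * (metric_i / baseline_i),
    with the weights summing to 1. All names and values are illustrative.
    """
    assert abs(sum(weights.values()) - 1.0) < 1e-9
    return sum(w * metrics[k] / baselines[k] for k, w in weights.items())

# Hypothetical server measured against a reference machine:
metrics   = {"ips": 2.4e11, "flops": 3.0e13, "mem_bps": 4.0e11,
             "net_bps": 2.5e10, "storage_bps": 6.0e9}
baselines = {"ips": 2.0e11, "flops": 2.0e13, "mem_bps": 4.0e11,
             "net_bps": 2.5e10, "storage_bps": 5.0e9}
weights   = {"ips": 0.25, "flops": 0.30, "mem_bps": 0.20,
             "net_bps": 0.15, "storage_bps": 0.10}
print(round(server_output_score(metrics, baselines, weights), 3))
```

A score above 1.0 means the server outperforms the reference machine on this weighted basis.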

This simplified Server Output Score provides a standardised measure of a server's overall performance and work output based on its CPU, GPU, memory, network, and storage capabilities.

By comparing the Server Output Scores of different servers or tracking the score of a server over time, data centre operators can assess the relative performance and efficiency of their servers and make informed decisions about resource allocation, upgrades, and optimisations.

Keep in mind that this simplified metric may not capture all the nuances and complexities of server performance, but it offers a more practical and accessible approach to evaluating server output compared to the previous, more comprehensive benchmark.

Concerns around data centres

Exponential growth and energy consumption

  • Data centre energy consumption is growing exponentially, which is a cause for concern.

  • If the current trend continues, data centre energy consumption could double every 12 years.

  • Exponential growth can be dangerous as it can quickly hit limits, such as resource availability or competing needs.
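The relationship between a doubling period and a compound annual growth rate can be checked directly; a doubling every 12 years corresponds to roughly 6% growth per year.

```python
import math

def annual_growth_rate(doubling_years: float) -> float:
    """Compound annual growth rate implied by a doubling period."""
    return 2 ** (1 / doubling_years) - 1

def doubling_time(annual_rate: float) -> float:
    """Years to double at a given compound annual growth rate."""
    return math.log(2) / math.log(1 + annual_rate)

# A 12-year doubling period implies ~5.9% compound growth per year.
print(f"{annual_growth_rate(12) * 100:.1f}% per year")
```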

Resource conflicts and environmental impact

  • Data centres compete for resources, such as energy and water, with other sectors like housing, agriculture, and food production.

  • In some regions, data centres consume a significant portion of the total energy, leading to conflicts with local communities and industries.

  • Data centres' water consumption is also substantial, comparable to that of hospitals, golf courses, or medium-sized cities.

  • Environmental opposition movements have emerged in areas where data centres strain local resources, such as Virginia, Ireland, the UK, and the Netherlands.

Power Usage Effectiveness (PUE) and its limitations

  • PUE is a metric used to measure data centre efficiency, calculated as the ratio of a facility's total energy consumption to the energy delivered to IT equipment for computation.

  • A PUE of 1 indicates a 100% efficient data centre, but this is not achievable in practice.

  • Companies often report favourable PUE values for marketing purposes, but these should be viewed with caution.

  • PUE does not account for the environmental impact or the source of energy used.
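PUE as defined above is a straightforward ratio; a minimal sketch:

```python
def pue(total_facility_kwh: float, it_equipment_kwh: float) -> float:
    """Power Usage Effectiveness: total facility energy divided by the
    energy consumed by IT equipment. PUE >= 1, and lower is better."""
    if it_equipment_kwh <= 0:
        raise ValueError("IT equipment energy must be positive")
    return total_facility_kwh / it_equipment_kwh

# A facility drawing 1,300 MWh in total while its IT load consumes
# 1,000 MWh has a PUE of 1.3 -- typical of a modern data centre.
print(pue(1_300_000, 1_000_000))  # 1.3
```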

Water usage for cooling

  • Data centres typically require 1.5 to 2.3 litres of water per kilowatt-hour of energy for cooling.

  • Cooling is a significant contributor to data centre inefficiency, as the heat generated is often wasted.

  • Some data centres are exploring closed-loop water cooling systems to reduce water consumption, but these are not as efficient as evaporative cooling.
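Using the 1.5 to 2.3 litres per kilowatt-hour range above, a facility's annual cooling water footprint can be estimated; the 10 MW IT load in the example is an assumption for illustration.

```python
def annual_water_use_litres(avg_power_kw: float, litres_per_kwh: float) -> float:
    """Estimate yearly cooling water consumption from average power draw."""
    hours_per_year = 24 * 365
    return avg_power_kw * hours_per_year * litres_per_kwh

# Hypothetical 10 MW facility at the low and high ends of the range:
low  = annual_water_use_litres(10_000, 1.5)
high = annual_water_use_litres(10_000, 2.3)
print(f"{low / 1e6:.0f} to {high / 1e6:.0f} million litres per year")
```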

Energy consumption of AI and high-performance computing

  • The rise of artificial intelligence (AI) and machine learning has led to increased energy consumption in data centres.

  • Training a single AI model can consume as much energy as a car does in its entire lifetime.

  • High-performance computing, which often relies on power-hungry GPUs and specialized processors, contributes significantly to data centre energy consumption.

Jevons paradox and the rebound effect

  • Jevons paradox suggests that as technology becomes more efficient, consumption increases disproportionately.

  • In the context of data centres, as computing becomes cheaper and more accessible, overall energy consumption may increase despite efficiency improvements.

Lack of transparency and regulation

  • Data centre operators are not required to disclose their energy consumption or efficiency metrics, making it difficult to assess their true environmental impact.

  • Regulations and incentives for data centre efficiency and sustainability are limited or have been weakened by lobbying efforts.

Potential solutions and best practices

  • Integrating data centres with local heating systems to utilise waste heat for district heating or industrial processes.

  • Developing modular, containerised data centres that can be easily integrated into local energy systems.

  • Exploring innovative cooling solutions, such as liquid cooling or using waste heat for cooling via absorption chillers.

  • Encouraging software efficiency and optimising resource allocation to minimise energy consumption.

  • Implementing stricter regulations and incentives for data centre efficiency and sustainability.

Some data centre trends

  1. Data centres have experienced double-digit growth over the last 15 years, driven by the increasing demand for technology and the outsourcing of IT infrastructure by enterprises.

  2. The rise of public cloud has been a significant accelerator for multi-tenant, third-party data centres, as even the hyperscalers like Amazon, Microsoft, Google, and Oracle outsource around 50% of their data centre capacity.

  3. The emergence of generative AI, such as ChatGPT, has created an unprecedented demand for data centre capacity in the last 6 months, requiring new types of processing based on GPUs rather than CPUs.

  4. GPU-based infrastructure is more expensive than traditional CPU-based infrastructure, with higher costs for the hardware, networking (using InfiniBand instead of Ethernet), and power consumption (10-100 kilowatts per rack compared to 5-15 kilowatts for CPUs).

  5. There are concerns about the availability of power to support the growing demand for data centres, with some regions like Ashburn, Virginia, experiencing shortages. This is pushing demand to other markets across the United States.

  6. Globally, data centre development is growing rapidly, with South America, Europe, Asia, the Middle East, and parts of Africa seeing significant absorption. Countries without access to advanced chips and data centre capacity may fall behind economically if generative AI becomes a major driver of economies.

  7. Cooling is a significant portion of data centre power consumption, with the efficiency measured by Power Usage Effectiveness (PUE). Modern data centres have improved their PUE from around 2 to 1.2-1.3 through advanced cooling technologies and free cooling in colder climates.

  8. While AI workloads currently make up a small percentage of total data centre capacity, this is expected to grow significantly in the coming years, potentially cannibalizing existing workloads as the technology advances.
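The rack power densities quoted in point 4 translate directly into how far a fixed power budget stretches; a sketch, where the 2 MW budget and the mid-range density figures are assumptions drawn from the 5-15 kW and 10-100 kW ranges above.

```python
def racks_supported(power_budget_kw: float, kw_per_rack: float) -> int:
    """Number of racks of a given density that fit in a fixed power budget."""
    return int(power_budget_kw // kw_per_rack)

budget_kw = 2_000  # hypothetical 2 MW of available IT power

# CPU racks at ~10 kW per rack vs AI GPU racks at ~50 kW per rack:
print(racks_supported(budget_kw, 10))  # 200 CPU racks
print(racks_supported(budget_kw, 50))  # 40 GPU racks
```

The same budget supports five times fewer racks at the higher density, which is why power, not floor space, is increasingly the binding constraint.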

Data Centre Challenges

Data centres that already have a limited power supply will face significant challenges in accommodating the massive power requirements of AI GPUs, which can require up to 200 kW per rack.

Cooling these high-density racks will also be a major hurdle.

Here are a few strategies data centres might employ to address these issues:

  1. Power infrastructure upgrades: Data centres may need to invest in upgrading their power infrastructure, including transformers, switchgear, and power distribution units (PDUs), to handle the increased power demands. This could involve working with utility companies to secure additional power capacity.

  2. Liquid cooling: Traditional air cooling methods may not be sufficient for high-density AI GPU racks. Data centres may need to implement liquid cooling solutions, such as direct-to-chip liquid cooling or immersion cooling, which can more effectively remove heat from the hardware. Liquid cooling can also help reduce the overall power consumption associated with cooling.

  3. Modular and phased deployments: Data centres may choose to deploy AI GPU infrastructure in modular or phased approaches, gradually adding capacity as power and cooling infrastructure is upgraded. This can help spread out the capital expenditure and avoid overloading existing power and cooling systems.

  4. Workload optimisation and scheduling: Data centres can work with their customers to optimise and schedule AI workloads to make the most efficient use of available power and cooling resources. This may involve running certain workloads during off-peak hours or balancing workloads across different data centre locations.

  5. Power usage effectiveness (PUE) improvements: Data centres can strive to improve their overall PUE by implementing more efficient cooling systems, such as free cooling in colder climates, and optimising airflow management within the facility. Improving PUE can help free up more power capacity for the actual IT equipment.

  6. Collaborative planning with customers: Data centres will need to work closely with their customers who are looking to deploy AI GPU infrastructure to understand their specific requirements and develop customised solutions. This may involve exploring alternative data centre locations with more abundant power resources or developing long-term plans for power and cooling infrastructure upgrades.

  7. Renewable energy integration: Data centres can explore the integration of renewable energy sources, such as solar or wind power, to supplement their power capacity. While renewable energy alone may not be sufficient to power high-density AI GPU racks, it can help offset some of the increased power demand.
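Strategy 5 can be quantified: for a fixed utility feed, every reduction in PUE frees capacity for IT equipment. A sketch, where the 10 MW feed and the before/after PUE values are assumptions.

```python
def it_capacity_kw(facility_feed_kw: float, pue: float) -> float:
    """IT power available from a fixed utility feed at a given PUE."""
    return facility_feed_kw / pue

feed = 10_000  # hypothetical 10 MW utility feed

before = it_capacity_kw(feed, 1.6)  # older facility
after  = it_capacity_kw(feed, 1.2)  # after cooling/airflow improvements
print(f"freed capacity: {after - before:.0f} kW")
```

Improving PUE from 1.6 to 1.2 on this feed frees roughly 2 MW for IT equipment without any change to the utility connection.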

Despite these strategies, the power and cooling challenges posed by AI GPUs will likely limit the ability of some data centres to fully accommodate this new infrastructure.

In many cases, data centres may need to make significant capital investments and infrastructure upgrades to support the growing demand for AI computing power. This may also drive the development of new, purpose-built data centres specifically designed for AI workloads, with ample power and cooling capacity from the outset.
