Vector Similarity Search - HNSW


Introduction to Vector Similarity Search

Vector similarity search is a fundamental problem in computer science, with applications spanning across various domains such as information retrieval, recommendation systems, image and video search, and machine learning.

The goal of vector similarity search is to find the most similar vectors to a given query vector from a large collection of vectors, based on a similarity metric such as Euclidean distance or cosine similarity.
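
As a quick illustration of these metrics (the vectors below are toy values, not drawn from any dataset), both can be computed in a few lines:

import numpy as np

q = np.array([0.1, 0.3, 0.5], dtype=np.float32)  # query vector (toy values)
v = np.array([0.2, 0.1, 0.4], dtype=np.float32)  # candidate vector (toy values)

euclidean_distance = np.linalg.norm(q - v)
cosine_similarity = np.dot(q, v) / (np.linalg.norm(q) * np.linalg.norm(v))
print(euclidean_distance, cosine_similarity)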

In many real-world applications, the number of vectors in the collection can be in the millions or billions, and the dimensionality of the vectors can be high (hundreds or thousands).

This makes exact similarity search computationally expensive and often infeasible.

Therefore, there is a need for efficient algorithms that can perform approximate nearest neighbor (ANN) search, trading off a small amount of accuracy for significant speedups.

Hierarchical Navigable Small World (HNSW) graphs have emerged as a leading approach to vector similarity search, offering one of the best trade-offs between search speed and accuracy among ANN methods.

Approximate Nearest Neighbor (ANN) Algorithms

ANN algorithms are a class of algorithms that aim to find the approximate nearest neighbours of a query vector in a large collection of vectors.

The key idea behind ANN algorithms is to build an index structure that allows for fast searching, while sacrificing some accuracy compared to exact search methods.

There are several categories of ANN algorithms, each with its own strengths and weaknesses:

Tree-based methods: These algorithms build a tree-like index structure to partition the vector space and perform a branch-and-bound search. Examples include KD-trees, ball trees, and VP-trees.

Hashing-based methods: These algorithms use locality-sensitive hashing (LSH) functions to map similar vectors to the same hash buckets, allowing for fast retrieval of candidate neighbours. Examples include LSH Forest and Multi-Probe LSH.

Graph-based methods: These algorithms build a graph structure where each node represents a vector, and edges connect similar vectors. The search is performed by traversing the graph starting from a set of initial nodes. Examples include Hierarchical Navigable Small World (HNSW) and Proximity Graph.

Understanding HNSW's Core

Proximity Graphs

  • HNSW belongs to the class of proximity graphs, where each node represents a vector, and edges connect similar vectors based on a similarity metric (e.g., Euclidean distance).

  • The key idea is that nearby vectors in the graph are likely to be the nearest neighbours of a query vector.

Probabilistic Skip Lists

  • Probabilistic skip lists are a data structure that allows for efficient searching and insertion in a sorted list.

  • The main idea is to create multiple layers of linked lists, with each layer skipping over some elements in the layer below.

  • This allows for fast searching by starting at the top layer and moving down to lower layers when necessary.

Navigable Small World Graphs

  • Navigable small world graphs are a type of graph structure that exhibits both local connectivity and long-range links.

  • The local connectivity ensures that similar vectors are connected, while the long-range links allow for efficient navigation between distant parts of the graph.

  • The search is performed by starting from a set of entry nodes and greedily traversing the graph towards the query vector.

Hierarchical NSW (HNSW) combines the ideas of proximity graphs, probabilistic skip lists, and navigable small world graphs to create a multi-layer graph structure that allows for efficient and scalable approximate nearest neighbor search.

To illustrate how they all work together, consider the following analogy:

Imagine you are in a large library, and you want to find a specific book (the query vector). The books are organised on shelves based on their similarity (proximity graph).

To quickly locate the book, you start at the library's entrance and follow signs that direct you to the most relevant section (navigable small world graph). Once you reach the section, you can efficiently search for the book by skipping some shelves and only examining the most promising ones (probabilistic skip list). By combining these strategies, you can find the desired book much faster than searching through every shelf in the library.

Similarly, HNSW combines these concepts to create a multi-layer graph structure that allows for efficient navigation and search in high-dimensional vector spaces.
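
To make the greedy traversal concrete, here is a minimal single-layer sketch (an illustration of the idea only, not the multi-layer HNSW implementation; the graph and vectors are assumed to be supplied by the caller):

import numpy as np

def greedy_search(graph, vectors, entry, query):
    # graph: dict mapping a node id to the ids of its neighbours
    # vectors: array of shape (n, d) holding the vector stored at each node
    # greedily move to whichever neighbour is closest to the query,
    # stopping when no neighbour improves on the current node
    current = entry
    current_dist = np.linalg.norm(vectors[current] - query)
    improved = True
    while improved:
        improved = False
        for neighbour in graph[current]:
            dist = np.linalg.norm(vectors[neighbour] - query)
            if dist < current_dist:
                current, current_dist = neighbour, dist
                improved = True
    return current, current_dist

HNSW runs this kind of traversal on each layer, using the result of one layer as the entry point for the layer below.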

Graph Construction and Parameter Tuning in HNSW

The graph construction process in HNSW is governed by a probability function that determines the insertion layer for each new vector.

The probability of inserting a vector at layer l is given by:

P(l) = 1 / (m_L^l)

where m_L is the level multiplier, a hyperparameter that controls the growth rate of the graph.

The insertion process works as follows:

  1. Randomly select a layer l based on the probability function P(l).

  2. Find the nearest neighbours of the new vector at layer l using a greedy search.

  3. Create bi-directional edges between the new vector and its nearest neighbours.

  4. Repeat steps 2 and 3 for every layer from l down to layer 0.

The graph construction process is affected by several key parameters:

  • m_L: The level multiplier that controls the growth rate of the graph. A larger m_L leads to a more compact graph with fewer layers, while a smaller m_L leads to a taller graph with more layers.

  • M: The maximum number of outgoing edges per node. A larger M leads to a more connected graph, which can improve recall but also increases memory usage.

  • efConstruction: The size of the dynamic candidate list during graph construction. A larger efConstruction leads to a more thorough search for nearest neighbours, which can improve the quality of the graph but also increases construction time.

A Worked Example of Layer Assignment

Let's consider a small example to demonstrate how the probability function affects the distribution of vectors across different layers and how this impacts the graph's performance.

Suppose we have a dataset of 10 vectors and set the level multiplier m_L to 2. The probability of a vector being inserted at layer l is given by:

P(l) = 1 / (2^l)

Layer 0: P(0) = 1 / (2^0) = 1

Layer 1: P(1) = 1 / (2^1) = 0.5

Layer 2: P(2) = 1 / (2^2) = 0.25

In this case:

  1. all 10 vectors will be inserted at layer 0

  2. 5 vectors (on average) will be inserted at layer 1

  3. and 2-3 vectors (on average) will be inserted at layer 2.

This distribution of vectors across layers creates a hierarchical structure that allows for efficient search.

If we increase m_L to 4, the probability distribution changes:

Layer 0: P(0) = 1 / (4^0) = 1

Layer 1: P(1) = 1 / (4^1) = 0.25

Layer 2: P(2) = 1 / (4^2) = 0.0625

With m_L = 4:

  1. all 10 vectors will still be inserted at layer 0

  2. but only 2-3 vectors (on average) will be inserted at layer 1

  3. and likely no vectors will be inserted at layer 2.

This results in a flatter graph with fewer layers, which can reduce search time but may also reduce recall.
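
A small simulation makes this concrete. The sketch below is only an illustration of the rule described above (it is not taken from any HNSW implementation): each vector is repeatedly "promoted" to the next layer with probability 1/m_L, so it reaches layer l with probability 1/(m_L^l), and we then count how many vectors are present on each layer:

import random

def top_layer(m_L):
    # a vector reaches layer l with probability 1 / m_L**l:
    # keep promoting it to the next layer with probability 1 / m_L
    l = 0
    while random.random() < 1.0 / m_L:
        l += 1
    return l

n = 10000
for m_L in (2, 4):
    tops = [top_layer(m_L) for _ in range(n)]
    # a vector whose top layer is t is present on every layer 0..t
    per_layer = {l: sum(t >= l for t in tops) for l in range(max(tops) + 1)}
    print(f"m_L = {m_L}:", per_layer)

With 10,000 vectors the counts come out close to 10,000 / (m_L^l) per layer, mirroring the 10-vector example above.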

Hierarchical Navigable Small World (HNSW) benefits

The Hierarchical Navigable Small World (HNSW) algorithm offers several benefits over the basic Navigable Small World (NSW) approach, addressing some of its limitations and improving the state-of-the-art in K-Approximate Nearest Neighbor (K-ANN) search.

Here are the key advantages of HNSW:

Excellent performance: HNSW outperforms other open-source algorithms on a wide variety of datasets, particularly in the case of high-dimensional data. Even on datasets where NSW struggled, HNSW was able to achieve the best results.

Robustness: HNSW is applicable in generalised metric spaces and performs well on any dataset tested in the paper. This eliminates the need to select a specific algorithm for a particular problem, making HNSW attractive for practical applications. The algorithm's robustness is particularly important when dealing with data that has a complex structure with varying effective dimensionality across different scales.

Continuous incremental indexing: HNSW supports continuous incremental indexing, allowing for efficient updates to the index as new data points are added.

Approximation of k-NN and relative neighbourhood graphs: As a by-product of the index construction, HNSW can efficiently approximate k-NN and relative neighbourhood graphs, which can be useful for various applications.

Logarithmic scalability: Due to its hierarchical structure and the use of navigable small world graphs, HNSW achieves logarithmic scalability in search complexity, making it more efficient than NSW, which has polylogarithmic complexity.

Limitations of HNSW compared to NSW

Loss of distributed search capability

The search in HNSW always starts from the top layer, which makes it difficult to distribute the structure using the same techniques as in NSW.

This is due to the potential congestion of the higher layer elements. While simple workarounds like data partitioning can be used to distribute the structure, the total parallel throughput of the system may not scale well with the number of computer nodes.

Despite this limitation, there are potential ways to make HNSW distributed, similar to the techniques used in skip lists. This could lead to even better distributed performance compared to NSW due to HNSW's logarithmic scalability and ideally uniform load on the nodes.

Implementing HNSW with FAISS

FAISS, a library for efficient similarity search and clustering of dense vectors developed by Facebook AI Research, provides a practical framework for implementing HNSW.

It simplifies the complex process of setting up an HNSW index, allowing for straightforward customisation of key parameters such as the number of neighbours M, the construction-time candidate list size (efConstruction), and the search-time candidate list size (efSearch).

Through FAISS, users can readily experiment with these settings to find the optimal configuration for their specific needs.

FAISS's HNSW implementation integrates easily into Python projects.

Here's a step-by-step guide on how to set up an HNSW index using FAISS:

Install FAISS

pip install faiss-cpu

Import the necessary modules

import numpy as np
import faiss

Prepare your data

# d: dimensionality of vectors
# nb: number of vectors in the database
# nq: number of query vectors
d = 128
nb = 100000
nq = 10000
xb = np.random.random((nb, d)).astype('float32')
xq = np.random.random((nq, d)).astype('float32')

Create an HNSW index

# M: maximum number of outgoing edges per node
# ef_construction: size of the dynamic candidate list during construction
M = 16
ef_construction = 200
index = faiss.IndexHNSWFlat(d, M)
index.hnsw.efConstruction = ef_construction

Add vectors to the index

index.add(xb)

Search for nearest neighbours

# k: number of nearest neighbors to retrieve
# ef_search: size of the dynamic candidate list during searching
k = 10
ef_search = 100
index.hnsw.efSearch = ef_search
distances, indices = index.search(xq, k)

The Impact of Parameters on Performance

The key parameters to tune are:

  • M: A larger M leads to a more connected graph, which can improve recall but also increases memory usage.

  • ef_construction: A larger ef_construction leads to a more thorough search for nearest neighbours during construction, which can improve the quality of the graph but also increases construction time.

  • ef_search: A larger ef_search leads to a more thorough search for nearest neighbours during querying, which can improve recall but also increases search time.
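
Continuing the running example above (it reuses index, xq, nq and k from the previous snippets; the efSearch values are illustrative), a simple sweep shows how the search-time candidate list size affects query latency, with higher values returning results closer to an exact search:

import time

for ef in (16, 32, 64, 128, 256):
    index.hnsw.efSearch = ef
    start = time.time()
    _, _ = index.search(xq, k)
    print(f"efSearch={ef}: {time.time() - start:.2f}s for {nq} queries")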

Enhancing HNSW's Performance

Product Quantization (PQ) is a technique that compresses high-dimensional vectors into compact codes by dividing each vector into m subvectors and quantizing each subvector independently using a small codebook.

This compression allows for reduced memory usage but introduces some approximation error, which can lead to a small loss in recall.

The trade-off between memory usage and accuracy can be controlled by adjusting the parameters m (number of subvectors) and bits (number of bits per subvector).

Increasing m and bits leads to more accurate quantization but also increases memory usage.

Conversely, decreasing m and bits reduces memory usage but may result in a larger loss in recall.

For example, suppose we have a dataset of 1 million 128-dimensional vectors.

Using PQ with m = 8 and bits = 8, each vector is compressed from 128 × 4 bytes (512 bytes) to 8 bytes, achieving a 64× reduction in memory usage.

However, this compression may result in a small loss in recall, e.g., from 0.95 to 0.93. By increasing m and bits, we can improve recall at the cost of higher memory usage.
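
The arithmetic behind the 64× figure is straightforward (a back-of-the-envelope check, assuming float32 storage for the uncompressed vectors):

d, m, bits = 128, 8, 8
raw_bytes = d * 4             # float32 storage per vector
pq_bytes = m * bits // 8      # PQ code per vector
print(raw_bytes, pq_bytes, raw_bytes / pq_bytes)   # 512 8 64.0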

Product Quantization (PQ) and Inverted File System (IVF)

Product Quantization (PQ) is a technique for compressing high-dimensional vectors into compact codes, which can significantly reduce memory usage at the cost of a small loss in accuracy.

In PQ, each vector is divided into m subvectors, and each subvector is quantized independently using a codebook of size 2^b. The resulting code for each vector is a concatenation of the indices of the nearest codewords for each subvector.

To use PQ with HNSW in FAISS, you can create an IndexHNSWPQ object instead of IndexHNSWFlat:

# m: number of subvectors
# bits: number of bits per subvector
m = 8
bits = 8
# arguments: dimensionality, number of subvectors, HNSW neighbours per node, bits per subvector
index = faiss.IndexHNSWPQ(d, m, M, bits)
# unlike IndexHNSWFlat, a PQ-based index must be trained on (a sample of) the data
# before vectors are added, e.g. index.train(xb)
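
FAISS also exposes the product quantizer on its own, which makes it easy to inspect the codes directly. The short sketch below reuses d, m, bits and xb from the snippets above; train and compute_codes are the numpy-friendly wrappers in the FAISS Python API, and the snippet is intended as an illustration rather than a recommended workflow:

pq = faiss.ProductQuantizer(d, m, bits)
pq.train(xb)                      # learn the m codebooks from the data
codes = pq.compute_codes(xb)      # uint8 array with m * bits / 8 bytes per vector
print(codes.shape)                # (nb, 8) when m = 8 and bits = 8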

Inverted File System (IVF)

An Inverted File System (IVF) is a technique that speeds up the search process in HNSW by partitioning the vector space into Voronoi cells and maintaining an inverted list of vectors for each cell.

During search, only the most relevant cells are visited, reducing the number of distance computations required.

Voronoi Cells in IVF

Each cell encompasses all points closer to its centroid than to any other centroid. These cells facilitate the organisation of vectors into a structured format that allows for efficient querying:

  • Quantization: The process of mapping vectors to the nearest Voronoi centroid reduces the continuous vector space into a discrete set of cells.

  • Efficiency: By allocating vectors to these cells, IVF reduces the search space to the most relevant cells, based on proximity to the query vector's centroid.

  • Impact on Search: The granularity of the partitioning (determined by the number of cells) directly influences search accuracy and speed. More cells mean higher precision but potentially slower searches due to increased index overhead.

Centroid Vector

A centroid vector represents the centre of a Voronoi cell in the context of an Inverted File System (IVF).

It is the average (mean) of all the vectors within a cell and serves as the reference point for that cell. When partitioning the vector space, each vector is associated with the nearest centroid, effectively organising the vectors into distinct, non-overlapping groups.

Key Points:

  • Role in IVF: Serves as the basis for partitioning the vector space and efficiently searching by narrowing down the target search area to the most relevant cells.

  • Determination: Typically determined using clustering algorithms such as k-means, where the centroid is the mean of all vectors assigned to a cell.

  • Functionality: Enables a fast approximation of proximity by comparing a query vector primarily with centroid vectors instead of all vectors in the dataset, thereby speeding up the search process.

Here's how IVF works:

  1. The vector space is partitioned into a set of Voronoi cells using a quantizer (e.g., k-means clustering). Each cell is represented by a centroid vector (see the sketch after this list).

  2. For each vector in the dataset, the nearest centroid is found, and the vector is added to the corresponding inverted list.

  3. During search, the query vector is first compared to the centroids of the Voronoi cells to identify the most relevant cells.

  4. The search is then performed only within the inverted lists of the selected cells, significantly reducing the number of distance computations.
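
To make steps 1 and 2 concrete, here is a small, self-contained sketch that builds Voronoi cells over toy 2-D data with FAISS's k-means and assigns each vector to its nearest centroid (illustrative only; IndexIVFFlat performs this clustering itself when its train method is called):

import numpy as np
import faiss

# toy data: 1,000 2-D vectors partitioned into 8 Voronoi cells
xb_toy = np.random.random((1000, 2)).astype('float32')
kmeans = faiss.Kmeans(d=2, k=8, niter=20, verbose=False)
kmeans.train(xb_toy)

# assign each vector to its nearest centroid - these assignments form the inverted lists
_, cell_ids = kmeans.index.search(xb_toy, 1)
for cell in range(8):
    print("cell", cell, "holds", int((cell_ids == cell).sum()), "vectors")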

By limiting the search to a small subset of the dataset, IVF can significantly speed up the search process, especially for large datasets.

The number of Voronoi cells (nlist) is a key parameter that affects the trade-off between search speed and accuracy. Increasing nlist leads to smaller cells and more focused searches but also increases the overhead of comparing the query vector to the centroids.

In summary, IVF is an effective technique for speeding up HNSW searches by partitioning the vector space and focusing the search on the most relevant regions, reducing the number of distance computations required.

Using IVF with HNSW in FAISS

To use IVF with HNSW in FAISS, you can use an HNSW index (IndexHNSWFlat) as the coarse quantizer for an IndexIVFFlat object:

# nlist: number of Voronoi cells
nlist = 1024
# the HNSW index serves as the coarse quantizer over the nlist cell centroids
quantizer = faiss.IndexHNSWFlat(d, M)
index = faiss.IndexIVFFlat(quantizer, d, nlist)
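
The snippet above only constructs the index. A minimal continuation, reusing the xb, xq and k defined earlier (the nprobe value of 16 is an illustrative choice rather than a recommendation), shows training, adding and searching:

# an IVF index must be trained (k-means over the data) before vectors are added
index.train(xb)
index.add(xb)

# nprobe controls how many of the nlist cells are visited at query time;
# larger values improve recall at the cost of more distance computations
index.nprobe = 16
distances, indices = index.search(xq, k)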

Here's an example that demonstrates the effectiveness of using PQ and IVF with HNSW:

import numpy as np
import faiss

d = 128
nb = 1000000
nq = 10000
k = 10

# HNSW and PQ parameters (as introduced above)
M = 16
m = 8
bits = 8
nlist = 1024

xb = np.random.random((nb, d)).astype('float32')
xq = np.random.random((nq, d)).astype('float32')

# Exact (brute-force) search provides the ground truth for measuring recall
index_flat = faiss.IndexFlatL2(d)
index_flat.add(xb)
_, gt = index_flat.search(xq, k)

def recall_at_k(indices, ground_truth):
    # fraction of the true k nearest neighbours recovered by the approximate search
    hits = sum(len(set(indices[i]) & set(ground_truth[i])) for i in range(len(ground_truth)))
    return hits / (len(ground_truth) * k)

# HNSW without PQ and IVF
index_hnsw = faiss.IndexHNSWFlat(d, M)
index_hnsw.add(xb)
_, indices_hnsw = index_hnsw.search(xq, k)

# HNSW with PQ (the PQ codebooks must be trained before vectors are added)
index_hnsw_pq = faiss.IndexHNSWPQ(d, m, M, bits)
index_hnsw_pq.train(xb)
index_hnsw_pq.add(xb)
_, indices_hnsw_pq = index_hnsw_pq.search(xq, k)

# IVF with an HNSW coarse quantizer
quantizer = faiss.IndexHNSWFlat(d, M)
index_ivf_hnsw = faiss.IndexIVFFlat(quantizer, d, nlist)
index_ivf_hnsw.train(xb)
index_ivf_hnsw.add(xb)
_, indices_ivf_hnsw = index_ivf_hnsw.search(xq, k)

print("HNSW without PQ and IVF:")
print("Recall@10:", recall_at_k(indices_hnsw, gt))
print("Memory usage (vectors only):", index_hnsw.ntotal * d * 4 / 1024 / 1024, "MB")

print("HNSW with PQ:")
print("Recall@10:", recall_at_k(indices_hnsw_pq, gt))
print("Memory usage (codes only):", index_hnsw_pq.ntotal * (m * bits / 8) / 1024 / 1024, "MB")

print("HNSW with IVF:")
print("Recall@10:", recall_at_k(indices_ivf_hnsw, gt))
print("Memory usage (vectors only):", index_ivf_hnsw.ntotal * d * 4 / 1024 / 1024, "MB")

In this example, we compare the performance of HNSW without PQ and IVF, HNSW with PQ, and HNSW with IVF.

The results show that using PQ can significantly reduce memory usage (by a factor of roughly 32d / (m × bits), i.e. about 64× in this example) at the cost of a small loss in recall, while using IVF can significantly speed up the search process (roughly in proportion to the fraction of Voronoi cells that are probed) without reducing memory usage.

Conclusion

In conclusion, Hierarchical Navigable Small World (HNSW) graphs have emerged as a powerful and efficient solution for vector similarity search, offering significant advantages over traditional approaches.

By combining the concepts of proximity graphs, probabilistic skip lists, and navigable small world graphs, HNSW creates a multi-layer graph structure that enables fast and accurate nearest neighbor search, even in high-dimensional spaces.

The documentation provided an in-depth exploration of HNSW's core concepts, graph construction process, and parameter tuning, along with practical examples of its implementation using the FAISS library.

The benefits of HNSW, such as excellent performance, robustness, and logarithmic scalability, make it an attractive choice for a wide range of applications, including information retrieval, recommendation systems, and machine learning.

Moreover, the documentation discussed advanced techniques like Product Quantization (PQ) and Inverted File System (IVF), which can further enhance HNSW's performance by reducing memory usage and speeding up the search process.

The examples and code snippets provided throughout the documentation serve as valuable resources for practitioners looking to implement HNSW in their projects.

As the field of vector similarity search continues to evolve, HNSW has the potential to play a crucial role in enabling fast and accurate similarity search in large-scale datasets.

Future developments in HNSW may include further optimizations, extensions to handle dynamic datasets, and integration with other machine learning techniques to enable more advanced applications.

In summary, this documentation provides a solid foundation for understanding and implementing HNSW. By leveraging the insights and techniques presented here, practitioners can harness the full potential of HNSW to build efficient and effective vector similarity search systems.

References:

  • Malkov, Y. A. and Yashunin, D. A., "Efficient and robust approximate nearest neighbor search using Hierarchical Navigable Small World graphs", arXiv:1603.09320.

  • FAISS documentation: https://faiss.ai
Figure: The search process through an NSW graph. Starting at a pre-defined entry point, the algorithm greedily traverses to connected vertices that are nearer to the query vector.

Figure: The layered graph of HNSW. The top layer is the entry point and contains only the longest links; moving down the layers, the links become shorter and more numerous.