Vector Similarity Search - HNSW


Introduction to Vector Similarity Search

Vector similarity search is a fundamental problem in computer science, with applications spanning across various domains such as information retrieval, recommendation systems, image and video search, and machine learning.

The goal of vector similarity search is to find the most similar vectors to a given query vector from a large collection of vectors, based on a similarity metric such as Euclidean distance or cosine similarity.
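
As a quick illustration of these metrics (the vectors below are toy values, not drawn from any dataset), both can be computed in a few lines:

import numpy as np

q = np.array([0.1, 0.3, 0.5], dtype=np.float32)  # query vector (toy values)
v = np.array([0.2, 0.1, 0.4], dtype=np.float32)  # candidate vector (toy values)

euclidean_distance = np.linalg.norm(q - v)
cosine_similarity = np.dot(q, v) / (np.linalg.norm(q) * np.linalg.norm(v))
print(euclidean_distance, cosine_similarity)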

In many real-world applications, the number of vectors in the collection can be in the millions or billions, and the dimensionality of the vectors can be high (hundreds or thousands).

This makes exact similarity search computationally expensive and often infeasible.

Therefore, there is a need for efficient algorithms that can perform approximate nearest neighbor (ANN) search, trading off a small amount of accuracy for significant speedups.

Hierarchical Navigable Small World (HNSW) graphs have emerged as a leading approach to vector similarity search, offering one of the best trade-offs between search speed and accuracy among ANN methods.

Approximate Nearest Neighbor (ANN) Algorithms

ANN algorithms are a class of algorithms that aim to find the approximate nearest neighbours of a query vector in a large collection of vectors.

The key idea behind ANN algorithms is to build an index structure that allows for fast searching, while sacrificing some accuracy compared to exact search methods.

There are several categories of ANN algorithms, each with its own strengths and weaknesses:

Tree-based methods: These algorithms build a tree-like index structure to partition the vector space and perform a branch-and-bound search. Examples include KD-trees, ball trees, and VP-trees.

Hashing-based methods: These algorithms use locality-sensitive hashing (LSH) functions to map similar vectors to the same hash buckets, allowing for fast retrieval of candidate neighbours. Examples include LSH Forest and Multi-Probe LSH.

Graph-based methods: These algorithms build a graph structure where each node represents a vector, and edges connect similar vectors. The search is performed by traversing the graph starting from a set of initial nodes. Examples include Hierarchical Navigable Small World (HNSW) and Proximity Graph.

Understanding HNSW's Core

Proximity Graphs

  • HNSW belongs to the class of proximity graphs, where each node represents a vector, and edges connect similar vectors based on a similarity metric (e.g., Euclidean distance).

  • The key idea is that nearby vectors in the graph are likely to be the nearest neighbours of a query vector.

Probabilistic Skip Lists

  • Probabilistic skip lists are a data structure that allows for efficient searching and insertion in a sorted list.

  • The main idea is to create multiple layers of linked lists, with each layer skipping over some elements in the layer below.

  • This allows for fast searching by starting at the top layer and moving down to lower layers when necessary.

Navigable Small World Graphs

  • Navigable small world graphs are a type of graph structure that exhibits both local connectivity and long-range links.

  • The local connectivity ensures that similar vectors are connected, while the long-range links allow for efficient navigation between distant parts of the graph.

  • The search is performed by starting from a set of entry nodes and greedily traversing the graph towards the query vector.

Hierarchical NSW (HNSW) combines the ideas of proximity graphs, probabilistic skip lists, and navigable small world graphs to create a multi-layer graph structure that allows for efficient and scalable approximate nearest neighbor search.

To illustrate how they all work together, consider the following analogy:

Imagine you are in a large library, and you want to find a specific book (the query vector). The books are organised on shelves based on their similarity (proximity graph).

To quickly locate the book, you start at the library's entrance and follow signs that direct you to the most relevant section (navigable small world graph). Once you reach the section, you can efficiently search for the book by skipping some shelves and only examining the most promising ones (probabilistic skip list). By combining these strategies, you can find the desired book much faster than searching through every shelf in the library.

Similarly, HNSW combines these concepts to create a multi-layer graph structure that allows for efficient navigation and search in high-dimensional vector spaces.
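
To make the greedy traversal concrete, here is a minimal single-layer sketch (an illustration of the idea only, not the multi-layer HNSW implementation; the graph and vectors are assumed to be supplied by the caller):

import numpy as np

def greedy_search(graph, vectors, entry, query):
    # graph: dict mapping a node id to the ids of its neighbours
    # vectors: array of shape (n, d) holding the vector stored at each node
    # greedily move to whichever neighbour is closest to the query,
    # stopping when no neighbour improves on the current node
    current = entry
    current_dist = np.linalg.norm(vectors[current] - query)
    improved = True
    while improved:
        improved = False
        for neighbour in graph[current]:
            dist = np.linalg.norm(vectors[neighbour] - query)
            if dist < current_dist:
                current, current_dist = neighbour, dist
                improved = True
    return current, current_dist

HNSW runs this kind of traversal on each layer, using the result of one layer as the entry point for the layer below.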

Graph Construction and Parameter Tuning in HNSW

The graph construction process in HNSW is governed by a probability function that determines the insertion layer for each new vector.

The probability of inserting a vector at layer l is given by:

P(l) = 1 / (m_L^l)

where m_L is the level multiplier, a hyperparameter that controls the growth rate of the graph.

The insertion process works as follows:

  1. Randomly select a layer l based on the probability function P(l).

  2. Find the nearest neighbours of the new vector at layer l using a greedy search.

  3. Create bi-directional edges between the new vector and its nearest neighbours.

  4. Repeat steps 2 and 3 for every layer from l down to layer 0.

The graph construction process is affected by several key parameters:

  • m_L: The level multiplier that controls the growth rate of the graph. A larger m_L leads to a more compact graph with fewer layers, while a smaller m_L leads to a taller graph with more layers.

  • M: The maximum number of outgoing edges per node. A larger M leads to a more connected graph, which can improve recall but also increases memory usage.

  • efConstruction: The size of the dynamic candidate list during graph construction. A larger efConstruction leads to a more thorough search for nearest neighbours, which can improve the quality of the graph but also increases construction time.

A Worked Example of Layer Assignment

Let's consider a small example to demonstrate how the probability function affects the distribution of vectors across different layers and how this impacts the graph's performance.

Suppose we have a dataset of 10 vectors and set the level multiplier m_L to 2. The probability of a vector being inserted at layer l is given by:

P(l) = 1 / (2^l)

Layer 0: P(0) = 1 / (2^0) = 1

Layer 1: P(1) = 1 / (2^1) = 0.5

Layer 2: P(2) = 1 / (2^2) = 0.25

In this case:

  1. all 10 vectors will be inserted at layer 0

  2. 5 vectors (on average) will be inserted at layer 1

  3. and 2-3 vectors (on average) will be inserted at layer 2.

This distribution of vectors across layers creates a hierarchical structure that allows for efficient search.

If we increase m_L to 4, the probability distribution changes:

Layer 0: P(0) = 1 / (4^0) = 1

Layer 1: P(1) = 1 / (4^1) = 0.25

Layer 2: P(2) = 1 / (4^2) = 0.0625

With m_L = 4:

  1. all 10 vectors will still be inserted at layer 0

  2. but only 2-3 vectors (on average) will be inserted at layer 1

  3. and likely no vectors will be inserted at layer 2.

This results in a flatter graph with fewer layers, which can reduce search time but may also reduce recall.
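
A small simulation makes this concrete. The sketch below is only an illustration of the rule described above (it is not taken from any HNSW implementation): each vector is repeatedly "promoted" to the next layer with probability 1/m_L, so it reaches layer l with probability 1/(m_L^l), and we then count how many vectors are present on each layer:

import random

def top_layer(m_L):
    # a vector reaches layer l with probability 1 / m_L**l:
    # keep promoting it to the next layer with probability 1 / m_L
    l = 0
    while random.random() < 1.0 / m_L:
        l += 1
    return l

n = 10000
for m_L in (2, 4):
    tops = [top_layer(m_L) for _ in range(n)]
    # a vector whose top layer is t is present on every layer 0..t
    per_layer = {l: sum(t >= l for t in tops) for l in range(max(tops) + 1)}
    print(f"m_L = {m_L}:", per_layer)

With 10,000 vectors the counts come out close to 10,000 / (m_L^l) per layer, mirroring the 10-vector example above.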

Hierarchical Navigable Small World (HNSW) benefits

The Hierarchical Navigable Small World (HNSW) algorithm offers several benefits over the basic Navigable Small World (NSW) approach, addressing some of its limitations and improving the state-of-the-art in K-Approximate Nearest Neighbor (K-ANN) search.

Here are the key advantages of HNSW:

Excellent performance: HNSW outperforms other open-source algorithms on a wide variety of datasets, particularly in the case of high-dimensional data. Even on datasets where NSW struggled, HNSW was able to achieve the best results.

Robustness: HNSW is applicable in generalised metric spaces and performs well on any dataset tested in the paper. This eliminates the need to select a specific algorithm for a particular problem, making HNSW attractive for practical applications. The algorithm's robustness is particularly important when dealing with data that has a complex structure with varying effective dimensionality across different scales.

Continuous incremental indexing: HNSW supports continuous incremental indexing, allowing for efficient updates to the index as new data points are added.

Approximation of k-NN and relative neighbourhood graphs: As a by-product of the index construction, HNSW can efficiently approximate k-NN and relative neighbourhood graphs, which can be useful for various applications.

Logarithmic scalability: Due to its hierarchical structure and the use of navigable small world graphs, HNSW achieves logarithmic scalability in search complexity, making it more efficient than NSW, which has polylogarithmic complexity.

Limitations of HNSW compared to NSW

Loss of distributed search capability

The search in HNSW always starts from the top layer, which makes it difficult to distribute the structure using the same techniques as in NSW.

This is due to the potential congestion of the higher layer elements. While simple workarounds like data partitioning can be used to distribute the structure, the total parallel throughput of the system may not scale well with the number of computer nodes.

Despite this limitation, there are potential ways to make HNSW distributed, similar to the techniques used in skip lists. This could lead to even better distributed performance compared to NSW due to HNSW's logarithmic scalability and ideally uniform load on the nodes.

Implementing HNSW with FAISS

FAISS, a library for efficient similarity search and clustering of dense vectors developed by Facebook AI Research, provides a practical framework for implementing HNSW.

It simplifies the complex process of setting up an HNSW index, allowing for straightforward customisation of key parameters such as the number of neighbours M, the construction-time candidate list size (efConstruction), and the search-time candidate list size (efSearch).

Through FAISS, users can readily experiment with these settings to find the optimal configuration for their specific needs.

FAISS's HNSW implementation integrates easily into Python projects.

Here's a step-by-step guide on how to set up an HNSW index using FAISS:

Install FAISS

pip install faiss-cpu

Import the necessary modules

import numpy as np
import faiss

Prepare your data

# d: dimensionality of vectors
# nb: number of vectors in the database
# nq: number of query vectors
d = 128
nb = 100000
nq = 10000
xb = np.random.random((nb, d)).astype('float32')
xq = np.random.random((nq, d)).astype('float32')

Create an HNSW index

# M: maximum number of outgoing edges per node
# ef_construction: size of the dynamic candidate list during construction
M = 16
ef_construction = 200
index = faiss.IndexHNSWFlat(d, M)
index.hnsw.efConstruction = ef_construction

Add vectors to the index

index.add(xb)

Search for nearest neighbours

# k: number of nearest neighbors to retrieve
# ef_search: size of the dynamic candidate list during searching
k = 10
ef_search = 100
index.hnsw.efSearch = ef_search
distances, indices = index.search(xq, k)

The Impact of Parameters on Performance

The key parameters to tune are:

  • M: A larger M leads to a more connected graph, which can improve recall but also increases memory usage.

  • ef_construction: A larger ef_construction leads to a more thorough search for nearest neighbours during construction, which can improve the quality of the graph but also increases construction time.

  • ef_search: A larger ef_search leads to a more thorough search for nearest neighbours during querying, which can improve recall but also increases search time.
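
Continuing the running example above (it reuses index, xq, nq and k from the previous snippets; the efSearch values are illustrative), a simple sweep shows how the search-time candidate list size affects query latency, with higher values returning results closer to an exact search:

import time

for ef in (16, 32, 64, 128, 256):
    index.hnsw.efSearch = ef
    start = time.time()
    _, _ = index.search(xq, k)
    print(f"efSearch={ef}: {time.time() - start:.2f}s for {nq} queries")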

Enhancing HNSW's Performance

Product Quantization (PQ) is a technique that compresses high-dimensional vectors into compact codes by dividing each vector into m subvectors and quantizing each subvector independently using a small codebook.

This compression allows for reduced memory usage but introduces some approximation error, which can lead to a small loss in recall.

The trade-off between memory usage and accuracy can be controlled by adjusting the parameters m (number of subvectors) and bits (number of bits per subvector).

Increasing m and bits leads to more accurate quantization but also increases memory usage.

Conversely, decreasing m and bits reduces memory usage but may result in a larger loss in recall.

For example, suppose we have a dataset of 1 million 128-dimensional vectors.

Using PQ with m = 8 and bits = 8, each vector is compressed from 128 × 4 bytes (512 bytes) to 8 bytes, achieving a 64× reduction in memory usage.

However, this compression may result in a small loss in recall, e.g., from 0.95 to 0.93. By increasing m and bits, we can improve recall at the cost of higher memory usage.
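
The arithmetic behind the 64× figure is straightforward (a back-of-the-envelope check, assuming float32 storage for the uncompressed vectors):

d, m, bits = 128, 8, 8
raw_bytes = d * 4             # float32 storage per vector
pq_bytes = m * bits // 8      # PQ code per vector
print(raw_bytes, pq_bytes, raw_bytes / pq_bytes)   # 512 8 64.0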

Product Quantization (PQ) and Inverted File System (IVF)

Product Quantization (PQ) is a technique for compressing high-dimensional vectors into compact codes, which can significantly reduce memory usage at the cost of a small loss in accuracy.

In PQ, each vector is divided into m subvectors, and each subvector is quantized independently using a codebook of size 2^b. The resulting code for each vector is a concatenation of the indices of the nearest codewords for each subvector.

To use PQ with HNSW in FAISS, you can create an IndexHNSWPQ object instead of IndexHNSWFlat:

# m: number of subvectors
# bits: number of bits per subvector
m = 8
bits = 8
# arguments: dimensionality, number of subvectors, HNSW neighbours per node, bits per subvector
index = faiss.IndexHNSWPQ(d, m, M, bits)
# unlike IndexHNSWFlat, a PQ-based index must be trained on (a sample of) the data
# before vectors are added, e.g. index.train(xb)
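
FAISS also exposes the product quantizer on its own, which makes it easy to inspect the codes directly. The short sketch below reuses d, m, bits and xb from the snippets above; train and compute_codes are the numpy-friendly wrappers in the FAISS Python API, and the snippet is intended as an illustration rather than a recommended workflow:

pq = faiss.ProductQuantizer(d, m, bits)
pq.train(xb)                      # learn the m codebooks from the data
codes = pq.compute_codes(xb)      # uint8 array with m * bits / 8 bytes per vector
print(codes.shape)                # (nb, 8) when m = 8 and bits = 8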

Inverted File System (IVF)

An Inverted File System (IVF) is a technique that speeds up the search process in HNSW by partitioning the vector space into Voronoi cells and maintaining an inverted list of vectors for each cell.

During search, only the most relevant cells are visited, reducing the number of distance computations required.

Voronoi Cells in IVF

Each cell encompasses all points closer to its centroid than to any other centroid. These cells facilitate the organisation of vectors into a structured format that allows for efficient querying:

  • Quantization: The process of mapping vectors to the nearest Voronoi centroid reduces the continuous vector space into a discrete set of cells.

  • Efficiency: By allocating vectors to these cells, IVF reduces the search space to the most relevant cells, based on proximity to the query vector's centroid.

  • Impact on Search: The granularity of the partitioning (determined by the number of cells) directly influences search accuracy and speed. More cells mean higher precision but potentially slower searches due to increased index overhead.

Centroid Vector

A centroid vector represents the centre of a Voronoi cell in the context of an Inverted File System (IVF).

It is the average (mean) of all the vectors within a cell and serves as the reference point for that cell. When partitioning the vector space, each vector is associated with the nearest centroid, effectively organising the vectors into distinct, non-overlapping groups.

Key Points:

  • Role in IVF: Serves as the basis for partitioning the vector space and efficiently searching by narrowing down the target search area to the most relevant cells.

  • Determination: Typically determined using clustering algorithms such as k-means, where the centroid is the mean of all vectors assigned to a cell.

  • Functionality: Enables a fast approximation of proximity by comparing a query vector primarily with centroid vectors instead of all vectors in the dataset, thereby speeding up the search process.

Here's how IVF works:

  1. The vector space is partitioned into a set of Voronoi cells using a quantizer (e.g., k-means clustering). Each cell is represented by a centroid vector (see the sketch after this list).

  2. For each vector in the dataset, the nearest centroid is found, and the vector is added to the corresponding inverted list.

  3. During search, the query vector is first compared to the centroids of the Voronoi cells to identify the most relevant cells.

  4. The search is then performed only within the inverted lists of the selected cells, significantly reducing the number of distance computations.
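
To make steps 1 and 2 concrete, here is a small, self-contained sketch that builds Voronoi cells over toy 2-D data with FAISS's k-means and assigns each vector to its nearest centroid (illustrative only; IndexIVFFlat performs this clustering itself when its train method is called):

import numpy as np
import faiss

# toy data: 1,000 2-D vectors partitioned into 8 Voronoi cells
xb_toy = np.random.random((1000, 2)).astype('float32')
kmeans = faiss.Kmeans(d=2, k=8, niter=20, verbose=False)
kmeans.train(xb_toy)

# assign each vector to its nearest centroid - these assignments form the inverted lists
_, cell_ids = kmeans.index.search(xb_toy, 1)
for cell in range(8):
    print("cell", cell, "holds", int((cell_ids == cell).sum()), "vectors")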

By limiting the search to a small subset of the dataset, IVF can significantly speed up the search process, especially for large datasets.

The number of Voronoi cells (nlist) is a key parameter that affects the trade-off between search speed and accuracy. Increasing nlist leads to smaller cells and more focused searches but also increases the overhead of comparing the query vector to the centroids.

In summary, IVF is an effective technique for speeding up HNSW searches by partitioning the vector space and focusing the search on the most relevant regions, reducing the number of distance computations required.

Using IVF with HNSW in FAISS

To use IVF with HNSW in FAISS, you can use an HNSW index (IndexHNSWFlat) as the coarse quantizer for an IndexIVFFlat object:

# nlist: number of Voronoi cells
nlist = 1024
# the HNSW index serves as the coarse quantizer over the nlist cell centroids
quantizer = faiss.IndexHNSWFlat(d, M)
index = faiss.IndexIVFFlat(quantizer, d, nlist)
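
The snippet above only constructs the index. A minimal continuation, reusing the xb, xq and k defined earlier (the nprobe value of 16 is an illustrative choice rather than a recommendation), shows training, adding and searching:

# an IVF index must be trained (k-means over the data) before vectors are added
index.train(xb)
index.add(xb)

# nprobe controls how many of the nlist cells are visited at query time;
# larger values improve recall at the cost of more distance computations
index.nprobe = 16
distances, indices = index.search(xq, k)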

Here's an example that demonstrates the effectiveness of using PQ and IVF with HNSW:

import numpy as np
import faiss

d = 128
nb = 1000000
nq = 10000
k = 10

# HNSW and PQ parameters (as introduced above)
M = 16
m = 8
bits = 8
nlist = 1024

xb = np.random.random((nb, d)).astype('float32')
xq = np.random.random((nq, d)).astype('float32')

# Exact (brute-force) search provides the ground truth for measuring recall
index_flat = faiss.IndexFlatL2(d)
index_flat.add(xb)
_, gt = index_flat.search(xq, k)

def recall_at_k(indices, ground_truth):
    # fraction of the true k nearest neighbours recovered by the approximate search
    hits = sum(len(set(indices[i]) & set(ground_truth[i])) for i in range(len(ground_truth)))
    return hits / (len(ground_truth) * k)

# HNSW without PQ and IVF
index_hnsw = faiss.IndexHNSWFlat(d, M)
index_hnsw.add(xb)
_, indices_hnsw = index_hnsw.search(xq, k)

# HNSW with PQ (the PQ codebooks must be trained before vectors are added)
index_hnsw_pq = faiss.IndexHNSWPQ(d, m, M, bits)
index_hnsw_pq.train(xb)
index_hnsw_pq.add(xb)
_, indices_hnsw_pq = index_hnsw_pq.search(xq, k)

# IVF with an HNSW coarse quantizer
quantizer = faiss.IndexHNSWFlat(d, M)
index_ivf_hnsw = faiss.IndexIVFFlat(quantizer, d, nlist)
index_ivf_hnsw.train(xb)
index_ivf_hnsw.add(xb)
_, indices_ivf_hnsw = index_ivf_hnsw.search(xq, k)

print("HNSW without PQ and IVF:")
print("Recall@10:", recall_at_k(indices_hnsw, gt))
print("Memory usage (vectors only):", index_hnsw.ntotal * d * 4 / 1024 / 1024, "MB")

print("HNSW with PQ:")
print("Recall@10:", recall_at_k(indices_hnsw_pq, gt))
print("Memory usage (codes only):", index_hnsw_pq.ntotal * (m * bits / 8) / 1024 / 1024, "MB")

print("HNSW with IVF:")
print("Recall@10:", recall_at_k(indices_ivf_hnsw, gt))
print("Memory usage (vectors only):", index_ivf_hnsw.ntotal * d * 4 / 1024 / 1024, "MB")

In this example, we compare the performance of HNSW without PQ and IVF, HNSW with PQ, and HNSW with IVF.

The results show that using PQ can significantly reduce memory usage (by a factor of roughly 32d / (m × bits), i.e. about 64× in this example) at the cost of a small loss in recall, while using IVF can significantly speed up the search process (roughly in proportion to the fraction of Voronoi cells that are probed) without reducing memory usage.

Conclusion

In conclusion, Hierarchical Navigable Small World (HNSW) graphs have emerged as a powerful and efficient solution for vector similarity search, offering significant advantages over traditional approaches.

By combining the concepts of proximity graphs, probabilistic skip lists, and navigable small world graphs, HNSW creates a multi-layer graph structure that enables fast and accurate nearest neighbor search, even in high-dimensional spaces.

The documentation provided an in-depth exploration of HNSW's core concepts, graph construction process, and parameter tuning, along with practical examples of its implementation using the FAISS library.

The benefits of HNSW, such as excellent performance, robustness, and logarithmic scalability, make it an attractive choice for a wide range of applications, including information retrieval, recommendation systems, and machine learning.

Moreover, the documentation discussed advanced techniques like Product Quantization (PQ) and Inverted File System (IVF), which can further enhance HNSW's performance by reducing memory usage and speeding up the search process.

The examples and code snippets provided throughout the documentation serve as valuable resources for practitioners looking to implement HNSW in their projects.

As the field of vector similarity search continues to evolve, HNSW has the potential to play a crucial role in enabling fast and accurate similarity search in large-scale datasets.

Future developments in HNSW may include further optimizations, extensions to handle dynamic datasets, and integration with other machine learning techniques to enable more advanced applications.

In summary, this documentation provides a solid foundation for understanding and implementing HNSW. By leveraging the insights and techniques presented here, practitioners can harness the full potential of HNSW to build efficient and effective vector similarity search systems.

References:

  • Malkov, Y. A. and Yashunin, D. A., "Efficient and robust approximate nearest neighbor search using Hierarchical Navigable Small World graphs", arXiv:1603.09320.

  • FAISS documentation: https://faiss.ai
Figure: The search process through an NSW graph. Starting at a pre-defined entry point, the algorithm greedily traverses to connected vertices that are nearer to the query vector.

Figure: The layered graph of HNSW. The top layer is the entry point and contains only the longest links; moving down the layers, the links become shorter and more numerous.