FAISS (Facebook AI Similarity Search)

FAISS (Facebook AI Similarity Search) is a library developed by Facebook AI Research for efficient similarity search and clustering of dense vectors.

It is useful for large-scale similarity search problems, which are common in various machine learning and information retrieval tasks.

FAISS is designed to work on either the GPU or CPU and provides significant performance improvements compared to other nearest neighbour search algorithms.

The research foundations for FAISS are extensive:

FAISS Research Foundations

  1. Inverted File: Originating from Sivic and Zisserman's work in 2003, the inverted file is crucial for non-exhaustive search in large datasets, enabling searches without scanning all elements in the index.

  2. Product Quantization (PQ): Introduced by Jégou et al. in 2011, PQ is a method for lossy compression of high-dimensional vectors that supports relatively accurate reconstructions and distance computations in the compressed domain.

  3. Three-Level Quantization (IVFADC-R): From Tavenard et al.'s 2011 research, this method further refines the quantization process for more efficient searches.

  4. Inverted Multi-Index: Babenko and Lempitsky's 2012 work improves the speed of inverted indexing, enhancing search efficiency.

  5. Optimized PQ: He et al.'s 2013 research optimizes product quantization, adapting the vector space for more effective indexing.

  6. Pre-Filtering of PQ Distances: Introduced by Douze et al. in 2016, this technique adds a binary filtering stage before computing PQ distances, improving search speed.

  7. GPU Implementation and Fast k-Selection: Johnson et al.'s 2017 paper details the adaptation of FAISS for GPU, enabling faster search processes.

  8. HNSW Indexing Method: Malkov et al.'s 2016 work on Hierarchical Navigable Small World graphs contributes to an efficient and robust approximate nearest neighbor search method.

  9. In-Register Vector Comparisons: André et al. (2019) and Guo et al. (2020) explore SIMD optimizations to enhance product quantization's efficiency.

  10. Binary Multi-Index Hashing: Norouzi et al.'s 2012 research introduces a method to expedite searches in Hamming space.

  11. Graph-Based Indexing (NSG): Fu et al.'s 2019 research on the Navigating Spreading-out Graph method aids in fast approximate nearest neighbor searches.

  12. Local Search Quantization: Research by Julieta Martinez et al. in 2016 and 2018 introduces methods to improve quantization for higher recall in searches.

  13. Residual Quantizer: Liu et al.'s 2015 paper on residual vector quantization enhances the accuracy of approximate nearest neighbor searches.

  14. A Survey of Product Quantization: A general paper by Matsui et al. in 2018, providing an overview of product quantization and related methods.

These research foundations collectively contribute to FAISS's capabilities in efficient similarity search and clustering, making it a powerful tool for handling large-scale, high-dimensional data.

Vector databases have become the standard for managing vast collections of embedding vectors.

These embeddings, which are dense vector representations of data items like words, images, or user preferences, are typically generated by neural networks.

Embeddings offer a compact yet expressive data representation, enabling efficient similarity searches and a plethora of other operations.

They transform data into a vector space where the proximity of vectors corresponds to the similarity of the data they represent, thereby facilitating operations like similarity search.

Like any database, there are scalability issues: speed, storage cost, and accuracy are key considerations.

FAISS offers a suite of indexing methods and tools that:

1) enable efficient searching and clustering of vectors

2) provide capabilities for compressing and transforming vectors

Central to the operation of FAISS is the "embedding contract," a conceptual framework where the embedding extractor—typically a neural network—is trained to generate vectors such that the distances between them mirror the similarity of the corresponding data items.

Concurrently, the vector index within FAISS is optimised to perform neighbour searches with accuracy, adhering to the established distance metrics.

This dual commitment ensures that FAISS remains a reliable and effective tool for vector database management, addressing the critical needs of the AI domain with precision and efficiency.
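
To make the contract concrete, the metric used by the index must match the one the embedding model was trained for. A minimal sketch in Python (the dimension and data are placeholders): cosine-similarity embeddings are normalised and then searched with an inner-product index.

```python
import numpy as np
import faiss

d = 384                                         # embedding dimension (assumed)
xb = np.random.rand(1000, d).astype("float32")  # stand-in embeddings

# Inner product on unit-norm vectors equals cosine similarity, so
# cosine-trained embeddings are normalised and searched with IndexFlatIP.
faiss.normalize_L2(xb)            # in-place L2 normalisation
index = faiss.IndexFlatIP(d)      # inner-product metric
index.add(xb)
```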

FAISS is not involved in the embedding process

FAISS doesn't have the capability to take raw data and convert it into embeddings. FAISS is designed to take embeddings and perform operations like indexing, searching, clustering, and transforming these vectors efficiently.

Its primary role is to deal with the embeddings after they've been created, focusing on how to store, search, and manage them effectively, especially when dealing with large-scale datasets.
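
A minimal sketch of this division of labour (random vectors stand in for real embeddings, and the dimension is arbitrary):

```python
import numpy as np
import faiss

# The embeddings come from elsewhere (any encoder); random vectors stand in here.
d = 384
xb = np.random.rand(10_000, d).astype("float32")   # database embeddings
xq = np.random.rand(5, d).astype("float32")        # query embeddings

index = faiss.IndexFlatL2(d)           # FAISS only indexes and searches vectors
index.add(xb)
distances, ids = index.search(xq, 4)   # 4 nearest neighbours per query
```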

Approximate Nearest Neighbor Search (ANN)

ANN operates on the principle that slightly imperfect results can be acceptable if they come with significant gains in efficiency or resource usage.

This trade-off allows for the exploration of new design spaces in solution architecture.

Instead of storing data as a plain matrix, ANN introduces an indexing structure to preprocess the database. This indexing facilitates more efficient querying by organising the data in a manner that speeds up the search for nearest neighbours.
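
The trade-off can be measured directly by comparing an approximate index against brute force on the same data; a sketch with illustrative sizes and parameters:

```python
import numpy as np
import faiss

d, nb, nq = 64, 50_000, 100
xb = np.random.rand(nb, d).astype("float32")
xq = np.random.rand(nq, d).astype("float32")

exact = faiss.IndexFlatL2(d)      # exhaustive search as ground truth
exact.add(xb)
_, gt = exact.search(xq, 1)

quantizer = faiss.IndexFlatL2(d)
approx = faiss.IndexIVFFlat(quantizer, d, 256)   # 256-cell inverted file
approx.train(xb)
approx.add(xb)
approx.nprobe = 8                 # scan only 8 of the 256 cells per query
_, pred = approx.search(xq, 1)

print("recall@1:", (pred == gt).mean())   # fraction of exact neighbours found
```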

Non-Exhaustive Search

For medium-sized datasets, non-exhaustive search methods focus on quickly identifying the subset of database vectors most likely to contain the search results.

FAISS implements two non-exhaustive search methods:

Inverted files - cluster database vectors and store them so that only a subset of clusters is examined during a search.

Graph-based indexing - builds a directed graph of vectors, exploring edges closest to the query vector during the search.

These methods aim to reduce the complexity of searches, making them faster than exhaustive searches, especially as the dataset size grows.

However, their effectiveness can vary with the dataset's dimensionality and size.

The choice between inverted file and graph-based indexing depends on the trade-off between memory usage and search speed, with graph-based methods generally being more memory-intensive but potentially faster for smaller datasets.
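
A sketch of constructing both families (the parameter values are illustrative, not recommendations):

```python
import numpy as np
import faiss

d = 128
xb = np.random.rand(100_000, d).astype("float32")

# Inverted file: k-means partitions the database; queries scan nprobe cells.
quantizer = faiss.IndexFlatL2(d)
ivf = faiss.IndexIVFFlat(quantizer, d, 1024)   # 1024 clusters
ivf.train(xb)                                  # learns the centroids
ivf.add(xb)
ivf.nprobe = 16                                # speed/recall knob

# Graph-based: HNSW with 32 links per node; no training step, more memory.
hnsw = faiss.IndexHNSWFlat(d, 32)
hnsw.add(xb)
hnsw.hnsw.efSearch = 64                        # exploration depth per query
```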

FAISS is a library of tools

FAISS offers a comprehensive set of tools for vector similarity search.

It integrates various indexing methods, which often require a sequence of components like preprocessing, compression, and non-exhaustive search.

The diversity in tools accommodates different efficiency needs based on specific usage constraints, meaning the most effective indexing method can vary depending on the scenario.
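
FAISS's index_factory reflects this composability: a single string chains a preprocessing step, a non-exhaustive structure, and a compression stage. A sketch, using one valid combination:

```python
import faiss

d = 384
# One factory string chains three components: OPQ16 rotates the space for
# better quantization, IVF4096 adds a 4096-cell inverted file, and PQ16
# compresses each vector to 16 bytes. The index still needs train()/add().
index = faiss.index_factory(d, "OPQ16,IVF4096,PQ16")
```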

Technical details

FAISS employs several techniques to achieve efficient similarity search:

Quantization

FAISS uses quantization techniques to compress the embeddings, which significantly reduces memory usage and accelerates distance computations. One of the most widely used quantization techniques in FAISS is Product Quantization (PQ).

PQ approximates each vector by the concatenation of multiple codebook entries, enabling efficient storage and fast distance computation.
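
A minimal sketch (sizes are illustrative): a 128-dimensional float32 vector split into 16 sub-vectors, each encoded against a 256-entry codebook, is stored in 16 bytes rather than 512.

```python
import numpy as np
import faiss

d, m, nbits = 128, 16, 8                    # 16 sub-vectors, 8 bits each
xb = np.random.rand(100_000, d).astype("float32")

pq = faiss.IndexPQ(d, m, nbits)
pq.train(xb)   # learns one 256-entry codebook per sub-vector
pq.add(xb)     # each vector stored as a 16-byte code instead of 512 bytes of float32
```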

Indexing

FAISS provides multiple index types to cater to different use cases and trade-offs between search speed and search quality. Some common index types are:

1) Flat index

A brute-force index that computes exact distances between query vectors and indexed vectors.

2) IVF (Inverted File)

A partitioned index that divides the vector space into Voronoi cells. It stores the centroids of these cells and assigns each vector to the nearest centroid. This reduces the number of distance computations required for each query.

3) HNSW (Hierarchical Navigable Small World)

A graph-based index that builds a hierarchical graph structure, enabling efficient nearest neighbor search with logarithmic complexity.
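
In practice these building blocks are combined; for example, an inverted file over PQ-compressed vectors. A sketch with illustrative parameters:

```python
import numpy as np
import faiss

d, nlist, m, nbits = 128, 1024, 16, 8
xb = np.random.rand(200_000, d).astype("float32")

quantizer = faiss.IndexFlatL2(d)
index = faiss.IndexIVFPQ(quantizer, d, nlist, m, nbits)  # IVF cells + 16-byte PQ codes
index.train(xb)
index.add(xb)
index.nprobe = 32   # cells scanned per query
```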

GPU acceleration

FAISS leverages the parallelism of GPUs to accelerate similarity search operations, making it suitable for large-scale and real-time applications.
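
In the GPU build of FAISS (faiss-gpu), moving an index to a device is a one-call conversion; a sketch:

```python
import faiss

# Requires a CUDA-enabled faiss-gpu build.
d = 128
cpu_index = faiss.IndexFlatL2(d)
res = faiss.StandardGpuResources()                     # GPU scratch memory
gpu_index = faiss.index_cpu_to_gpu(res, 0, cpu_index)  # copy to device 0
# gpu_index.add(...) and gpu_index.search(...) behave like the CPU index.
```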

Application of FAISS

Here are some highlighted use cases demonstrating FAISS's broad applicability:

Trillion-scale Index

FAISS can handle massive datasets, as illustrated by an example where it indexed 1.5 trillion vectors of 144 dimensions.

The process involved a hierarchical, distributed approach using PCA for dimensionality reduction, scalar quantization, and HNSW for coarse quantization, followed by sharding techniques for efficient storage and search, demonstrating FAISS's capability to manage and search through enormous data volumes efficiently.
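
The ingredients of such a pipeline map onto a factory string; a hedged single-shard sketch (the parameters of the actual trillion-scale setup are not reproduced here):

```python
import faiss

d = 144
# PCA64: reduce 144 dims to 64; IVF65536_HNSW32: inverted file whose coarse
# quantizer is an HNSW graph; SQ8: 8-bit scalar quantization of stored vectors.
# The trillion-scale deployment shards many such indexes across machines.
index = faiss.index_factory(d, "PCA64,IVF65536_HNSW32,SQ8")
```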

Text Retrieval

FAISS can facilitate information retrieval tasks like fact-checking, entity linking, and question answering.

Embedding models optimised for text retrieval are used to search across large text corpora, aiding in extracting relevant information or documents quickly.

Data Mining

FAISS aids in the mining and organisation of large datasets, such as finding bilingual texts in vast web-crawled datasets or organising a language model's training corpus.

For instance, it can group similar documents or identify duplicate images in a dataset, optimising the data curation process for better training model performance or more efficient storage.

Conclusion

FAISS is a well-developed and widely used library. It underpins many modern vector database offerings, such as Pinecone and Zilliz.

Further reading: "The Faiss library" paper on arXiv.org.