RAFT: Adapting Language Model to Domain Specific RAG

Introduction to "Retrieval Augmented Fine Tuning (RAFT)

This June 2024 paper introduces a novel approach called Retrieval Augmented Fine Tuning (RAFT), which aims to improve the performance of Large Language Models (LLMs) in domain-specific retrieval-augmented generation (RAG) tasks.

The authors present RAFT as a training methodology that enhances a model's ability to answer questions in "open-book" in-domain settings.

Key Points

  1. Problem Addressed: The paper tackles the challenge of adapting pre-trained LLMs for specialized domains where accuracy based on a given set of documents is crucial.

  2. Current Limitations: Existing methods like in-context learning through RAG and supervised fine-tuning have limitations. RAG-based methods don't leverage the learning opportunity in fixed domains, while fine-tuning approaches often fail to account for imperfections in the retrieval process.

  3. RAFT Solution: The proposed method combines instruction fine-tuning with retrieval augmented generation. It trains the model to:

    • Incorporate domain knowledge

    • Improve in-domain RAG performance

    • Identify and use relevant documents while ignoring distractors

  4. Training Process: RAFT trains the model to answer questions using relevant documents while also presenting it with distractor documents. This process improves the model's ability to reason and cite relevant information.

  5. Performance: The authors report that RAFT consistently outperforms supervised fine-tuning (with and without RAG) across multiple datasets, including PubMed, HotpotQA, and Gorilla.

  6. Analogy: The authors liken their approach to studying for an open-book exam, where the model learns to recognize relevant and irrelevant retrieved documents.

Detailed Explanation of RAFT Methodology

Introduction to RAFT

RAFT (Retrieval Augmented Fine-Tuning) is presented as a training method for Large Language Models (LLMs) specifically designed for domain-specific "open-book" scenarios.

The authors describe it as a way to prepare LLMs for specialised tasks where the model needs to effectively use external information to answer questions.

Supervised Fine-Tuning (SFT) - The Traditional Approach

Before introducing RAFT, the paper explains the traditional Supervised Fine-Tuning (SFT) approach:

  • Dataset Structure: SFT uses a dataset (D) containing pairs of Questions (Q) and Answers (A).

  • Training Process: The model is trained to improve its ability to answer questions based on knowledge gained during pre-training or the SFT phase.

  • Usage Scenarios (a minimal prompt-format sketch follows this list):

    1. 0-shot Inference: Q → A (answering without additional context)

    2. RAG Inference: Q + D → A (answering with additional documents provided)
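
As a concrete illustration of these two usage scenarios, here is a minimal Python sketch of how the prompts might be assembled. The template wording and function names are assumptions for illustration only, not code from the paper.

```python
# Minimal sketch of the two SFT usage scenarios above.
# The prompt templates and function names are illustrative assumptions,
# not taken from the RAFT paper.

def zero_shot_prompt(question: str) -> str:
    # 0-shot inference: Q -> A, no supporting documents in the context.
    return f"Question: {question}\nAnswer:"

def rag_prompt(question: str, documents: list[str]) -> str:
    # RAG inference: Q + D -> A, retrieved documents are prepended as context.
    context = "\n\n".join(documents)
    return f"Context:\n{context}\n\nQuestion: {question}\nAnswer:"
```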

RAFT Methodology

RAFT modifies the traditional SFT approach to better prepare models for domain-specific open-book settings:

Data Preparation

  • Each data point contains:

    • A question (Q)

    • A set of documents (Dk)

    • A Chain-of-Thought style answer (A*)

Document Types

  1. 'Golden' Document (D*): Contains the information needed to answer the question.

  2. 'Distractor' Documents (Di): Do not contain answer-relevant information.

Training Data Structure

RAFT uses two types of training data (a data-construction sketch follows this list):

  1. For P% of questions:

    • Q + D* + D1 + D2 + ... + Dk → A* (Question + Golden Document + Distractor Documents → Answer)

  2. For (1-P)% of questions:

    • Q + D1 + D2 + ... + Dk → A* (Question + Only Distractor Documents → Answer)
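
The sketch below shows one way this data split could be assembled in practice. It is a minimal illustration, assuming each example is a plain Python dictionary; the field names, prompt template, and the p_golden fraction are assumptions rather than the authors' implementation.

```python
import random

def build_raft_example(question, golden_doc, distractor_docs, cot_answer, p_golden=0.8):
    """Assemble one RAFT-style training example (illustrative sketch).

    With probability p_golden the golden document D* is included in the
    context alongside the distractors; otherwise only distractors are
    supplied, which pushes the model to rely on memorised domain knowledge.
    """
    docs = list(distractor_docs)
    if random.random() < p_golden:
        docs.append(golden_doc)
    random.shuffle(docs)  # remove positional cues about which document is golden
    context = "\n\n".join(docs)
    prompt = f"{context}\n\nQuestion: {question}\nAnswer:"
    return {"prompt": prompt, "completion": cot_answer}
```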

Training Process

  • The model is fine-tuned using standard SFT techniques on this prepared data.

  • By sometimes removing golden documents, the model is compelled to memorise answers and learn to distinguish between relevant and irrelevant information.

Chain-of-Thought Reasoning

  • RAFT incorporates Chain-of-Thought reasoning in the answers.

  • This involves creating a full reasoning chain and citing sources from the context.

  • Answers include (an illustrative example follows this list):

    1. Citations from the original context (marked with ##begin_quote## and ##end_quote##)

    2. Detailed explanations on how to reach the conclusion based on the citations
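
For illustration, a training target in this style might look like the string below. Only the ##begin_quote## and ##end_quote## markers come from the paper; the question, quotation, and wording are invented for this example.

```python
# Illustrative chain-of-thought target answer in the RAFT style.
# Only the ##begin_quote## / ##end_quote## markers come from the paper;
# the content itself is invented for illustration.
cot_answer = (
    "To answer the question, we first locate the relevant passage: "
    "##begin_quote##The Eiffel Tower was completed in 1889 for the "
    "World's Fair.##end_quote## "
    "The quoted passage states that construction finished in 1889, "
    "so the answer is: 1889."
)
```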

Key Concepts and Technical Details

  1. Open-book Exam Analogy: RAFT is likened to preparing for an open-book exam, where the model learns to recognise and use relevant information while ignoring distractors.

  2. In-domain RAG: RAFT is designed to improve the model's performance specifically on the set of documents it's trained on, making it suitable for domain-specific applications.

  3. Retriever Independence: RAFT is independent of the specific retrieval method used in the RAG pipeline.

  4. Balancing Memorisation and Derivation: By including both scenarios (with and without golden documents), RAFT aims to balance the model's ability to memorise important information and derive answers from provided context.

  5. Source Citation: The inclusion of direct quotes from the source documents in the answers helps the model learn to identify and use relevant information accurately.

  6. Flexibility in 'Golden' Documents: The method allows for multiple documents to be considered 'golden' for more complex questions (e.g., in the HotpotQA dataset).

Expected Outcomes

The authors suggest that this approach:

  • Enhances the model's accuracy in answering questions

  • Improves the model's ability to reason and explain its answers

  • Increases robustness in handling both relevant and irrelevant information

The subsequent sections of the paper provide experimental results demonstrating these outcomes across various datasets.

RAFT Evaluation and Results Summary

Evaluation Methodology

  1. Datasets Used:

    • Natural Questions (NQ)

    • Trivia QA

    • HotpotQA

    • HuggingFace, Torch Hub, and TensorFlow Hub (from APIBench)

    • PubMed QA

  2. Baseline Models:

    • LLaMA2-7B-chat model with 0-shot prompting

    • LLaMA2-7B-chat model with RAG

    • Domain-Specific Finetuning (DSF) with 0-shot prompting

    • Domain-Specific Finetuning with RAG (DSF + RAG)

  3. Evaluation Metrics: The paper doesn't explicitly state the metrics, but it appears to use accuracy percentages for comparison.

Key Results

  1. Overall Performance:

    • RAFT consistently outperformed all baselines across the datasets.

    • Significant improvements were observed compared to the base Llama-2 model and domain-specific fine-tuning.

  2. Specific Improvements:

    • Hotpot QA: RAFT showed a 35.25% improvement over the base Llama-2 model.

    • Torch Hub: RAFT demonstrated a 76.35% improvement over the base Llama-2 model.

    • HuggingFace: RAFT outperformed DSF by 31.41%.

  3. Performance on PubMed QA:

    • For binary yes/no questions, RAFT didn't show significant gains compared to DSF + RAG.

  4. Comparison with GPT-3.5:

    • RAFT demonstrated significant advantages even when compared to the larger GPT-3.5 model.

  5. Chain-of-Thought (CoT) Impact:

    • Incorporating CoT significantly improved performance:

      • Hotpot QA: 9.66% improvement

      • HuggingFace: 14.93% improvement

  6. Golden Context Ratio Study:

    • The optimal proportion of training data including golden documents varied across datasets (40%, 60%, 100%).

    • Surprisingly, including some training data without golden documents (P = 80%) enhanced model performance on RAG tasks.

Table of Results

| Model            | PubMed | HotpotQA | HuggingFace | Torch Hub | TensorFlow |
|------------------|--------|----------|-------------|-----------|------------|
| GPT-3.5 + RAG    | 71.60  | 41.5     | 29.08       | 60.21     | 65.59      |
| LLaMA2-7B        | 56.5   | 0.54     | 0.22        | 0         | 0          |
| LLaMA2-7B + RAG  | 58.8   | 0.03     | 26.43       | 8.60      | 43.06      |
| DSF              | 59.7   | 6.38     | 61.06       | 84.94     | 86.56      |
| DSF + RAG        | 71.6   | 4.41     | 42.59       | 82.80     | 60.29      |
| RAFT (LLaMA2-7B) | 73.30  | 35.28    | 74.00       | 84.95     | 86.86      |

Key Takeaways

  1. RAFT significantly improves RAG performance across various specialized domains.

  2. The method enhances both the model's ability to extract information and its robustness towards distractors.

  3. Chain-of-Thought reasoning substantially contributes to the model's performance.

  4. Including some training data without golden documents can be beneficial for downstream RAG tasks.

  5. RAFT outperforms larger models like GPT-3.5 in specific domain tasks.

RAFT Generalization to Top-K RAG

This section of the paper explores how RAFT (Retrieval Augmented Fine-Tuning) performs when faced with varying numbers of documents during test time, particularly in top-k RAG (Retrieval-Augmented Generation) scenarios. The researchers aim to address a critical challenge in LLM+RAG systems: the model's ability to handle irrelevant information effectively.

Key Concepts

  1. Top-k RAG: A technique where the k most relevant documents are retrieved and provided to the model during inference.

  2. Distractor Documents: Irrelevant documents included alongside relevant ones during training or testing.

  3. Golden Documents: Highly relevant documents that contain the information needed to answer a query.

The Challenge

Large Language Models (LLMs) are known to be vulnerable to irrelevant text.

This vulnerability becomes particularly problematic in RAG systems, where the retrieval process might introduce irrelevant information. The goal is to make the model robust enough to discern and disregard irrelevant content while focusing on pertinent information.

RAFT's Approach

RAFT addresses this challenge by:

  1. Training the model with a mix of golden (relevant) and distractor (irrelevant) documents.

  2. Investigating the optimal ratio of distractor documents to include during training.

  3. Assessing how well this training approach generalises to different volumes of documents encountered during testing.

Experimental Setup

The researchers conducted two main experiments (a minimal evaluation sketch follows this list):

  1. Training with Distractor Documents:

    • Varied the number of distractor documents during training.

    • Consistently evaluated using the top-3 documents from the retriever.

  2. Generalization to Variable Test-Time Documents:

    • Trained models with different numbers of distractor documents.

    • Tested these models with varying numbers of documents at test time.
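
A minimal sketch of how the second experiment could be organised is shown below. The model, retriever, and scoring function are placeholders for whatever components an experimenter already has; none of these names come from the paper.

```python
def exact_match(prediction: str, answer: str) -> bool:
    # Simple scoring placeholder; the paper does not publish its metric code.
    return prediction.strip().lower() == answer.strip().lower()

def evaluate_topk_robustness(model, retriever, eval_set, k_values=(3, 5, 10)):
    """Score a fine-tuned model while varying how many retrieved documents
    are supplied at test time (illustrative sketch; `model` and `retriever`
    are user-supplied placeholders, not APIs from the paper)."""
    results = {}
    for k in k_values:
        correct = 0
        for example in eval_set:
            docs = retriever(example["question"], top_k=k)  # mix of golden and distractor documents
            context = "\n\n".join(docs)
            prompt = f"{context}\n\nQuestion: {example['question']}\nAnswer:"
            prediction = model.generate(prompt)
            correct += exact_match(prediction, example["answer"])
        results[k] = correct / len(eval_set)
    return results
```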

Key Findings

  1. Importance of Distractor Documents in Training:

    • Training with only golden documents often resulted in inferior performance.

    • Including distractor documents during training improved the model's ability to handle irrelevant information.

  2. Optimal Number of Training Documents:

    • For Natural Questions (NQ): Best performance when training with golden document + 3 distractors (D* + 3D).

    • For HotpotQA: Best performance when training with golden document + 1 distractor (D* + 1D).

    • RAFT consistently used 1 golden document + 4 distractor documents in their experiments.

  3. Generalisation to Variable Test-Time Documents:

    • Models trained with distractor documents showed more resilience to fluctuations in the number of test-time documents.

    • This demonstrates the robustness of the RAFT approach in real-world scenarios where the number of retrieved documents may vary.

Implications

  1. Improved Robustness: RAFT's approach of including distractor documents during training enhances the model's ability to handle irrelevant information in real-world RAG applications.

  2. Flexibility: The method allows for better generalization across different retrieval settings (e.g., top-3, top-5, top-10 RAG).

  3. Optimal Training Strategy: The findings suggest that there's an optimal balance of golden and distractor documents during training, which may vary depending on the specific task or domain.

  4. Real-world Applicability: By demonstrating robustness to varying numbers of test-time documents, RAFT shows promise for deployment in practical RAG systems where retrieval results may be inconsistent.

RAFT: Conclusion and Practical Applications

Retrieval Augmented Fine Tuning (RAFT) represents a significant advancement in training Large Language Models (LLMs) for domain-specific, open-book question answering tasks.

Key aspects of RAFT include:

  1. Training with a mix of relevant (golden) and irrelevant (distractor) documents.

  2. Structuring the dataset to sometimes exclude golden documents from the context.

  3. Generating answers using a chain-of-thought approach with direct quotations from relevant text.

Evaluations on diverse datasets (PubMed, HotpotQA, Gorilla API Bench) demonstrate RAFT's superior performance compared to traditional fine-tuning methods and even larger models like GPT-3.5. RAFT shows particular strength in:

  • Improving information extraction from domain-specific documents.

  • Enhancing robustness against irrelevant information.

  • Generalizing well to varying numbers of retrieved documents during inference.

These capabilities position RAFT as a promising approach for enhancing LLM performance in Retrieval Augmented Generation (RAG) systems across various specialized domains.

Practical Applications and Commercial Ideas

  1. Enhanced Medical Literature Review

    • Application: Assist medical researchers in quickly finding relevant information from vast medical literature.

    • Commercial Idea: Develop a subscription-based platform for healthcare professionals and researchers, offering rapid, accurate insights from medical journals and clinical trial data.

  2. Legal Document Analysis

    • Application: Improve efficiency in legal research and case preparation.

    • Commercial Idea: Create a RAFT-powered legal assistant tool for law firms, offering faster contract analysis, case law research, and legal precedent identification.

  3. Intelligent Technical Support Systems

    • Application: Enhance customer support in technical fields (e.g., software, electronics).

    • Commercial Idea: Develop an AI-powered technical support platform that can accurately answer complex product-specific queries by referencing vast product documentation and user manuals.

  4. Personalised Educational Assistant

    • Application: Provide tailored explanations and answers across various academic subjects.

    • Commercial Idea: Create an adaptive learning platform that uses RAFT to offer personalised tutoring and homework assistance, drawing from textbooks and educational resources.

  5. Financial Analysis and Research Tool

    • Application: Assist in financial research, market analysis, and investment decisions.

    • Commercial Idea: Develop a RAFT-based financial assistant for investment firms and individual investors, offering insights from financial reports, market data, and news articles.

  6. Enhanced Content Management Systems

    • Application: Improve content creation and curation in large organizations.

    • Commercial Idea: Create an intelligent content management system that can answer queries about internal documents, policies, and procedures, aiding in knowledge management and employee onboarding.

  7. Sophisticated Customer Service Chatbots

    • Application: Enhance customer service with more accurate and context-aware responses.

    • Commercial Idea: Offer a RAFT-powered chatbot service that can handle complex customer inquiries by referencing extensive product catalogs, FAQs, and policy documents.

  8. Scientific Literature Assistant

    • Application: Aid researchers in navigating and synthesizing information from scientific papers.

    • Commercial Idea: Develop a research tool for academic institutions and R&D departments that can answer complex scientific questions by analyzing and synthesising information from multiple research papers.

  9. Intelligent Documentation for Software Development

    • Application: Improve code documentation and API understanding for developers.

    • Commercial Idea: Create a RAFT-based coding assistant that can answer queries about complex codebases, APIs, and frameworks by referencing extensive documentation and code repositories.

  10. Regulatory Compliance Assistant

    • Application: Help businesses navigate complex regulatory environments.

    • Commercial Idea: Develop a compliance tool for industries with strict regulations (e.g., finance, healthcare), offering up-to-date guidance on regulatory requirements by analyzing vast amounts of legal and regulatory documents.

These applications leverage RAFT's ability to process domain-specific information accurately and its robustness in handling varying amounts of context, making it valuable across diverse industries and use cases.

Reference: RAFT: Adapting Language Model to Domain Specific RAG (arXiv.org)
Figure: How best to prepare for an exam? (a) Fine-tuning based approaches implement "studying" by either directly memorising the input documents or answering practice QA without referencing the documents. (b) In-context retrieval methods fail to leverage the learning opportunity afforded by the fixed domain and are equivalent to taking an open-book exam without studying. (c) RAFT instead leverages fine-tuning with question-answer pairs while referencing the documents in a simulated imperfect retrieval setting, thereby effectively preparing for the open-book exam setting.
Figure: Overview of the RAFT method. The top-left panel depicts adapting LLMs to read solutions from a set of positive and distractor documents, in contrast to the standard RAG setup where models are trained on retriever outputs, which is a mixture of both memorisation and reading. At test time, all methods follow the standard RAG setting and are provided with the top-k retrieved documents in the context.