# Retrieval Augmented Generation

<mark style="color:blue;">**Retrieval-Augmented Generation (RAG)**</mark> is an approach that combines the capabilities of retrieval-based models and generative models to improve the performance of large language models (LLMs).

RAG operates by enhancing a language model's knowledge base not through direct training on new data but by *<mark style="color:yellow;">**accessing external databases or the internet in real time.**</mark>*

The process involves transforming a query into an embedding, which is then matched with relevant context from a vector database. The language model, armed with this context, generates responses that are both informed and tailored to the query's specifics.
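The flow above can be sketched in a few lines of Python. This is a toy illustration, not a real system: the `embed` function below is a hypothetical stand-in for an actual embedding model (it hashes character bigrams into a small vector), and the list of `(document, embedding)` pairs stands in for a vector database.

```python
import math

def embed(text: str) -> list[float]:
    # Toy stand-in for a real embedding model: hash character
    # bigrams into a small fixed-size vector, then L2-normalise.
    vec = [0.0] * 16
    t = text.lower()
    for a, b in zip(t, t[1:]):
        vec[(ord(a) * 31 + ord(b)) % 16] += 1.0
    norm = math.sqrt(sum(v * v for v in vec)) or 1.0
    return [v / norm for v in vec]

def cosine(u: list[float], v: list[float]) -> float:
    # Vectors are unit length, so the dot product equals cosine similarity.
    return sum(a * b for a, b in zip(u, v))

# The "vector database": documents stored alongside their embeddings.
corpus = [
    "RAG retrieves supporting context before the model answers.",
    "The Eiffel Tower is in Paris.",
    "Vector databases index embeddings for fast similarity search.",
]
index = [(doc, embed(doc)) for doc in corpus]

def retrieve(query: str, k: int = 2) -> list[str]:
    # Embed the query and return the k most similar documents;
    # these become the context handed to the language model.
    q = embed(query)
    ranked = sorted(index, key=lambda pair: cosine(q, pair[1]), reverse=True)
    return [doc for doc, _ in ranked[:k]]
```

In production the hash embedding would be replaced by a trained embedding model, and the linear scan by a purpose-built vector index.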

{% embed url="https://arxiv.org/abs/2005.11401" %}
The highly cited seminal paper on RAG (Lewis et al., 2020)
{% endembed %}

The paper discusses the challenges with neural language models, such as the difficulty of updating their knowledge and of explaining how they arrived at their answers.

RAG models aim to address these issues by making it possible to directly revise and expand the knowledge they use and inspect how they generate responses.

The paper shows that RAG models can outperform other models in tasks that require a deep understanding of the world, like answering open-domain questions, by generating responses that are not only correct but also rich in detail and variety.

In simple terms, this paper is about making AI smarter by allowing it to read up on topics before responding, resulting in more accurate and detailed answers.

### <mark style="color:purple;">The top five benefits of RAG</mark>

<mark style="color:blue;">**Improved knowledge retrieval**</mark>

RAG combines the power of knowledge retrieval with the generative capabilities of LLMs, allowing them to retrieve more accurate and relevant information from a vast knowledge base before generating a response.

<mark style="color:blue;">**Scalability**</mark>

RAG leverages the efficient indexing and retrieval capabilities of vector databases, enabling the neural language model to scale up to large knowledge sources without sacrificing performance.

<mark style="color:blue;">**Enhanced context understanding**</mark>

RAG allows LLMs to leverage context from both the input query and retrieved documents, which improves their understanding and enables them to generate more coherent and relevant responses.
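A common way to give the model this combined context is simply to concatenate the retrieved passages into the prompt alongside the query. A minimal sketch, where the template wording is an illustrative assumption and `retrieved` stands for whatever the retrieval step returned:

```python
def build_prompt(query: str, retrieved: list[str]) -> str:
    # Number the passages so an answer can cite them; real systems
    # also trim the context to fit the model's context window.
    context = "\n\n".join(f"[{i + 1}] {doc}" for i, doc in enumerate(retrieved))
    return (
        "Answer the question using only the context below.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {query}\nAnswer:"
    )

prompt = build_prompt(
    "Where is the Eiffel Tower?",
    ["The Eiffel Tower is in Paris.", "It was completed in 1889."],
)
```

The resulting string is what actually reaches the LLM, so the model's "enhanced context understanding" comes from conditioning on both the question and the numbered passages at once.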

<mark style="color:blue;">**Few-shot learning**</mark>

RAG-based models can demonstrate better few-shot learning capabilities than traditional LLMs because they can retrieve relevant information from the knowledge base even when limited training data is available.

<mark style="color:blue;">**Customisable knowledge sources**</mark>

RAG allows users to incorporate domain-specific or task-specific knowledge sources into the LLM, making it highly adaptable to various applications.

<mark style="color:blue;">**Vector databases**</mark> play a **crucial role in RAG** by enabling efficient retrieval of relevant knowledge. They store embeddings of knowledge base documents, which are used to find the most relevant information based on the query embedding.
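The retrieval step described above reduces to a nearest-neighbour search over the stored embeddings. A brute-force top-k sketch (real vector databases use approximate indexes such as HNSW or IVF to scale; the two-dimensional vectors here are purely illustrative):

```python
import heapq

def top_k(query_vec: list[float], index: list[tuple[str, list[float]]], k: int = 3) -> list[str]:
    """Return ids of the k documents whose embeddings best match query_vec.

    Similarity is the dot product, which equals cosine similarity
    when the stored vectors are unit length.
    """
    def score(entry):
        _, emb = entry
        return sum(q * e for q, e in zip(query_vec, emb))

    best = heapq.nlargest(k, index, key=score)
    return [doc_id for doc_id, _ in best]

index = [
    ("doc-a", [1.0, 0.0]),
    ("doc-b", [0.0, 1.0]),
    ("doc-c", [0.7, 0.7]),
]
result = top_k([1.0, 0.0], index, k=2)  # -> ['doc-a', 'doc-c']
```

Using `heapq.nlargest` keeps the selection O(n log k) rather than sorting the whole index, which matters once the knowledge base grows large.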

