Why fine-tune?
There is an ongoing debate around the need for fine-tuning large language models. Many practitioners report no need for it, arguing that prompt engineering and retrieval-augmented generation suffice.
Given that fine-tuning is difficult and resource-intensive, it is not surprising that it is often dismissed. But it should not be: it is a powerful tool in the kit for augmenting an AI application.
The debate is generally framed as fine-tuning versus Retrieval Augmented Generation (RAG), an either/or choice. In reality, the two techniques serve complementary functions and can be combined for better results.
In this section of our documentation, we review the academic literature and other sources to make the case for fine-tuning, and to explain when and how it should be used.
Fine-Tuning
Fine-tuning involves adjusting the weights of a pre-trained language model to improve its performance on specific tasks. This is akin to specialty training in various professions—just as a doctor undergoes specialty training to make precise diagnoses, a language model may be fine-tuned with domain-specific data to enhance its task-specific accuracy. For instance, a model could be fine-tuned on legal texts to better perform tasks related to legal analysis.
Examples of Fine-Tuning:
Language Modeling Task Fine-tuning: Adapts a pre-trained model to improve its general linguistic capabilities or to refine its skills in generating coherent and contextually appropriate text.
Supervised Q&A Fine-tuning: Specifically enhances the model's abilities in question-answering scenarios by training it on a dataset of question and answer pairs.
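As a minimal sketch, supervised Q&A fine-tuning usually begins with a dataset of question-answer pairs serialised into a training file. The `prompt`/`completion` field names below are illustrative assumptions; each fine-tuning framework (OpenAI's fine-tuning API, Hugging Face trainers, etc.) expects its own schema, so check the documentation of the tool you use.

```python
import json

def build_finetune_records(pairs):
    """Serialise (question, answer) pairs as JSONL lines.

    NOTE: the "prompt"/"completion" keys are illustrative, not the
    required schema of any particular fine-tuning framework.
    """
    lines = []
    for question, answer in pairs:
        record = {
            "prompt": f"Question: {question}\nAnswer:",
            "completion": f" {answer}",
        }
        lines.append(json.dumps(record))
    return "\n".join(lines)

pairs = [
    ("What is the statute of limitations for breach of contract?",
     "It varies by jurisdiction; commonly between three and six years."),
]
jsonl = build_finetune_records(pairs)
```

The resulting JSONL text would typically be written to a file and passed to the fine-tuning job, which adjusts the model's weights on these examples.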
Retrieval Augmented Generation (RAG)
Retrieval Augmented Generation enhances a model's responses by integrating external data into the model's context at inference time. This method can be compared to a doctor consulting a patient's medical history before making a diagnosis. RAG allows a model to access a wide array of information that isn't stored in its parameters but can be crucial for generating accurate and informed outputs.
Examples of RAG:
Using vector databases to pull in relevant information based on the query context.
Incorporating data from APIs or traditional databases to provide real-time, relevant information that the model can use to generate responses.
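The retrieval step can be sketched in a few lines. The toy example below ranks documents by word overlap and prepends the best match to the prompt; a production system would replace `embed` with learned vector embeddings and the list of documents with a vector database, but the shape of the pipeline is the same.

```python
import math
from collections import Counter

def embed(text):
    # Toy bag-of-words "embedding"; real RAG systems use learned vectors.
    return Counter(text.lower().split())

def cosine(a, b):
    dot = sum(a[t] * b[t] for t in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * \
           math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

def retrieve(query, documents, k=1):
    # Return the k documents most similar to the query.
    q = embed(query)
    ranked = sorted(documents, key=lambda d: cosine(q, embed(d)), reverse=True)
    return ranked[:k]

docs = [
    "Refunds take up to 14 days to process.",
    "Our offices are closed on public holidays.",
]
context = retrieve("How long do refunds take", docs)[0]
prompt = f"Context: {context}\n\nQuestion: How long do refunds take?"
```

The assembled `prompt` is what gets sent to the model at inference time: the external information is supplied in context rather than stored in the model's parameters.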
Combining Fine-Tuning and RAG
Integrating fine-tuning and RAG can significantly enhance a model's performance: fine-tuning refines the model's internal understanding and response generation, while RAG expands its access to external information.
For example, in a healthcare application, a model might be fine-tuned with medical research to understand and generate medically accurate text while also using RAG to pull patient-specific information from medical records to tailor its responses to individual cases.
Example of Combined Use:
A customer service LLM could be fine-tuned on high-quality customer interaction logs to learn the best communicative practices while using RAG to pull user-specific data to personalise interactions, such as recommending products based on past purchases or addressing past complaints.
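A sketch of how the two pieces fit together in such a customer service application: RAG supplies the user-specific context, and the assembled prompt is then sent to the fine-tuned model. The function and field names here are hypothetical, and `retrieve_fn` stands in for whatever retrieval backend (vector database, SQL, API) is in use.

```python
def build_support_prompt(user_profile, query, retrieve_fn):
    """Combine retrieved user-specific records with the customer query.

    retrieve_fn is any callable (query, records) -> relevant records;
    swap in a real retriever in production.
    """
    records = retrieve_fn(query, user_profile["history"])
    context = "\n".join(f"- {r}" for r in records)
    return (
        "You are a customer service assistant.\n"
        f"Customer history:\n{context}\n\n"
        f"Customer question: {query}"
    )

profile = {
    "history": [
        "Ordered trail shoes on 2024-03-02",
        "Complained about late delivery",
    ]
}
# Naive retriever that returns everything; a real one would rank and filter.
prompt = build_support_prompt(profile, "Where is my order?", lambda q, h: h)
# `prompt` would then be sent to the fine-tuned model for completion.
```

The fine-tuning contributes the communicative style learned from interaction logs; the retrieval step contributes the facts about this particular customer.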
Conclusion
The choice between fine-tuning and RAG should not be seen as a binary one; each has its strengths and applications.
Fine-tuning allows for deep customisation of the model's behavior and understanding, making it more adept at specific tasks.
RAG, on the other hand, supplements the model's capabilities by providing additional, context-relevant information at runtime, making it adaptable and resourceful.
When used together, they provide a robust framework for developing powerful, context-aware, and highly specialised LLM applications.
This approach empowers developers to leverage the strengths of both techniques to build more dynamic, responsive, and effective models.