FOLLOWIR: Evaluating and Teaching Information Retrieval Models to Follow Instructions
This March 2024 paper addresses a gap in current Information Retrieval (IR) models: although many of them use Large Language Models (LLMs) as backbones, they do not reliably follow instructions.
The authors introduce a new dataset and benchmark, FOLLOWIR, to evaluate the instruction-following ability of IR models and provide training data to improve their performance.
Practical Use
The idea behind FOLLOWIR is to enable search systems to understand and follow detailed natural-language instructions, much as a human assistant would.
Imagine you're a researcher looking for very specific papers: not just papers on a general topic, but papers constrained by the methods used, the publication venue, the time period, and so on.
With FOLLOWIR, you could provide all these details as an "instruction" to the search system, and it would try to find the most relevant papers that match all your criteria.
The researchers built this dataset from existing TREC collections (standard test collections of queries, documents, and human relevance judgments used to evaluate search systems). They modified the instructions originally given to the human annotators and re-judged the affected documents. This lets them test whether search models adapt their results when the instructions change.
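To make the construction concrete, here is a minimal sketch of how one might find the documents whose relevance flipped when an instruction was modified. It assumes relevance judgments stored as a TREC-style qrels mapping (query ID to document ID to relevance grade) in plain Python dicts. The function name `changed_documents` and this data layout are hypothetical, not the paper's code, and the sketch only tracks documents that go from relevant to non-relevant, which is an assumption about the direction of the instruction edits.

```python
from typing import Dict, Set

# Hypothetical qrels layout: query_id -> {doc_id: relevance_grade}
Qrels = Dict[str, Dict[str, int]]


def changed_documents(og_qrels: Qrels, new_qrels: Qrels) -> Dict[str, Set[str]]:
    """Per query, return documents judged relevant under the original
    instruction but not relevant under the modified instruction."""
    changed: Dict[str, Set[str]] = {}
    for qid, og_judgments in og_qrels.items():
        new_judgments = new_qrels.get(qid, {})
        # A document missing from the new judgments is treated as non-relevant
        flipped = {
            doc_id
            for doc_id, grade in og_judgments.items()
            if grade > 0 and new_judgments.get(doc_id, 0) <= 0
        }
        if flipped:
            changed[qid] = flipped
    return changed
```

These flipped documents are the ones whose ranks a model should lower once it sees the modified instruction.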
They also developed a new evaluation metric, p-MRR, which compares where documents are ranked under the original and the modified instruction: a model that follows the new instruction should push documents that are no longer relevant further down. Using this benchmark, they found that most existing search models do not follow complex instructions well; they tend to treat them as additional keywords rather than as a definition of what counts as relevant.
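As a rough sketch of how a rank-change metric like p-MRR can be computed, the code below follows our reading of the paper's definition. For each document whose relevance changed, it compares the document's 1-based rank under the original instruction (`rank_og`) with its rank under the modified one (`rank_new`). The score is positive if the document drops, negative if it rises, and bounded between -1 and 1; scores are averaged per query and then across queries. The function names and input format are hypothetical, and the exact formulation should be checked against the paper before reporting numbers.

```python
from statistics import mean
from typing import Dict, List, Tuple


def pairwise_mrr_score(rank_og: int, rank_new: int) -> float:
    """Per-document score for a document that became non-relevant.

    Ranks are 1-based. Positive when the document moves down under the
    modified instruction (desired), negative when it moves up.
    """
    if rank_og > rank_new:
        # Moved up: (1/rank_og) / (1/rank_new) - 1 == rank_new / rank_og - 1
        return rank_new / rank_og - 1.0
    # Moved down or stayed at the same rank
    return 1.0 - rank_og / rank_new


def p_mrr(ranks: Dict[str, List[Tuple[int, int]]]) -> float:
    """ranks: query_id -> list of (rank_og, rank_new) for changed documents.

    Averages per query first, then across queries (queries with no
    changed documents are skipped; raises if none remain).
    """
    per_query = [
        mean(pairwise_mrr_score(og, new) for og, new in pairs)
        for pairs in ranks.values()
        if pairs
    ]
    return mean(per_query)
```

For example, a document that falls from rank 2 to rank 4 scores 0.5, the reverse move scores -0.5, and an unchanged rank scores 0.0, so a model that ignores the instruction change lands near zero.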
To address this, they created a training set of human-written instructions and fine-tuned a 7B-parameter language model on it, producing FOLLOWIR-7B.
This model follows instructions significantly better than the base model it was fine-tuned from.
In practice, you could potentially use or build upon FOLLOWIR in a few ways:
If you're developing a search engine or information retrieval system, you could use the FOLLOWIR benchmark to measure how well your models follow instructions, and its training data or the FOLLOWIR-7B model to make your system more responsive to detailed user needs.
For users, a FOLLOWIR-enabled search engine would let you express your information needs more naturally and precisely, rather than relying on keywords alone. This could be particularly useful for complex research queries, legal searches, patent searches, and similar tasks.
The techniques and insights from FOLLOWIR could also potentially be applied to other domains where following instructions is important, such as task-oriented dialogue systems or virtual assistants.
The paper's main conclusion is that current IR models, even those built on LLMs, fail to use instructions effectively when determining document relevance.
However, the authors demonstrate that it is possible to improve the instruction-following ability of IR models through fine-tuning on a dataset of diverse, real-world instructions.
The FOLLOWIR benchmark and training data provide valuable resources for the community to develop more capable instruction-following IR models that can adapt to relevance definitions provided in natural language.
In summary, this work addresses an important gap in the current IR landscape and proposes a novel approach to evaluate and improve the instruction-following capabilities of IR models.
The contributions of this paper can facilitate the development of more flexible and adaptable IR systems that can better understand and use complex, natural language instructions to determine document relevance.