# Unsupervised Dense Retrievers

Dense retrievers achieve superior results in domains with large training datasets, but they struggle in zero-shot scenarios and in new applications without task-specific training data, where unsupervised term-frequency methods such as BM25 still excel.

This work explores <mark style="color:blue;">unsupervised dense retrievers</mark> trained through <mark style="color:blue;">contrastive learning</mark>, demonstrating their potential across various retrieval settings, including multilingual retrieval.

{% embed url="https://arxiv.org/abs/2112.09118" %}
Unsupervised dense retrievers trained through contrastive learning
{% endembed %}

### <mark style="color:purple;">Key Points</mark>

<mark style="color:green;">**Traditional vs. Neural Network-Based Retrieval**</mark><mark style="color:green;">:</mark> Classical retrieval methods rely on term frequency and are limited by the lexical gap: a query and a document must share words to match. Neural network-based methods, such as dense retrievers, learn semantic similarities beyond lexical overlap but require extensive training data.
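The dense-retrieval side of this comparison can be sketched in a few lines: a query and each document are mapped to fixed-size vectors, and relevance is the dot product between them. This is a minimal illustration, not the paper's model; the `encode` function below is a placeholder assumption (random vectors standing in for a trained transformer encoder with mean pooling):

```python
import numpy as np

def encode(texts, dim=8, seed=0):
    """Placeholder encoder: in practice a trained transformer with
    mean pooling produces these embeddings (random here for illustration)."""
    rng = np.random.default_rng(seed)
    vecs = rng.normal(size=(len(texts), dim))
    # L2-normalize so the dot product equals cosine similarity
    return vecs / np.linalg.norm(vecs, axis=1, keepdims=True)

docs = ["neural retrieval", "term frequency", "contrastive learning"]
doc_vecs = encode(docs)
query_vec = encode(["dense retriever"], seed=1)[0]

# Relevance = dot product between query and document embeddings
scores = doc_vecs @ query_vec
ranking = np.argsort(-scores)  # document indices, best match first
```

Because documents can be embedded once and indexed offline, only the query is encoded at search time, which is what makes this architecture practical at scale.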

<mark style="color:green;">**Unsupervised Dense Retrievers**</mark><mark style="color:green;">:</mark> The paper investigates the potential of training dense retrievers without supervision using contrastive learning. This approach aims to match the performance of BM25 in scenarios where training data is scarce or non-existent.
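Contrastive learning trains the encoder so that a query embedding lands close to its positive passage and far from all other passages in the batch, typically via the InfoNCE loss. A minimal numpy sketch with in-batch negatives (the batch size, temperature value, and random embeddings are illustrative assumptions, not the paper's settings):

```python
import numpy as np

def info_nce_loss(queries, keys, temperature=0.05):
    """InfoNCE loss: queries[i] should match keys[i] (its positive);
    every other key in the batch serves as a negative."""
    # Cosine similarity between every query and every key
    q = queries / np.linalg.norm(queries, axis=1, keepdims=True)
    k = keys / np.linalg.norm(keys, axis=1, keepdims=True)
    logits = (q @ k.T) / temperature             # shape (batch, batch)
    # Softmax cross-entropy where the diagonal is the correct class
    logits -= logits.max(axis=1, keepdims=True)  # numerical stability
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return -np.mean(np.diag(log_probs))

rng = np.random.default_rng(0)
batch, dim = 4, 16
queries = rng.normal(size=(batch, dim))
keys = queries + 0.1 * rng.normal(size=(batch, dim))  # noisy positives
loss = info_nce_loss(queries, keys)
```

The key property is that no labels are needed: positive pairs can be built from the documents themselves (e.g., by sampling two spans of the same document), which is what makes the training fully unsupervised.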

<mark style="color:green;">**Performance on BEIR Benchmark**</mark><mark style="color:green;">:</mark> The unsupervised model trained through contrastive learning outperforms BM25 on a significant portion of the BEIR benchmark, particularly on the Recall@100 metric.

<mark style="color:green;">**Few-Shot and Fine-Tuning**</mark><mark style="color:green;">:</mark> When pre-trained via contrastive learning and then fine-tuned with a small number of in-domain examples or on the large MSMARCO dataset, the model shows improvements across the BEIR benchmark.

<mark style="color:green;">**Multilingual Retrieval**</mark><mark style="color:green;">:</mark> The approach leads to robust unsupervised performance in multilingual settings and strong cross-lingual transfer capabilities, even for languages with limited resources or different scripts.

### <mark style="color:purple;">Conclusion</mark>

This work posits contrastive learning as a viable method for training unsupervised dense retrievers, showcasing strong performance across a variety of retrieval tasks and languages.

It addresses the limitations of existing neural and term-frequency-based methods, particularly in zero-shot and multilingual retrieval scenarios, marking a significant step towards more adaptable and universally applicable information retrieval systems.

The method trains a dense retriever without supervision using contrastive learning. Here's a breakdown of the approach.
