# Decoding Sentence-BERT

This <mark style="color:blue;">**August 2019**</mark> paper introduces <mark style="color:blue;">**Sentence-BERT (SBERT)**</mark>, a *<mark style="color:yellow;">**modification of the pre-trained BERT network**</mark>* that addresses the inefficiency of BERT for sentence-level tasks.&#x20;

SBERT was developed to quickly <mark style="color:yellow;">derive semantically meaningful sentence embeddings</mark>, which BERT was not initially optimised for.&#x20;

{% embed url="https://arxiv.org/abs/1908.10084" %}
[Sentence Embeddings using Siamese BERT-Networks](https://arxiv.org/abs/1908.10084)
{% endembed %}

### <mark style="color:purple;">The Evolution of Transformers in Natural Language Processing</mark>

Traditionally, the most popular transformers in NLP, such as BERT, RoBERTa, and the original transformer model, have been <mark style="color:yellow;">**focused on computing word-level embeddings**</mark>.  These models excel in tasks like question answering, language modeling, and summarisation.&#x20;

They operate by computing embeddings at the word level and were trained for tasks like masked language modeling.

However, when it comes to a task that requires *<mark style="color:yellow;">**understanding at the sentence level**</mark>*, like semantic search, the computational demands of these word-level transformers become unfeasible.

### <mark style="color:purple;">Semantic Search: The Core of Sentence-Level Understanding</mark>

#### <mark style="color:green;">**Semantic search is all about finding sentences with similar meanings**</mark>

Imagine having to sift through hundreds or thousands of sentences to find those closely related in meaning; using traditional transformers like BERT for this task would be like finding a needle in a haystack, requiring an impractical amount of computational power and time.

The original BERT model is inefficient for this task because *<mark style="color:yellow;">**BERT requires that both sentences in a pair be processed together**</mark>*, leading to a high computational overhead.&#x20;

For instance, finding the most similar sentence pair in a collection of 10,000 sentences would require approximately 50 million inference computations, taking around 65 hours!
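The 50-million figure follows from simple combinatorics: a cross-encoder must score every unordered pair of sentences, which is n(n − 1)/2 comparisons, whereas a bi-encoder like SBERT only needs one forward pass per sentence:

```python
# Cross-encoder cost: one forward pass per unordered pair of sentences
n = 10_000
pairs = n * (n - 1) // 2
print(f"{pairs:,}")  # 49,995,000 -> roughly 50 million inference computations

# A bi-encoder such as SBERT needs only n forward passes (one per sentence);
# comparing the resulting embeddings is then a cheap vector operation.
print(f"{n:,}")
```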

SBERT reduces the time to find the most similar pair from 65 hours to approximately 5 seconds for computing the embeddings, plus about 0.01 seconds for the cosine similarity comparison!

### <mark style="color:purple;">Introducing Sentence-BERT (SBERT)</mark>

SBERT is a modification of the pre-trained BERT network that uses <mark style="color:blue;">**siamese and triplet network architectures**</mark>.&#x20;

These architectures <mark style="color:yellow;">enable SBERT to create sentence embeddings that can be compared swiftly using cosine similarity</mark>.  The beauty of SBERT lies in its efficiency, making semantic searches across large volumes of sentences quick.

#### <mark style="color:green;">Understanding BERT's Approach</mark>

BERT approaches semantic search in a pairwise fashion, using a <mark style="color:blue;">**cross-encoder**</mark> to calculate similarity scores between two sentences. This method becomes inefficient when dealing with large datasets, as the number of computations grows quadratically with the number of sentences.

#### <mark style="color:green;">The Power of Siamese Networks</mark>

Siamese networks *<mark style="color:yellow;">**consist of two or more identical subnetworks with shared parameters**</mark>*. These networks can compute similarity scores and are used extensively in tasks that require comparing inputs, such as pairs of sentences.
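To make the weight-sharing idea concrete, here is a minimal toy sketch in plain NumPy (not SBERT's actual architecture): a single weight matrix serves as the shared encoder, both inputs pass through it, and the resulting embeddings are compared with cosine similarity.

```python
import numpy as np

rng = np.random.default_rng(0)

# One shared weight matrix acts as the "identical subnetwork":
# both branches use exactly the same parameters (weight tying).
W = rng.normal(size=(8, 4))  # toy encoder: 8-dim input -> 4-dim embedding

def encode(x):
    """Shared encoder applied to every input."""
    return np.tanh(x @ W)

def cosine(a, b):
    """Cosine similarity between two embedding vectors."""
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

x1, x2 = rng.normal(size=8), rng.normal(size=8)
e1, e2 = encode(x1), encode(x2)   # same parameters, two branches
score = cosine(e1, e2)            # similarity of the two embeddings
print(round(score, 3))
```

Because the parameters are shared, similar inputs are mapped into the same embedding space, which is what makes the embeddings directly comparable.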

#### <mark style="color:green;">**Training Siamese Networks Efficiently**</mark>

In contrast to the brute-force pairwise approach of BERT, SBERT's siamese architecture does not require training on every possible pair. Instead, it can use <mark style="color:blue;">**triplet loss**</mark>, where the network learns from an anchor, a positive example, and a negative example, thereby reducing the computational load significantly.
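The triplet objective can be sketched as follows (toy 2-D vectors and Euclidean distance for illustration; in SBERT the inputs are the sentence embeddings produced by the BERT subnetworks):

```python
import numpy as np

def triplet_loss(anchor, positive, negative, margin=1.0):
    """Euclidean triplet loss: pull the positive closer to the anchor
    than the negative is, by at least `margin`."""
    d_pos = np.linalg.norm(anchor - positive)
    d_neg = np.linalg.norm(anchor - negative)
    return max(d_pos - d_neg + margin, 0.0)

a = np.array([0.0, 0.0])
p = np.array([0.1, 0.0])   # semantically similar -> should sit near the anchor
n = np.array([3.0, 4.0])   # dissimilar -> should sit far from the anchor
print(triplet_loss(a, p, n))  # gap already exceeds the margin -> loss is 0.0
```

When the negative is already further from the anchor than the positive by more than the margin, the loss is zero and the embeddings need no further adjustment.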

### <mark style="color:purple;">SBERT Architecture and Functionality</mark>

SBERT modifies the original BERT model by *<mark style="color:yellow;">**removing the final classification head, adding a pooling layer over the token outputs, and incorporating the siamese architecture.**</mark>* &#x20;

During training, SBERT processes pairs of sentences, with each BERT subnetwork producing pooled sentence embeddings. These embeddings are then compared using cosine similarity to produce a similarity score.
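A common pooling strategy is mean pooling: average the token embeddings, skipping padding positions. The sketch below uses toy numbers and a tiny dimension rather than BERT's actual 768-dimensional outputs:

```python
import numpy as np

# Toy token embeddings for a 4-token sequence (dim 3),
# standing in for the per-token outputs of a BERT subnetwork
token_embeddings = np.array([
    [0.2, 0.1, 0.4],
    [0.0, 0.3, 0.1],
    [0.5, 0.1, 0.0],
    [0.1, 0.1, 0.3],
])
attention_mask = np.array([1, 1, 1, 0])  # last position is padding

# Mean pooling: average only over the real (non-padding) tokens
mask = attention_mask[:, None]
sentence_embedding = (token_embeddings * mask).sum(axis=0) / mask.sum()
print(sentence_embedding)
```

The result is a single fixed-size vector per sentence, which is exactly what cosine similarity needs.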

### <mark style="color:purple;">The SentenceTransformers Library</mark>

Using SBERT is straightforward, thanks to its Python library.&#x20;

{% embed url="https://sbert.net/" %}
SentenceTransformers Documentation
{% endembed %}

Whether you're computing sentence embeddings or conducting semantic similarity searches, SBERT offers a simple and efficient solution.&#x20;

Here's a glimpse of how you can use SBERT:

```python
from sentence_transformers import SentenceTransformer, util

# Initialize the model
model = SentenceTransformer('all-MiniLM-L6-v2')

# Sample sentences
sentences = ['Your first sentence here', 'Your second sentence here']

# Compute embeddings
embeddings = model.encode(sentences, convert_to_tensor=True)

# Compute similarity scores
cosine_scores = util.cos_sim(embeddings, embeddings)

# Output the results
for i in range(len(cosine_scores)-1):
    for j in range(i+1, len(cosine_scores)):
        print(f"Sentence 1: {sentences[i]}")
        print(f"Sentence 2: {sentences[j]}")
        print(f"Similarity Score: {cosine_scores[i][j]}")
```

The code snippet is a practical example of using the SentenceTransformer library in Python for <mark style="color:yellow;">semantic similarity analysis between sentences.</mark>&#x20;

Here's a breakdown of what each part of the code does:

<mark style="color:green;">**Importing the SentenceTransformer Library**</mark>

```python
from sentence_transformers import SentenceTransformer, util
```

This line imports the necessary components from the <mark style="color:yellow;">**`sentence_transformers`**</mark> library. <mark style="color:yellow;">**`SentenceTransformer`**</mark> is used to *<mark style="color:yellow;">**load the pre-trained model**</mark>*, and <mark style="color:yellow;">**`util`**</mark> provides utility functions like computing cosine similarity.

#### <mark style="color:green;">**Initialising the SentenceTransformer Model**</mark>

```python
model = SentenceTransformer('all-MiniLM-L6-v2')
```

This initialises the SentenceTransformer model with the <mark style="color:yellow;">`'all-MiniLM-L6-v2'`</mark> pre-trained model.&#x20;

This particular model is known for its efficiency and effectiveness in generating sentence embeddings. It's a compact model but still delivers high-quality results, making it suitable for various NLP tasks, including semantic similarity.

#### <mark style="color:green;">**Defining Sample Sentences**</mark>

```python
sentences = ['Your first sentence here', 'Your second sentence here']
```

Here, you define a list of sentences for which you want to compute the semantic similarity. In a blog context, these could be sentences you want to compare for thematic similarity, style, or content.

#### <mark style="color:green;">**Computing Sentence Embeddings**</mark>

```python
embeddings = model.encode(sentences, convert_to_tensor=True)
```

This line *<mark style="color:yellow;">**converts your sentences into embeddings**</mark>*, which are high-dimensional vectors representing the semantic information of each sentence. By setting <mark style="color:yellow;">**`convert_to_tensor=True`**</mark>, these embeddings are converted to PyTorch tensors, which are suitable for further computations like similarity scoring.

<mark style="color:green;">**Computing Cosine Similarity Scores**</mark>

```python
cosine_scores = util.cos_sim(embeddings, embeddings)
```

The <mark style="color:yellow;">**`util.cos_sim`**</mark> function computes the cosine similarity between all pairs of sentence embeddings.&#x20;

Cosine similarity measures how closely two sentence embeddings point in the same direction, with a score ranging from -1 (pointing in opposite directions) to 1 (pointing in exactly the same direction). In practice, scores close to 1 indicate high semantic similarity.
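A hand-rolled version of what `util.cos_sim` computes, shown on three illustrative 2-D vectors:

```python
import numpy as np

def cos_sim(a, b):
    """Cosine of the angle between two vectors."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

print(cos_sim(np.array([1.0, 0.0]), np.array([1.0, 0.0])))   # 1.0  same direction
print(cos_sim(np.array([1.0, 0.0]), np.array([0.0, 1.0])))   # 0.0  orthogonal
print(cos_sim(np.array([1.0, 0.0]), np.array([-1.0, 0.0])))  # -1.0 opposite
```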

#### <mark style="color:green;">**Outputting the Results**</mark>

```python
for i in range(len(cosine_scores)-1):
    for j in range(i+1, len(cosine_scores)):
        print(f"Sentence 1: {sentences[i]}")
        print(f"Sentence 2: {sentences[j]}")
        print(f"Similarity Score: {cosine_scores[i][j]}")
```

This loop iterates over the pairs of sentences and <mark style="color:yellow;">prints out each pair with their corresponding similarity score</mark>. It effectively demonstrates which sentences in your list are more semantically similar to each other.

### <mark style="color:purple;">Conclusion</mark>

In conclusion, the power of the SentenceTransformer library in Python lies in its ability to transform the way we understand and analyse text.

By leveraging this tool for semantic similarity analysis, you can unlock new depths in content analysis, refine SEO strategies, enhance reader engagement, and even improve content recommendation systems.
