# Decoding Sentence-BERT

This <mark style="color:blue;">**August 2019**</mark> paper introduces <mark style="color:blue;">**Sentence-BERT (SBERT)**</mark>, a *<mark style="color:yellow;">**modification of the pre-trained BERT network**</mark>* that addresses the inefficiency of BERT for sentence-level tasks.&#x20;

SBERT was developed to quickly <mark style="color:yellow;">derive semantically meaningful sentence embeddings</mark>, which BERT was not initially optimised for.&#x20;

{% embed url="https://arxiv.org/abs/1908.10084" %}
[Sentence Embeddings using Siamese BERT-Networks](https://arxiv.org/abs/1908.10084)
{% endembed %}

### <mark style="color:purple;">The Evolution of Transformers in Natural Language Processing</mark>

Traditionally, the most popular transformers in NLP, such as BERT, RoBERTa, and the original transformer model, have been <mark style="color:yellow;">**focused on computing word-level embeddings**</mark>.  These models excel in tasks like question answering, language modeling, and summarisation.&#x20;

They operate by computing embeddings at the word level and were trained for tasks like masked language modeling.

However, when it comes to a task that requires *<mark style="color:yellow;">**understanding at the sentence level**</mark>*, like semantic search, the computational demands of these word-level transformers become unfeasible.

### <mark style="color:purple;">Semantic Search: The Core of Sentence-Level Understanding</mark>

#### <mark style="color:green;">**Semantic search is all about finding sentences with similar meanings**</mark>

Imagine having to sift through hundreds or thousands of sentences to find those closely related in meaning; using traditional transformers like BERT for this task would be like finding a needle in a haystack, requiring an impractical amount of computational power and time.

The original BERT model is inefficient for this task because *<mark style="color:yellow;">**BERT requires that both sentences in a pair be processed together**</mark>*, leading to a high computational overhead.&#x20;

For instance, finding the most similar sentence pair in a collection of 10,000 sentences would require approximately 50 million inference computations, taking around 65 hours!
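The 50-million figure follows from simple combinatorics: a cross-encoder must score every unordered pair of sentences, which is n(n − 1)/2 comparisons, whereas a bi-encoder like SBERT only needs one forward pass per sentence:

```python
# Cross-encoder cost: one forward pass per unordered pair of sentences
n = 10_000
pairs = n * (n - 1) // 2
print(f"{pairs:,}")  # 49,995,000 -> roughly 50 million inference computations

# A bi-encoder such as SBERT needs only n forward passes (one per sentence);
# comparing the resulting embeddings is then a cheap vector operation.
print(f"{n:,}")
```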

SBERT reduces the time to find the most similar pair from 65 hours to approximately 5 seconds for computing the embeddings, plus about 0.01 seconds for the cosine similarity comparison!

### <mark style="color:purple;">Introducing Sentence-BERT (SBERT)</mark>

SBERT is a modification of the pre-trained BERT network that uses <mark style="color:blue;">**siamese and triplet network architectures**</mark>.&#x20;

These architectures <mark style="color:yellow;">enable SBERT to create sentence embeddings that can be compared swiftly using cosine similarity</mark>.  The beauty of SBERT lies in its efficiency, making semantic searches across large volumes of sentences quick.

#### <mark style="color:green;">Understanding BERT's Approach</mark>

BERT approaches semantic search in a pairwise fashion, using a <mark style="color:blue;">**cross-encoder**</mark> to calculate similarity scores between two sentences. This method becomes inefficient when dealing with large datasets, as the number of computations grows quadratically with the number of sentences.

#### <mark style="color:green;">The Power of Siamese Networks</mark>

Siamese networks *<mark style="color:yellow;">**consist of two or more identical subnetworks with shared parameters**</mark>*. These networks can compute similarity scores and are used extensively in tasks that require comparing inputs, such as pairs of sentences.
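To make the weight-sharing idea concrete, here is a minimal toy sketch in plain NumPy (not SBERT's actual architecture): a single weight matrix serves as the shared encoder, both inputs pass through it, and the resulting embeddings are compared with cosine similarity.

```python
import numpy as np

rng = np.random.default_rng(0)

# One shared weight matrix acts as the "identical subnetwork":
# both branches use exactly the same parameters (weight tying).
W = rng.normal(size=(8, 4))  # toy encoder: 8-dim input -> 4-dim embedding

def encode(x):
    """Shared encoder applied to every input."""
    return np.tanh(x @ W)

def cosine(a, b):
    """Cosine similarity between two embedding vectors."""
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

x1, x2 = rng.normal(size=8), rng.normal(size=8)
e1, e2 = encode(x1), encode(x2)   # same parameters, two branches
score = cosine(e1, e2)            # similarity of the two embeddings
print(round(score, 3))
```

Because the parameters are shared, similar inputs are mapped into the same embedding space, which is what makes the embeddings directly comparable.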

#### <mark style="color:green;">**Training Siamese Networks Efficiently**</mark>

In contrast to the brute-force pairwise approach of BERT, SBERT's siamese architecture does not require training on every possible pair. Instead, it can use <mark style="color:blue;">**triplet loss**</mark>, where the network learns from an anchor, a positive example, and a negative example, thereby reducing the computational load significantly.
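The triplet objective can be sketched as follows (toy 2-D vectors and Euclidean distance for illustration; in SBERT the inputs are the sentence embeddings produced by the BERT subnetworks):

```python
import numpy as np

def triplet_loss(anchor, positive, negative, margin=1.0):
    """Euclidean triplet loss: pull the positive closer to the anchor
    than the negative is, by at least `margin`."""
    d_pos = np.linalg.norm(anchor - positive)
    d_neg = np.linalg.norm(anchor - negative)
    return max(d_pos - d_neg + margin, 0.0)

a = np.array([0.0, 0.0])
p = np.array([0.1, 0.0])   # semantically similar -> should sit near the anchor
n = np.array([3.0, 4.0])   # dissimilar -> should sit far from the anchor
print(triplet_loss(a, p, n))  # gap already exceeds the margin -> loss is 0.0
```

When the negative is already further from the anchor than the positive by more than the margin, the loss is zero and the embeddings need no further adjustment.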

### <mark style="color:purple;">SBERT Architecture and Functionality</mark>

SBERT modifies the original BERT model by *<mark style="color:yellow;">**removing the final classification head, adding a pooling layer over the token outputs, and incorporating the siamese architecture.**</mark>* &#x20;

During training, SBERT processes pairs of sentences, with each BERT subnetwork producing pooled sentence embeddings. These embeddings are then compared using cosine similarity to produce a similarity score.
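A common pooling strategy is mean pooling: average the token embeddings, skipping padding positions. The sketch below uses toy numbers and a tiny dimension rather than BERT's actual 768-dimensional outputs:

```python
import numpy as np

# Toy token embeddings for a 4-token sequence (dim 3),
# standing in for the per-token outputs of a BERT subnetwork
token_embeddings = np.array([
    [0.2, 0.1, 0.4],
    [0.0, 0.3, 0.1],
    [0.5, 0.1, 0.0],
    [0.1, 0.1, 0.3],
])
attention_mask = np.array([1, 1, 1, 0])  # last position is padding

# Mean pooling: average only over the real (non-padding) tokens
mask = attention_mask[:, None]
sentence_embedding = (token_embeddings * mask).sum(axis=0) / mask.sum()
print(sentence_embedding)
```

The result is a single fixed-size vector per sentence, which is exactly what cosine similarity needs.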

### <mark style="color:purple;">The SentenceTransformers Library</mark>

Using SBERT is straightforward, thanks to its Python library.&#x20;

{% embed url="https://sbert.net/" %}
SentenceTransformers Documentation
{% endembed %}

Whether you're computing sentence embeddings or conducting semantic similarity searches, SBERT offers a simple and efficient solution.&#x20;

Here's a glimpse of how you can use SBERT:

```python
from sentence_transformers import SentenceTransformer, util

# Initialize the model
model = SentenceTransformer('all-MiniLM-L6-v2')

# Sample sentences
sentences = ['Your first sentence here', 'Your second sentence here']

# Compute embeddings
embeddings = model.encode(sentences, convert_to_tensor=True)

# Compute similarity scores
cosine_scores = util.cos_sim(embeddings, embeddings)

# Output the results
for i in range(len(cosine_scores)-1):
    for j in range(i+1, len(cosine_scores)):
        print(f"Sentence 1: {sentences[i]}")
        print(f"Sentence 2: {sentences[j]}")
        print(f"Similarity Score: {cosine_scores[i][j]}")
```

The code snippet is a practical example of using the SentenceTransformer library in Python for <mark style="color:yellow;">semantic similarity analysis between sentences.</mark>&#x20;

Here's a breakdown of what each part of the code does:

<mark style="color:green;">**Importing the SentenceTransformer Library**</mark>

```python
from sentence_transformers import SentenceTransformer, util
```

This line imports the necessary components from the <mark style="color:yellow;">**`sentence_transformers`**</mark> library. <mark style="color:yellow;">**`SentenceTransformer`**</mark> is used to *<mark style="color:yellow;">**load the pre-trained model**</mark>*, and <mark style="color:yellow;">**`util`**</mark> provides utility functions like computing cosine similarity.

#### <mark style="color:green;">**Initialising the SentenceTransformer Model**</mark>

```python
model = SentenceTransformer('all-MiniLM-L6-v2')
```

This initialises the SentenceTransformer model with the <mark style="color:yellow;">`'all-MiniLM-L6-v2'`</mark> pre-trained model.&#x20;

This particular model is known for its efficiency and effectiveness in generating sentence embeddings. It's a compact model but still delivers high-quality results, making it suitable for various NLP tasks, including semantic similarity.

#### <mark style="color:green;">**Defining Sample Sentences**</mark>

```python
sentences = ['Your first sentence here', 'Your second sentence here']
```

Here, you define a list of sentences for which you want to compute the semantic similarity. In a blog context, these could be sentences you want to compare for thematic similarity, style, or content.

#### <mark style="color:green;">**Computing Sentence Embeddings**</mark>

```python
embeddings = model.encode(sentences, convert_to_tensor=True)
```

This line *<mark style="color:yellow;">**converts your sentences into embeddings**</mark>*, which are high-dimensional vectors representing the semantic information of each sentence. By setting <mark style="color:yellow;">**`convert_to_tensor=True`**</mark>, these embeddings are converted to PyTorch tensors, which are suitable for further computations like similarity scoring.

<mark style="color:green;">**Computing Cosine Similarity Scores**</mark>

```python
cosine_scores = util.cos_sim(embeddings, embeddings)
```

The <mark style="color:yellow;">**`util.cos_sim`**</mark> function computes the cosine similarity between all pairs of sentence embeddings.&#x20;

Cosine similarity measures how closely two sentence embeddings point in the same direction, with a score ranging from -1 (pointing in opposite directions) to 1 (pointing in exactly the same direction). In practice, scores close to 1 indicate high semantic similarity.
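A hand-rolled version of what `util.cos_sim` computes, shown on three illustrative 2-D vectors:

```python
import numpy as np

def cos_sim(a, b):
    """Cosine of the angle between two vectors."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

print(cos_sim(np.array([1.0, 0.0]), np.array([1.0, 0.0])))   # 1.0  same direction
print(cos_sim(np.array([1.0, 0.0]), np.array([0.0, 1.0])))   # 0.0  orthogonal
print(cos_sim(np.array([1.0, 0.0]), np.array([-1.0, 0.0])))  # -1.0 opposite
```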

#### <mark style="color:green;">**Outputting the Results**</mark>

```python
for i in range(len(cosine_scores)-1):
    for j in range(i+1, len(cosine_scores)):
        print(f"Sentence 1: {sentences[i]}")
        print(f"Sentence 2: {sentences[j]}")
        print(f"Similarity Score: {cosine_scores[i][j]}")
```

This loop iterates over the pairs of sentences and <mark style="color:yellow;">prints out each pair with their corresponding similarity score</mark>. It effectively demonstrates which sentences in your list are more semantically similar to each other.

### <mark style="color:purple;">Conclusion</mark>

In conclusion, the power of the SentenceTransformer library in Python lies in its ability to transform the way we understand and analyse text.

By leveraging this tool for semantic similarity analysis, you can unlock new depths in content analysis, refine SEO strategies, enhance reader engagement, and even improve content recommendation systems.
