Decoding Sentence-BERT
This August 2019 paper introduces Sentence-BERT (SBERT), a modification of the pre-trained BERT network that addresses BERT's inefficiency on sentence-level tasks.
SBERT was developed to quickly derive semantically meaningful sentence embeddings, which BERT was not initially optimised for.
Traditionally, the most popular transformers in NLP, such as BERT, RoBERTa, and the original transformer model, have focused on computing word-level embeddings. These models excel in tasks like question answering, language modeling, and summarisation.
They operate by computing embeddings at the word level and were trained for tasks like masked language modeling.
However, for tasks that require sentence-level understanding, such as semantic search, the computational demands of these word-level transformers become prohibitive.
Imagine having to sift through hundreds or thousands of sentences to find those closely related in meaning; using traditional transformers like BERT for this task would be like finding a needle in a haystack, requiring an impractical amount of computational power and time.
The original BERT model was inefficient for this because it requires both sentences in a pair to be processed together, leading to high computational overhead.
For instance, finding the most similar sentence pair in a collection of 10,000 sentences would require approximately 50 million inference computations, taking around 65 hours!
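To see where the 50 million figure comes from, a quick count of the distinct pairs among 10,000 sentences:

```python
n = 10_000
pairs = n * (n - 1) // 2   # 49,995,000, i.e. roughly 50 million BERT inference passes
```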
SBERT reduces this from 65 hours to approximately 5 seconds of embedding computation plus around 0.01 seconds for the cosine similarity comparison!
SBERT is a modification of the pre-trained BERT network that uses siamese and triplet network architectures.
These architectures enable SBERT to create sentence embeddings that can be compared swiftly using cosine similarity. The beauty of SBERT lies in its efficiency, making semantic search across large collections of sentences fast and practical.
BERT approaches semantic search in a pairwise fashion, using a cross-encoder to calculate a similarity score for each pair of sentences. This method becomes inefficient on large datasets, as the number of required comparisons grows quadratically with the number of sentences.
Siamese networks consist of two or more identical subnetworks with shared parameters. They can compute similarity scores and are used extensively in tasks that require comparing inputs, such as sentences.
Unlike BERT's brute-force pairwise approach, SBERT's siamese architecture does not require scoring every possible pair. One of its training objectives is a triplet loss, where the network learns from an anchor sentence, a positive (related) sentence, and a negative (unrelated) sentence, significantly reducing the computational load.
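As a rough illustration, here is a minimal sketch of a triplet objective of this kind, assuming Euclidean distance and a margin of 1 (the values described in the paper):

```python
import torch
import torch.nn.functional as F

def triplet_loss(anchor, positive, negative, margin=1.0):
    # Push the anchor at least `margin` closer to the positive
    # embedding than to the negative embedding.
    d_pos = F.pairwise_distance(anchor, positive)
    d_neg = F.pairwise_distance(anchor, negative)
    return torch.clamp(d_pos - d_neg + margin, min=0.0).mean()
```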
SBERT modifies the pre-trained BERT network by adding a pooling operation over BERT's token outputs to produce fixed-size sentence embeddings and by incorporating the siamese architecture.
During training, SBERT processes pairs of sentences, with each BERT subnetwork producing pooled sentence embeddings. These embeddings are then compared using cosine similarity to produce a similarity score.
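As a sketch of that pooling-and-compare step, assuming mean pooling over the token embeddings (one of the pooling strategies described in the paper):

```python
import torch.nn.functional as F

def mean_pool(token_embeddings, attention_mask):
    # Average BERT's token embeddings into one sentence vector, ignoring padding.
    mask = attention_mask.unsqueeze(-1).float()      # (batch, seq_len, 1)
    summed = (token_embeddings * mask).sum(dim=1)    # (batch, hidden)
    counts = mask.sum(dim=1).clamp(min=1e-9)         # (batch, 1)
    return summed / counts

def pair_score(u, v):
    # Cosine similarity between the pooled embeddings of a sentence pair.
    return F.cosine_similarity(u, v, dim=-1)
```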
Using SBERT is straightforward, thanks to its Python library.
Whether you're computing sentence embeddings or conducting semantic similarity searches, SBERT offers a simple and efficient solution.
Here's a glimpse of how you can use SBERT:
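The sketch below shows one way to do this with the sentence-transformers package (installable via pip install sentence-transformers); the model name and example sentences are illustrative placeholders you can swap for your own:

```python
from sentence_transformers import SentenceTransformer, util

# Load a compact pre-trained sentence-embedding model.
model = SentenceTransformer('all-MiniLM-L6-v2')

# Example sentences to compare (placeholders -- use your own).
sentences = [
    "The cat sits outside",
    "A man is playing guitar",
    "The new movie is awesome",
]

# Encode each sentence into a fixed-size embedding (PyTorch tensor).
embeddings = model.encode(sentences, convert_to_tensor=True)

# Cosine similarity between every pair of sentence embeddings.
cosine_scores = util.cos_sim(embeddings, embeddings)

# Print every sentence pair with its similarity score.
for i in range(len(sentences)):
    for j in range(i + 1, len(sentences)):
        print(f"{sentences[i]} <> {sentences[j]}: {cosine_scores[i][j].item():.4f}")
```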
The snippet above is a practical example of using the sentence-transformers library in Python for semantic similarity analysis between sentences.
Here's a breakdown of what each part of the code does:
Importing the SentenceTransformer Library
This line imports the necessary components from the sentence_transformers library. SentenceTransformer is used to load the pre-trained model, and util provides utility functions like computing cosine similarity.
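The corresponding line from the sketch above:

```python
from sentence_transformers import SentenceTransformer, util
```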
This initialises the SentenceTransformer model with the 'all-MiniLM-L6-v2' pre-trained model.
This particular model is known for its efficiency and effectiveness in generating sentence embeddings. It's a compact model but still delivers high-quality results, making it suitable for various NLP tasks, including semantic similarity.
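The corresponding line from the sketch above (the model name is one possible choice):

```python
model = SentenceTransformer('all-MiniLM-L6-v2')
```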
Here, you define the list of sentences for which you want to compute semantic similarity. In practice, these could be sentences you want to compare for thematic similarity, style, or content.
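The relevant lines from the sketch above (the sentences themselves are placeholders):

```python
sentences = [
    "The cat sits outside",
    "A man is playing guitar",
    "The new movie is awesome",
]
```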
This line converts your sentences into embeddings, which are high-dimensional vectors representing the semantic information of each sentence. By setting convert_to_tensor=True, these embeddings are converted to PyTorch tensors, which are suitable for further computations like similarity scoring.
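The corresponding line from the sketch above:

```python
embeddings = model.encode(sentences, convert_to_tensor=True)
```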
Computing Cosine Similarity Scores
The util.cos_sim function computes the cosine similarity between all pairs of sentence embeddings.
Cosine similarity measures how close two sentences are in content and meaning, with a score ranging from -1 (opposite) through 0 (unrelated) to 1 (semantically identical).
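The corresponding line from the sketch above:

```python
cosine_scores = util.cos_sim(embeddings, embeddings)
```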
This loop iterates over the pairs of sentences and prints out each pair with their corresponding similarity score. It effectively demonstrates which sentences in your list are more semantically similar to each other.
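The corresponding loop from the sketch above:

```python
for i in range(len(sentences)):
    for j in range(i + 1, len(sentences)):
        print(f"{sentences[i]} <> {sentences[j]}: {cosine_scores[i][j].item():.4f}")
```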
In conclusion, the power of the SentenceTransformer library in Python lies in its ability to transform the way we understand and analyse text.
By leveraging this tool for semantic similarity analysis, you can unlock new depths in content analysis, refine SEO strategies, enhance reader engagement, and even improve content recommendation systems.