Sentence similarity plays an important role in many natural language processing (NLP) applications. Whether you build chatbots, recommendation systems, or search engines, understanding how close two sentences are in meaning can improve the user experience, and that is exactly what similarity matching lets you do.
Sentence Transformers make this process easy and efficient. In this guide, you will learn what sentence similarity is, how sentence transformers work, and how to write code that measures the similarity between two sets of sentences.
What is sentence similarity?
Sentence similarity is the process of comparing two sentences to see how close they are in meaning. It does not look for identical words but focuses on what the sentences actually mean.
For example:
'The cat sits outside'
'The dog plays in the garden'
Both sentences talk about animals outside, so they are somewhat similar even though they use different words.
Such understanding is essential for tasks such as document clustering, duplicate detection, or semantic search.
Why use sentence transformers
Traditional methods such as bag-of-words rely on simple word matching or frequency counts. They fail when the words are different but the meaning stays the same.
Sentence transformers solve this by using transformer-based language models such as BERT or RoBERTa to create embeddings.
An embedding is a list of numbers that represents the meaning of a sentence. When two embeddings are close to each other in this high-dimensional space, the sentences they represent are similar in meaning.
The sentence-transformers library makes this easy by providing pre-trained models that can generate embeddings for sentences.
Installing the required libraries
Before you start coding, make sure the required packages are installed. Run this command:
pip install -U sentence-transformers
This installs the sentence-transformers library along with its dependencies.
Loading a pre-trained model
Sentence Transformers offers several pre-trained models. For this example, you will use the all-MiniLM-L6-v2 model. It is lightweight, fast, and works well for most applications.
Here is how to load it in Python:
from sentence_transformers import SentenceTransformer
# Load the model
model = SentenceTransformer("all-MiniLM-L6-v2")
Once loaded, this model can transform any sentence into its corresponding embedding.
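To see what an embedding looks like, you can encode a single sentence and check its shape. This is a quick sketch continuing from the snippet above; all-MiniLM-L6-v2 produces 384-dimensional vectors:
# Encode one sentence and inspect the embedding
embedding = model.encode("The cat sits outside")
print(embedding.shape)  # (384,) for all-MiniLM-L6-v2
print(embedding[:5])    # the first few of the 384 numbers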
Defining the sentences to compare
To compare sentences, you need two sets of them. Here is an example:
sentences1 = [
    'The cat sits outside',
    'A man is playing guitar',
    'The movies are awesome'
]
sentences2 = [
    'The dog plays in the garden',
    'A woman watches TV',
    'The new movie is so great'
]
Each sentence in sentences1 will be compared with the sentence at the same position in sentences2.
Turning sentences into embeddings
Now that you have the sentences, you need to turn them into embeddings using the model.
Add this code:
# Convert sentences to embeddings
embeddings1 = model.encode(sentences1, convert_to_tensor=True)
embeddings2 = model.encode(sentences2, convert_to_tensor=True)
The convert_to_tensor=True argument tells the model to return PyTorch tensors, which work well with the similarity calculation.
Calculating the cosine similarity
Once you have embeddings, you need a way to measure how similar they are. Cosine similarity is the metric most commonly used for this.
Cosine similarity looks at the angle between two vectors in a high-dimensional space. If the angle is small, the vectors point in similar directions and the sentences are similar in meaning.
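To make this concrete, here is a minimal sketch of the cosine similarity formula in plain PyTorch; the util.cos_sim helper used below does the same computation for whole batches of embeddings:
import torch

def cosine_similarity(a: torch.Tensor, b: torch.Tensor) -> torch.Tensor:
    # Dot product divided by the product of the vector lengths:
    # 1 means same direction, 0 means perpendicular, -1 means opposite
    return torch.dot(a, b) / (torch.norm(a) * torch.norm(b))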
Add this code to compute the similarity:
from sentence_transformers import util
# Compute cosine similarity
cosine_scores = util.cos_sim(embeddings1, embeddings2)
Now cosine_scores holds a similarity score for every pair of sentences.
Printing the results
To view the results clearly, format them like this:
# Print each sentence pair with its similarity score
for i in range(len(sentences1)):
    print(f"{sentences1[i]} \t\t {sentences2[i]} \t\t Score: {cosine_scores[i][i]:.4f}")
This pairs each sentence with its counterpart and prints the similarity score for each pair.
Sample output
If you run this code, you will see output similar to the following (the exact scores can vary slightly with library and model versions):
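The cat sits outside 		 The dog plays in the garden 		 Score: 0.2838
A man is playing guitar 		 A woman watches TV 		 Score: -0.0327
The movies are awesome 		 The new movie is so great 		 Score: 0.6571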

The third pair has the highest score because both sentences talk about movies positively.
How to interpret the scores
Cosine similarity scores range between -1 and 1.
A score close to 1 means the sentences are very similar.
A score close to 0 means they are unrelated.
Negative values mean the sentences are unrelated or even contradictory.
In most real-world cases, you focus on values between 0 and 1. The higher the value, the closer the meaning.
Real-world applications of sentence similarity
Sentence similarity has become a core part of many modern applications because it helps systems understand meaning rather than relying on exact words. This shift makes search, analysis, and recommendations far more accurate and useful.
Semantic search
Traditional search engines depend on keyword matches. If the exact words are missing, the results often become irrelevant. Semantic search solves this problem by looking at the meaning behind a query.
For example, if someone searches for 'best way to learn guitar', the system can return results for 'top tips for playing guitar', even though the keywords are different. This makes the search experience smoother and more intelligent.
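Here is a minimal sketch of that idea using the library's util.semantic_search helper; the corpus and query are made up for illustration:
from sentence_transformers import SentenceTransformer, util

model = SentenceTransformer("all-MiniLM-L6-v2")

corpus = [
    "Top tips for playing guitar",
    "How to bake sourdough bread",
    "Beginner guitar practice routines"
]
query = "best way to learn guitar"

corpus_embeddings = model.encode(corpus, convert_to_tensor=True)
query_embedding = model.encode(query, convert_to_tensor=True)

# Rank the corpus by cosine similarity to the query and keep the top 2 hits
hits = util.semantic_search(query_embedding, corpus_embeddings, top_k=2)[0]
for hit in hits:
    print(corpus[hit["corpus_id"]], f"(score: {hit['score']:.4f})")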
Duplicate detection
Large datasets often contain repeated or near-duplicate content. Manual checking is impossible when dealing with millions of records.
Sentence similarity automates this by detecting texts that have the same meaning, even if the wording changes slightly. It is especially useful for data cleaning, web scraping pipelines, or handling user-generated content.
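As a sketch, the library's util.paraphrase_mining helper compares every text against every other and returns pairs sorted by similarity; the texts and the 0.9 threshold here are illustrative choices:
from sentence_transformers import SentenceTransformer, util

model = SentenceTransformer("all-MiniLM-L6-v2")

texts = [
    "The new movie is so great",
    "The new film is really great",
    "A man is playing guitar"
]

# Returns [score, index1, index2] entries, most similar pairs first
pairs = util.paraphrase_mining(model, texts)
for score, i, j in pairs:
    if score > 0.9:  # illustrative threshold; tune it for your data
        print(f"Possible duplicate ({score:.4f}): {texts[i]} / {texts[j]}")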
Recommendation systems
Recommendation engines work best when they understand context. For example, if a user likes articles about 'healthy cooking', the system can recommend content on 'nutritious recipes' or 'quick healthy meals' using similarity scores. This approach goes beyond surface-level keywords and finds deeper connections in the text.
Chatbots and virtual assistants
Chatbots store a large set of potential user questions and answers. When someone types a new question, the system has to find the most relevant answer. Using sentence similarity, a chatbot matches the user's input to the closest existing question by meaning, not just wording, which makes conversations more accurate and natural.
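A minimal sketch of that lookup, reusing the cos_sim approach from earlier; the FAQ entries are made up for illustration:
from sentence_transformers import SentenceTransformer, util

model = SentenceTransformer("all-MiniLM-L6-v2")

# Hypothetical FAQ store mapping known questions to canned answers
faq = {
    "How do I reset my password?": "Click 'Forgot password' on the login page.",
    "What are your opening hours?": "We are open 9am to 5pm, Monday to Friday."
}

questions = list(faq)
question_embeddings = model.encode(questions, convert_to_tensor=True)

user_input = "I can't remember my password"
user_embedding = model.encode(user_input, convert_to_tensor=True)

# Match the user's input to the closest stored question by meaning
scores = util.cos_sim(user_embedding, question_embeddings)[0]
best_match = questions[int(scores.argmax())]
print(faq[best_match])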
Improving accuracy with larger models
The all-MiniLM-L6-v2 model is fast and accurate enough for small to medium tasks.
If you need more accuracy, you can try larger models such as all-mpnet-base-v2, although they may need more memory and time to run.
To switch, change the model name in your code:
model = SentenceTransformer("all-mpnet-base-v2")
Conclusion
Sentence Transformers make it easy to measure sentence similarity using pre-trained models. By converting sentences into embeddings and comparing them with cosine similarity, you can build systems that understand meaning rather than relying on exact word matches.
With just a few lines of code, you can integrate this into chatbots, search engines, or recommendation systems and create more intelligent applications.
I hope you enjoyed this article. Sign up for my free newsletter at turningtalks.ai for more lessons on AI. You can also visit my website.