Fix Document Similarity Check with passed Threshold #6845

Sidchat95 · 2023-06-27T22:09:29Z

Converting the Similarity obtained in the similarity_search_with_score_by_vector method whilst comparing to the passed
threshold. This is because the passed threshold is a number between 0 to 1 and is already in the relevance_score_fn format.
As of now, the function is comparing two different scoring parameters and that wouldn't work.

Dependencies
None

Issue:
Different scores being compared in similarity_search_with_score_by_vector method in FAISS.

Tag maintainer
@hwchase17

… score threshold passed is the relevant score

vercel · 2023-06-27T22:09:32Z

The latest updates on your projects. Learn more about Vercel for Git ↗︎

1 Ignored Deployment

Name	Status	Preview	Comments	Updated (UTC)
langchain	⬜️ Ignored (Inspect)			Jul 13, 2023 4:01am

Sidchat95 · 2023-06-27T22:17:49Z

@pablocastro @farzad528 Kindly request you to please take a look.

hwchase17

self.relevance_score_fn is relevant for _similarity_search_with_relevance_scores

pablocastro · 2023-06-28T00:16:11Z

@Sidchat95 happy to help but you may have the wrong folks, @farzad528 and I work on Azure Cognitive Search, not on FAISS.

Sidchat95 · 2023-06-28T15:45:35Z

self.relevance_score_fn is relevant for _similarity_search_with_relevance_scores

Hi @hwchase17 , as I agree that self.relevance_score_fn is majorly used for _similarity_search_with_relevance_scores function but it is also effecting this stage as well.

Consider the following scenario where we have a vectorstore built out of FAISS vectorstore:

We get the retriever as retriever = vectorstore.as_retriever() which is mostly used for conversational chains.
Given that the search type is set to "similarity_score_threshold", when we call retriever.get_relevant_documents(query) by setting the score threshold to say 0.8, the following workflow is triggered chronologically. (Let us keep in mind that 0.8 threshold set here is the threshold set on the basis of a score of 0 to 1 which ideally is the output of relevance_score_fn.

The steps called are:

 self.vectorstore.similarity_search_with_relevance_scores is triggered with the threshold of 0.8 (Line 410 File langchain.vectorstore.base)

 The next step involves calling self._similarity_search_with_relevance_scores (Line 156 File langchain.vectorstore.base)

 Function _similarity_search_with_relevance_scores is called from file langchain.vectorstore.faiss which in turn calls self.similarity_search_with_score function with the same query and the same threshold of 0.8

Function similarity_search_with_score is called which in turn calls self.similarity_search_with_score_by_vector with the same query and threshold of 0.8

self.similarity_search_with_score_by_vector is called with the threshold of 0.8 and the query. This function first fetches the docs till line 229 of langchain.vectorstore.faiss and then gets the score_threshold set which is still 0.8 in this case. It then matches and filters according to the threshold which is 0.8.

The score returned in the list docs with the similarities is not between 0 and 1 and thus comparing it to 0.8 will essentially not fetch any results. For example a score of 0.312 in the docs list similarity is actually 0.779 by using the function converter 1.0 - 0.312 / math.sqrt(2) which is why we need to first convert the similarity score fetched by docs and then compare it to the incoming threshold.

Sidchat95 · 2023-06-28T15:53:31Z

@Sidchat95 happy to help but you may have the wrong folks, @farzad528 and I work on Azure Cognitive Search, not on FAISS.

Thanks no problem. My bad !

…ision

Sidchat95 · 2023-07-10T17:09:41Z

@hwchase17 Request you to please look into this and approve the workflows

baskaryan · 2023-07-13T00:02:13Z

updated so only affects similarity_search_with_relevance_score

…ision

Sidchat95 · 2023-07-13T04:08:40Z

updated so only affects similarity_search_with_relevance_score

Thanks @baskaryan for the collaboration but changing it in similarity_search_with_relevance_score in FAISS renders Line 227 to 238 in method similarity_search_with_score_by_vector completely useless as this would always pop the threshold from the arguments. Should we remove the redundant code then ?

baskaryan · 2023-07-13T04:53:25Z

updated so only affects similarity_search_with_relevance_score

Thanks @baskaryan for the collaboration but changing it in similarity_search_with_relevance_score in FAISS renders Line 227 to 238 in method similarity_search_with_score_by_vector completely useless as this would always pop the threshold from the arguments. Should we remove the redundant code then ?

it's useless if someone calls similarity_search_with_relevance_score but not useless if someones calls similarity_search_with_score_by_vector directly

Sidchat95 · 2023-07-13T05:05:31Z

updated so only affects similarity_search_with_relevance_score

Thanks @baskaryan for the collaboration but changing it in similarity_search_with_relevance_score in FAISS renders Line 227 to 238 in method similarity_search_with_score_by_vector completely useless as this would always pop the threshold from the arguments. Should we remove the redundant code then ?

it's useless if someone calls similarity_search_with_relevance_score but not useless if someones calls similarity_search_with_score_by_vector directly

Agreed. Thank you @baskaryan . Can we merge this now? I do not have the permissions to merge this actually and seems all checks have passed?

baskaryan · 2023-07-13T05:30:52Z

done, thanks @Sidchat95!

Sidchat95 · 2023-07-17T16:57:59Z

done, thanks @Sidchat95!

Thank you @baskaryan as a part of this PR, can I get the recognition for this twitter account if possible for this? @procoretech

Convert the similarity obtained to relevance scores to compare as the…

b4dcdba

… score threshold passed is the relevant score

hwchase17 reviewed Jun 27, 2023

View reviewed changes

Sidchat95 requested a review from hwchase17 June 29, 2023 18:10

Merge branch 'hwchase17:master' into Fix-FAISS-Score-Threshold-Compar…

2cb889a

…ision

baskaryan added 2 commits July 12, 2023 19:59

cr

7864632

merge

d5f4f92

dev2049 force-pushed the Fix-FAISS-Score-Threshold-Comparision branch from 8c047c8 to d5f4f92 Compare July 13, 2023 00:01

baskaryan added the lgtm label Jul 13, 2023

Merge branch 'hwchase17:master' into Fix-FAISS-Score-Threshold-Compar…

84704db

…ision

baskaryan merged commit c5e50c4 into langchain-ai:master Jul 13, 2023

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Fix Document Similarity Check with passed Threshold #6845

Fix Document Similarity Check with passed Threshold #6845

Sidchat95 commented Jun 27, 2023 •

edited

Loading

vercel bot commented Jun 27, 2023 •

edited

Loading

Sidchat95 commented Jun 27, 2023

hwchase17 left a comment

pablocastro commented Jun 28, 2023

Sidchat95 commented Jun 28, 2023

Sidchat95 commented Jun 28, 2023

Sidchat95 commented Jul 10, 2023

baskaryan commented Jul 13, 2023

Sidchat95 commented Jul 13, 2023

baskaryan commented Jul 13, 2023 •

edited

Loading

Sidchat95 commented Jul 13, 2023

baskaryan commented Jul 13, 2023

Sidchat95 commented Jul 17, 2023

Fix Document Similarity Check with passed Threshold #6845

Fix Document Similarity Check with passed Threshold #6845

Conversation

Sidchat95 commented Jun 27, 2023 • edited Loading

vercel bot commented Jun 27, 2023 • edited Loading

Sidchat95 commented Jun 27, 2023

hwchase17 left a comment

Choose a reason for hiding this comment

pablocastro commented Jun 28, 2023

Sidchat95 commented Jun 28, 2023

Sidchat95 commented Jun 28, 2023

Sidchat95 commented Jul 10, 2023

baskaryan commented Jul 13, 2023

Sidchat95 commented Jul 13, 2023

baskaryan commented Jul 13, 2023 • edited Loading

Sidchat95 commented Jul 13, 2023

baskaryan commented Jul 13, 2023

Sidchat95 commented Jul 17, 2023

Sidchat95 commented Jun 27, 2023 •

edited

Loading

vercel bot commented Jun 27, 2023 •

edited

Loading

baskaryan commented Jul 13, 2023 •

edited

Loading