Caching mono run file, loading the cache file for duo runs #176
Conversation
pygaggle/model/evaluate.py (Outdated)
@@ -213,6 +218,8 @@ def evaluate(self,
mono_out = self.mono_reranker.rerank(example.query, example.documents)
mono_texts.append(sorted(enumerate(mono_out), key=lambda x: x[1].score, reverse=True)[:self.mono_hits])
scores.append(np.array([x.score for x in mono_out]))
if self.mono_cache_writer is not None:
nit: the line below is equivalent and more concise.
Suggested change:
if self.mono_cache_writer is not None:
if self.mono_cache_writer:
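The two checks agree as long as the attribute is either None or an ordinary writer object; a minimal sketch of where they diverge (this Evaluator is a stand-in, not the pygaggle class):

```python
# Truthiness vs. identity: equivalent for None and regular objects,
# but they diverge for falsy values such as an empty list.
class Evaluator:
    def __init__(self, mono_cache_writer=None):
        self.mono_cache_writer = mono_cache_writer

evaluator = Evaluator(mono_cache_writer=[])
assert bool(evaluator.mono_cache_writer) is False          # truthiness: branch skipped
assert (evaluator.mono_cache_writer is not None) is True   # identity: branch taken
```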
LGTM
The part left WIP is loading the "cache" run file. Although the rankings in the run file are preserved when loaded, the scores are not recorded in the run file. When the metrics are calculated ("accumulated" in the code), the document rankings are deduced from the sorted order of the scores. As a result, if the number of documents reranked in the duo stage is …

I wonder if there is an alternate run-file format that records the score, or an existing method to inject the scores based on their pre-existing ranking in a list of documents.
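For what it's worth, the standard six-column TREC run format does record a score in its fifth column, so a cache written in that format preserves scores as well as rankings. A rough sketch of reading one back (this function is illustrative, not an existing pygaggle API):

```python
def read_trec_run(path):
    # TREC run format: qid Q0 doc_id rank score run_tag
    # Returns {qid: [(doc_id, score), ...]} with the original scores restored.
    runs = {}
    with open(path) as f:
        for line in f:
            qid, _, doc_id, _, score, _ = line.split()
            runs.setdefault(qid, []).append((doc_id, float(score)))
    return runs
```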
wip Fix metric calculation logic when loading mono cache file
Update: when calculating the recall/precision metrics, any document after the top_k'th is assigned a score of 0 (to indicate "false"), which is the same as the default score when loaded from a run file. Then only the documents with a non-zero score are treated as "true" (i.e., within the top_k'th). So I changed the "false" indicator score to -1, which fixed the issue.
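A minimal sketch of the collision and the fix (the arrays and the sentinel check are illustrative, not the actual accumulation code):

```python
import numpy as np

# Scores loaded from a run file that records no scores all default to 0.
loaded_scores = np.zeros(10)
top_k = 3

# Old sentinel: marking documents past top_k with 0 makes them
# indistinguishable from the loaded top_k documents, which are also 0.
old = loaded_scores.copy()
old[top_k:] = 0
print(np.count_nonzero(old != 0))    # 0 -- the top_k documents are lost too

# New sentinel: -1 keeps the top_k documents (score 0) separable
# from the cut-off ones (score -1).
new = loaded_scores.copy()
new[top_k:] = -1
print(np.count_nonzero(new != -1))   # 3 -- exactly the top_k documents
```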
It was working until I merged the branches recently. I suspect that it was due to changing …
LGTM! Merging.
Correctness tested by matching metrics on a small dev set.
The mono cache includes the rankings of all texts, so any number of hits will work for subsequent duo runs.
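Since the full mono ranking is cached per query, a duo run can take any prefix of it without re-running the mono stage; a rough sketch under that assumption (names are illustrative):

```python
def mono_candidates(cached_run, qid, mono_hits):
    # cached_run: {qid: [(doc_id, score), ...]} sorted by descending score,
    # e.g. the output of read_trec_run above.
    return cached_run[qid][:mono_hits]
```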
Ran on hydra with Python 3.8.8.