Caching mono run file, loading the cache file for duo runs #176
Conversation
pygaggle/model/evaluate.py (Outdated)
@@ -213,6 +218,8 @@ def evaluate(self,
mono_out = self.mono_reranker.rerank(example.query, example.documents)
mono_texts.append(sorted(enumerate(mono_out), key=lambda x: x[1].score, reverse=True)[:self.mono_hits])
scores.append(np.array([x.score for x in mono_out]))
if self.mono_cache_writer is not None:
nit: the line below is equivalent and more concise.
Suggested change:
if self.mono_cache_writer is not None:
if self.mono_cache_writer:
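The two checks agree as long as the attribute is either None or an ordinary writer object; a minimal sketch of where they diverge (this Evaluator is a stand-in, not the pygaggle class):

```python
# Truthiness vs. identity: equivalent for None and regular objects,
# but they diverge for falsy values such as an empty list.
class Evaluator:
    def __init__(self, mono_cache_writer=None):
        self.mono_cache_writer = mono_cache_writer

evaluator = Evaluator(mono_cache_writer=[])
assert bool(evaluator.mono_cache_writer) is False          # truthiness: branch skipped
assert (evaluator.mono_cache_writer is not None) is True   # identity: branch taken
```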
LGTM
The part left WIP is loading the "cache" run file. Although the rankings in the run file are preserved when loaded, the scores are not recorded in the run file. When the metrics are calculated ("accumulated" in the code), the document rankings are deduced from the sorted order of the scores. As a result, if the number of documents reranked in the duo stage is …

I wonder if there is an alternate run-file format that records the score, or an existing method to inject the scores based on their pre-existing ranking in a list of documents.
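For what it's worth, the standard six-column TREC run format does record a score in its fifth column, so a cache written in that format preserves scores as well as rankings. A rough sketch of reading one back (this function is illustrative, not an existing pygaggle API):

```python
def read_trec_run(path):
    # TREC run format: qid Q0 doc_id rank score run_tag
    # Returns {qid: [(doc_id, score), ...]} with the original scores restored.
    runs = {}
    with open(path) as f:
        for line in f:
            qid, _, doc_id, _, score, _ = line.split()
            runs.setdefault(qid, []).append((doc_id, float(score)))
    return runs
```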
wip Fix metric calculation logic when loading mono cache file
Update: when calculating the recall/precision metrics, any document after the top_k'th is assigned a score of 0 (to indicate "false"), which is the same as the default score when loaded from a run file. Then only the documents with a non-zero score are treated as "true" (i.e., within the top_k'th). So I changed the "false" indicator score to -1, which fixed the issue.
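A minimal sketch of the collision and the fix (the arrays and the sentinel check are illustrative, not the actual accumulation code):

```python
import numpy as np

# Scores loaded from a run file that records no scores all default to 0.
loaded_scores = np.zeros(10)
top_k = 3

# Old sentinel: marking documents past top_k with 0 makes them
# indistinguishable from the loaded top_k documents, which are also 0.
old = loaded_scores.copy()
old[top_k:] = 0
print(np.count_nonzero(old != 0))    # 0 -- the top_k documents are lost too

# New sentinel: -1 keeps the top_k documents (score 0) separable
# from the cut-off ones (score -1).
new = loaded_scores.copy()
new[top_k:] = -1
print(np.count_nonzero(new != -1))   # 3 -- exactly the top_k documents
```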
It was working until I merged the branches recently. I suspect that it was due to changing …
LGTM! Merging.
Correctness tested by matching metrics on a small dev set.
The mono cache includes the rankings of all texts, so any number of hits will work for subsequent duo runs.
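Since the full mono ranking is cached per query, a duo run can take any prefix of it without re-running the mono stage; a rough sketch under that assumption (names are illustrative):

```python
def mono_candidates(cached_run, qid, mono_hits):
    # cached_run: {qid: [(doc_id, score), ...]} sorted by descending score,
    # e.g. the output of read_trec_run above.
    return cached_run[qid][:mono_hits]
```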
Ran on hydra with Python 3.8.8.