Add instructions to replicate entire dev set on Compute Canada (casto…
qguo96 authored Oct 10, 2020
1 parent 3d4b7c0 commit e815051
Showing 4 changed files with 171 additions and 3 deletions.
3 changes: 2 additions & 1 deletion README.md
@@ -92,4 +92,5 @@ The following documents describe how to use Pygaggle on various IR test collections:

+ [Experiments on CovidQA](https://github.com/castorini/pygaggle/blob/master/docs/experiments-CovidQA.md)
+ [Experiments on MS MARCO Document Retrieval](https://github.com/castorini/pygaggle/blob/master/docs/experiments-msmarco-document.md)
+ [Experiments on MS MARCO Passage Retrieval](https://github.com/castorini/pygaggle/blob/master/docs/experiments-msmarco-passage.md)
+ [Experiments on MS MARCO Passage Retrieval - Dev Subset](https://github.com/castorini/pygaggle/blob/master/docs/experiments-msmarco-passage-subset.md)
+ [Experiments on MS MARCO Passage Retrieval - Entire Dev Set](https://github.com/castorini/pygaggle/blob/master/docs/experiments-msmarco-passage-entire.md)
2 changes: 1 addition & 1 deletion docs/experiments-msmarco-document.md
@@ -1,7 +1,7 @@
# PyGaggle: Baselines on [MS MARCO Document Retrieval](https://github.com/microsoft/TREC-2019-Deep-Learning)

This page contains instructions for running various neural reranking baselines on the MS MARCO *document* ranking task.
Note that there is also a separate [MS MARCO *passage* ranking task](https://github.com/castorini/pygaggle/blob/master/docs/experiments-msmarco-passage.md).
Note that there is also a separate [MS MARCO *passage* ranking task (dev subset)](https://github.com/castorini/pygaggle/blob/master/docs/experiments-msmarco-passage-subset.md) and a separate [MS MARCO *passage* ranking task (entire dev set)](https://github.com/castorini/pygaggle/blob/master/docs/experiments-msmarco-passage-entire.md).

Prior to running this, we suggest looking at our first-stage [BM25 ranking instructions](https://github.com/castorini/anserini/blob/master/docs/experiments-msmarco-doc.md).
We rerank the BM25 run files that contain ~1000 documents per query using monoT5.
167 changes: 167 additions & 0 deletions docs/experiments-msmarco-passage-entire.md
@@ -0,0 +1,167 @@
# PyGaggle: Neural Ranking Baselines on [MS MARCO Passage Retrieval](https://github.com/microsoft/MSMARCO-Passage-Ranking) - Entire Dev Set

This page contains instructions for running various neural reranking baselines on the MS MARCO *passage* ranking task. We will run on the entire dev set.
Note that there is also a separate [MS MARCO *document* ranking task](https://github.com/castorini/anserini/blob/master/docs/experiments-msmarco-doc.md) and a separate [MS MARCO *passage* ranking task (dev subset)](https://github.com/castorini/pygaggle/blob/master/docs/experiments-msmarco-passage-subset.md).

Prior to running this, we suggest looking at our first-stage [BM25 ranking instructions](https://github.com/castorini/anserini/blob/master/docs/experiments-msmarco-passage.md).
We rerank the BM25 run files, which contain ~1000 passages per query, using both monoBERT and monoT5.
monoBERT and monoT5 are pointwise rerankers: each query–passage pair is scored independently by BERT or T5, respectively.

Since it can take days to run these models on all 6,980 queries from the MS MARCO dev set, we will use Compute Canada to replicate these results.

## Registration and Virtual Environments

Please follow this [guide](https://github.com/castorini/onboarding/blob/master/docs/cc-guide.md) to create an account on Compute Canada.
Then follow the same guide to create a virtual environment so that you can easily install Python packages.
Note: Don't forget to update `pip` and `setuptools`.
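
For example, inside the activated virtual environment:

```
pip install --upgrade pip setuptools
```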

When you are running experiments here for the first time, please submit jobs interactively so that you can debug and confirm your code is bug-free.
After that, submit batch scripts so the experiments can run unattended; this is essential here, since the full runs take days. A rough sketch of such a batch script follows.
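
Compute Canada uses Slurm, so a batch script for the reranking steps below might look like this minimal sketch. The account name, resource requests, and environment path are placeholders; adjust them to your allocation.

```
#!/bin/bash
#SBATCH --account=def-someuser   # placeholder: your Compute Canada allocation
#SBATCH --gres=gpu:v100:1        # request one V100 GPU
#SBATCH --mem=32G                # placeholder memory request
#SBATCH --time=72:00:00          # monoBERT takes ~57 hours; leave headroom

source ~/ENV/bin/activate        # placeholder: path to your virtual environment
cd ~/scratch/pygaggle

# Replace this with one of the reranking commands from the sections below.
python -um pygaggle.run.evaluate_passage_ranker --help
```

Submit the script with `sbatch rerank.sh` and check on it with `squeue -u $USER`.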

## Installation

Once you are on a compute node, install PyGaggle under the `~/scratch` directory.

Note 1: Run the following instructions at the root of this repo.
Note 2: Make sure that you have access to a GPU.
Note 3: Install from source and make sure the [anserini-eval](https://github.com/castorini/anserini-eval) submodule is pulled.
To do this, first clone the repository recursively.

```
git clone --recursive https://github.com/castorini/pygaggle.git
```

Then install PyGaggle using:

```
pip install pygaggle/
```
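
To double-check the installation, you can verify that PyGaggle imports and that a GPU is visible (PyGaggle depends on PyTorch, so `torch` should already be installed):

```
python -c "import pygaggle; import torch; print(torch.cuda.is_available())"
```

This should print `True` on a GPU node.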

## Models

+ monoBERT-Large: Passage Re-ranking with BERT [(Nogueira et al., 2019)](https://arxiv.org/pdf/1901.04085.pdf)
+ monoT5-base: Document Ranking with a Pretrained Sequence-to-Sequence Model [(Nogueira et al., 2020)](https://arxiv.org/pdf/2003.06713.pdf)

## Data Prep

We're first going to download the queries, qrels, and run file corresponding to the entire MS MARCO dev set. The run file is generated by following the BM25 ranking instructions linked above. We'll store all these files in the `data/msmarco_ans_entire` directory.

You can download these three files from this [repository](https://github.com/castorini/duobert).
```
queries.dev.small.tsv: 6,980 queries from the MS MARCO dev set.
qrels.dev.small.tsv: 7,437 pairs of query ids and relevant passage ids from the MS MARCO dev set.
run.bm25.dev.small.tsv: Approximately 6,980,000 pairs of dev set queries and passages retrieved using BM25.
```
Note: Please rename `run.bm25.dev.small.tsv` to `run.dev.small.tsv`.
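
For instance, assuming the three files were downloaded to the current directory, the layout can be set up as follows (the last command also performs the rename):

```
mkdir -p data/msmarco_ans_entire
mv queries.dev.small.tsv qrels.dev.small.tsv data/msmarco_ans_entire/
mv run.bm25.dev.small.tsv data/msmarco_ans_entire/run.dev.small.tsv
```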

As a sanity check, we can evaluate the first-stage retrieved passages using the official MS MARCO evaluation script.

```
python tools/eval/msmarco_eval.py data/msmarco_ans_entire/qrels.dev.small.tsv data/msmarco_ans_entire/run.dev.small.tsv
```

The output should be:

```
#####################
MRR @10: 0.18736452221767383
QueriesRanked: 6980
#####################
```

Let's download and extract the pre-built MS MARCO index into `indexes`:

```
wget https://git.uwaterloo.ca/jimmylin/anserini-indexes/raw/master/index-msmarco-passage-20191117-0ed488.tar.gz -P indexes
tar xvfz indexes/index-msmarco-passage-20191117-0ed488.tar.gz -C indexes
```

Now we can begin re-ranking the dev set.

## Re-Ranking with monoBERT

First, let's evaluate using monoBERT!

```
python -um pygaggle.run.evaluate_passage_ranker --split dev \
--method seq_class_transformer \
--model castorini/monobert-large-msmarco \
--dataset data/msmarco_ans_entire/ \
--index-dir indexes/index-msmarco-passage-20191117-0ed488 \
--task msmarco \
--output-file runs/run.monobert.ans_entire.dev.tsv
```

Upon completion, the following output will be visible:

```
precision@1 0.2533
recall@3 0.45093
recall@50 0.80609
recall@1000 0.86289
mrr 0.38789
mrr@10 0.37922
```

It takes approximately 57 hours to re-rank the entire MS MARCO dev set using a V100.
The type of GPU will directly influence your inference time.
It is possible that the default batch size results in a GPU OOM error.
In this case, assigning a smaller batch size with the `--batch-size` option (the default is 96) should help, as shown below.
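
For example, here is the monoBERT command from above with `--batch-size 32` added:

```
python -um pygaggle.run.evaluate_passage_ranker --split dev \
--method seq_class_transformer \
--model castorini/monobert-large-msmarco \
--dataset data/msmarco_ans_entire/ \
--index-dir indexes/index-msmarco-passage-20191117-0ed488 \
--task msmarco \
--batch-size 32 \
--output-file runs/run.monobert.ans_entire.dev.tsv
```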

The re-ranked run file `run.monobert.ans_entire.dev.tsv` will also be available in the `runs` directory upon completion.

We can use the official MS MARCO evaluation script to verify the MRR@10:

```
python tools/eval/msmarco_eval.py data/msmarco_ans_entire/qrels.dev.small.tsv runs/run.monobert.ans_entire.dev.tsv
```

You should see the same result. Great, let's move on to monoT5!

## Re-Ranking with monoT5

We use the monoT5-base variant as it is the easiest to run without access to larger GPUs/TPUs. Let us now re-rank the set:

```
python -um pygaggle.run.evaluate_passage_ranker --split dev \
--method t5 \
--model castorini/monot5-base-msmarco \
--dataset data/msmarco_ans_entire \
--model-type t5-base \
--task msmarco \
--index-dir indexes/index-msmarco-passage-20191117-0ed488 \
--batch-size 32 \
--output-file runs/run.monot5.ans_entire.dev.tsv
```

The following output will be visible after it has finished:

```
precision@1 0.25129
recall@3 0.45362
recall@50 0.80709
recall@1000 0.86289
mrr 0.38839
mrr@10 0.37986
```

It takes approximately 26 hours to re-rank the entire MS MARCO dev set using a V100.
It is worth noting again that you might need to modify the batch size to best fit the GPU at hand.

Upon completion, the re-ranked run file `run.monot5.ans_entire.dev.tsv` will be available in the `runs` directory.

We can use the official MS MARCO evaluation script to verify the MRR@10:

```
python tools/eval/msmarco_eval.py data/msmarco_ans_entire/qrels.dev.small.tsv runs/run.monot5.ans_entire.dev.tsv
```

You should see the same result.

If you were able to replicate these results, please submit a PR adding to the replication log!
Please mention any differences you find in your PR!


## Replication Log

+ Results replicated by [@qguo96](https://github.com/qguo96) on 2020-10-08 (commit [`3d4b7c0`](https://github.com/castorini/pygaggle/commit/3d4b7c0a51b5b26e5d39da7c7b9c0cec8e633950)) (Tesla V100 on Compute Canada)
2 changes: 1 addition & 1 deletion docs/experiments-msmarco-passage-subset.md
@@ -1,4 +1,4 @@
# PyGaggle: Neural Ranking Baselines on [MS MARCO Passage Retrieval](https://github.com/microsoft/MSMARCO-Passage-Ranking)
# PyGaggle: Neural Ranking Baselines on [MS MARCO Passage Retrieval](https://github.com/microsoft/MSMARCO-Passage-Ranking) - Dev Subset

This page contains instructions for running various neural reranking baselines on the MS MARCO *passage* ranking task.
Note that there is also a separate [MS MARCO *document* ranking task](https://github.com/castorini/anserini/blob/master/docs/experiments-msmarco-doc.md).