Add instructions to replicate entire dev set on Compute Canada (casto…
qguo96 authored Oct 10, 2020
1 parent 3d4b7c0 commit e815051
Showing 4 changed files with 171 additions and 3 deletions.
3 changes: 2 additions & 1 deletion README.md
@@ -92,4 +92,5 @@ The following documents describe how to use Pygaggle on various IR test collections:

+ [Experiments on CovidQA](https://github.com/castorini/pygaggle/blob/master/docs/experiments-CovidQA.md)
+ [Experiments on MS MARCO Document Retrieval](https://github.com/castorini/pygaggle/blob/master/docs/experiments-msmarco-document.md)
+ [Experiments on MS MARCO Passage Retrieval](https://github.com/castorini/pygaggle/blob/master/docs/experiments-msmarco-passage.md)
+ [Experiments on MS MARCO Passage Retrieval - Dev Subset](https://github.com/castorini/pygaggle/blob/master/docs/experiments-msmarco-passage-subset.md)
+ [Experiments on MS MARCO Passage Retrieval - Entire Dev Set](https://github.com/castorini/pygaggle/blob/master/docs/experiments-msmarco-passage-entire.md)
2 changes: 1 addition & 1 deletion docs/experiments-msmarco-document.md
@@ -1,7 +1,7 @@
# PyGaggle: Baselines on [MS MARCO Document Retrieval](https://github.com/microsoft/TREC-2019-Deep-Learning)

This page contains instructions for running various neural reranking baselines on the MS MARCO *document* ranking task.
Note that there is also a separate [MS MARCO *passage* ranking task](https://github.com/castorini/pygaggle/blob/master/docs/experiments-msmarco-passage.md).
Note that there is also a separate [MS MARCO *passage* ranking task (dev subset)](https://github.com/castorini/pygaggle/blob/master/docs/experiments-msmarco-passage-subset.md) and a separate [MS MARCO *passage* ranking task (entire dev set)](https://github.com/castorini/pygaggle/blob/master/docs/experiments-msmarco-passage-entire.md).

Prior to running this, we suggest looking at our first-stage [BM25 ranking instructions](https://github.com/castorini/anserini/blob/master/docs/experiments-msmarco-doc.md).
We rerank the BM25 run files that contain ~1000 documents per query using monoT5.
167 changes: 167 additions & 0 deletions docs/experiments-msmarco-passage-entire.md
@@ -0,0 +1,167 @@
# PyGaggle: Neural Ranking Baselines on [MS MARCO Passage Retrieval](https://github.com/microsoft/MSMARCO-Passage-Ranking) - Entire Dev Set

This page contains instructions for running various neural reranking baselines on the MS MARCO *passage* ranking task. We will run on the entire dev set.
Note that there is also a separate [MS MARCO *document* ranking task](https://github.com/castorini/anserini/blob/master/docs/experiments-msmarco-doc.md) and a separate [MS MARCO *passage* ranking task (dev subset)](https://github.com/castorini/pygaggle/blob/master/docs/experiments-msmarco-passage-subset.md).

Prior to running this, we suggest looking at our first-stage [BM25 ranking instructions](https://github.com/castorini/anserini/blob/master/docs/experiments-msmarco-passage.md).
We rerank the BM25 run files, which contain ~1000 passages per query, using both monoBERT and monoT5.
monoBERT and monoT5 are pointwise rerankers: each query–passage pair is scored independently by BERT or T5, respectively.

Since it can take days to run these models on all 6,980 queries from the MS MARCO dev set, we will use Compute Canada to replicate these results.

## Registration and Virtual Environments

Please follow this [guide](https://github.com/castorini/onboarding/blob/master/docs/cc-guide.md) to create an account on Compute Canada.
Then follow the same guide to create a virtual environment so that you can easily install Python packages.
Note: Don't forget to update `pip` and `setuptools`.
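
For example, inside the activated virtual environment:

```
pip install --upgrade pip setuptools
```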

When you are running experiments here for the first time, please submit jobs interactively so that you can debug and confirm your code is bug-free.
After that, submit batch scripts so the experiments can run unattended; this is essential here, since the full runs take days. A rough sketch of such a batch script follows.
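
Compute Canada uses Slurm, so a batch script for the reranking steps below might look like this minimal sketch. The account name, resource requests, and environment path are placeholders; adjust them to your allocation.

```
#!/bin/bash
#SBATCH --account=def-someuser   # placeholder: your Compute Canada allocation
#SBATCH --gres=gpu:v100:1        # request one V100 GPU
#SBATCH --mem=32G                # placeholder memory request
#SBATCH --time=72:00:00          # monoBERT takes ~57 hours; leave headroom

source ~/ENV/bin/activate        # placeholder: path to your virtual environment
cd ~/scratch/pygaggle

# Replace this with one of the reranking commands from the sections below.
python -um pygaggle.run.evaluate_passage_ranker --help
```

Submit the script with `sbatch rerank.sh` and check on it with `squeue -u $USER`.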

## Installation

Once you are on a compute node, install PyGaggle under the `~/scratch` directory.

Note 1: Run the following instructions at the root of this repo.
Note 2: Make sure that you have access to a GPU.
Note 3: Install from source and make sure the [anserini-eval](https://github.com/castorini/anserini-eval) submodule is pulled.
To do this, first clone the repository recursively.

```
git clone --recursive https://github.com/castorini/pygaggle.git
```

Then install PyGaggle using:

```
pip install pygaggle/
```
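
To double-check the installation, you can verify that PyGaggle imports and that a GPU is visible (PyGaggle depends on PyTorch, so `torch` should already be installed):

```
python -c "import pygaggle; import torch; print(torch.cuda.is_available())"
```

This should print `True` on a GPU node.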

## Models

+ monoBERT-Large: Passage Re-ranking with BERT [(Nogueira et al., 2019)](https://arxiv.org/pdf/1901.04085.pdf)
+ monoT5-base: Document Ranking with a Pretrained Sequence-to-Sequence Model [(Nogueira et al., 2020)](https://arxiv.org/pdf/2003.06713.pdf)

## Data Prep

We're first going to download the queries, qrels, and run file corresponding to the entire MS MARCO dev set. The run file is generated by following the BM25 ranking instructions linked above. We'll store all these files in the `data/msmarco_ans_entire` directory.

You can download these three files from this [repository](https://github.com/castorini/duobert).
```
queries.dev.small.tsv: 6,980 queries from the MS MARCO dev set.
qrels.dev.small.tsv: 7,437 pairs of query ids and relevant passage ids from the MS MARCO dev set.
run.bm25.dev.small.tsv: Approximately 6,980,000 pairs of dev set queries and passages retrieved using BM25.
```
Note: Please rename `run.bm25.dev.small.tsv` to `run.dev.small.tsv`.
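
For instance, assuming the three files were downloaded to the current directory, the layout can be set up as follows (the last command also performs the rename):

```
mkdir -p data/msmarco_ans_entire
mv queries.dev.small.tsv qrels.dev.small.tsv data/msmarco_ans_entire/
mv run.bm25.dev.small.tsv data/msmarco_ans_entire/run.dev.small.tsv
```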

As a sanity check, we can evaluate the first-stage retrieved passages using the official MS MARCO evaluation script.

```
python tools/eval/msmarco_eval.py data/msmarco_ans_entire/qrels.dev.small.tsv data/msmarco_ans_entire/run.dev.small.tsv
```

The output should be:

```
#####################
MRR @10: 0.18736452221767383
QueriesRanked: 6980
#####################
```

Let's download and extract the pre-built MS MARCO index into `indexes`:

```
wget https://git.uwaterloo.ca/jimmylin/anserini-indexes/raw/master/index-msmarco-passage-20191117-0ed488.tar.gz -P indexes
tar xvfz indexes/index-msmarco-passage-20191117-0ed488.tar.gz -C indexes
```

Now we can begin re-ranking the dev set.

## Re-Ranking with monoBERT

First, let's evaluate using monoBERT!

```
python -um pygaggle.run.evaluate_passage_ranker --split dev \
--method seq_class_transformer \
--model castorini/monobert-large-msmarco \
--dataset data/msmarco_ans_entire/ \
--index-dir indexes/index-msmarco-passage-20191117-0ed488 \
--task msmarco \
--output-file runs/run.monobert.ans_entire.dev.tsv
```

Upon completion, the following output will be visible:

```
precision@1 0.2533
recall@3 0.45093
recall@50 0.80609
recall@1000 0.86289
mrr 0.38789
mrr@10 0.37922
```

It takes approximately 57 hours to re-rank the entire MS MARCO dev set using a V100.
The type of GPU will directly influence your inference time.
It is possible that the default batch size results in a GPU OOM error.
In this case, assigning a smaller batch size with the `--batch-size` option (the default is 96) should help, as shown below.
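
For example, here is the monoBERT command from above with `--batch-size 32` added:

```
python -um pygaggle.run.evaluate_passage_ranker --split dev \
--method seq_class_transformer \
--model castorini/monobert-large-msmarco \
--dataset data/msmarco_ans_entire/ \
--index-dir indexes/index-msmarco-passage-20191117-0ed488 \
--task msmarco \
--batch-size 32 \
--output-file runs/run.monobert.ans_entire.dev.tsv
```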

The re-ranked run file `run.monobert.ans_entire.dev.tsv` will also be available in the `runs` directory upon completion.

We can use the official MS MARCO evaluation script to verify the MRR@10:

```
python tools/eval/msmarco_eval.py data/msmarco_ans_entire/qrels.dev.small.tsv runs/run.monobert.ans_entire.dev.tsv
```

You should see the same result. Great, let's move on to monoT5!

## Re-Ranking with monoT5

We use the monoT5-base variant as it is the easiest to run without access to larger GPUs/TPUs. Let us now re-rank the set:

```
python -um pygaggle.run.evaluate_passage_ranker --split dev \
--method t5 \
--model castorini/monot5-base-msmarco \
--dataset data/msmarco_ans_entire \
--model-type t5-base \
--task msmarco \
--index-dir indexes/index-msmarco-passage-20191117-0ed488 \
--batch-size 32 \
--output-file runs/run.monot5.ans_entire.dev.tsv
```

The following output will be visible after it has finished:

```
precision@1 0.25129
recall@3 0.45362
recall@50 0.80709
recall@1000 0.86289
mrr 0.38839
mrr@10 0.37986
```

It takes approximately 26 hours to re-rank the entire MS MARCO dev set using a V100.
It is worth noting again that you might need to modify the batch size to best fit the GPU at hand.

Upon completion, the re-ranked run file `run.monot5.ans_entire.dev.tsv` will be available in the `runs` directory.

We can use the official MS MARCO evaluation script to verify the MRR@10:

```
python tools/eval/msmarco_eval.py data/msmarco_ans_entire/qrels.dev.small.tsv runs/run.monot5.ans_entire.dev.tsv
```

You should see the same result.

If you were able to replicate these results, please submit a PR adding to the replication log!
Please mention any differences you find in your PR!


## Replication Log

+ Results replicated by [@qguo96](https://github.com/qguo96) on 2020-10-08 (commit [`3d4b7c0`](https://github.com/castorini/pygaggle/commit/3d4b7c0a51b5b26e5d39da7c7b9c0cec8e633950)) (Tesla V100 on Compute Canada)
2 changes: 1 addition & 1 deletion docs/experiments-msmarco-passage-subset.md
@@ -1,4 +1,4 @@
# PyGaggle: Neural Ranking Baselines on [MS MARCO Passage Retrieval](https://github.com/microsoft/MSMARCO-Passage-Ranking)
# PyGaggle: Neural Ranking Baselines on [MS MARCO Passage Retrieval](https://github.com/microsoft/MSMARCO-Passage-Ranking) - Dev Subset

This page contains instructions for running various neural reranking baselines on the MS MARCO *passage* ranking task.
Note that there is also a separate [MS MARCO *document* ranking task](https://github.com/castorini/anserini/blob/master/docs/experiments-msmarco-doc.md).