Refactor documentation for some dense models (#2003)
+ Updated the following models: ance, sbert, bpr, distilbert-tasb, distilbert-kd, dkrr
+ Fixed minor numpy issue
lintool authored Oct 3, 2024
1 parent e9fcff0 commit 33ab20b
Showing 7 changed files with 166 additions and 104 deletions.
106 changes: 65 additions & 41 deletions docs/experiments-ance.md
@@ -17,9 +17,9 @@ python -m pyserini.search.faiss \
--index msmarco-v1-passage.ance \
--topics msmarco-passage-dev-subset \
--encoded-queries ance-msmarco-passage-dev-subset \
--output runs/run.msmarco-passage.ance.bf.tsv \
--output runs/run.msmarco-passage.ance.tsv \
--output-format msmarco \
--batch-size 36 --threads 12
--batch-size 512 --threads 16
```

The option `--encoded-queries` specifies the use of encoded queries (i.e., queries that have already been converted into dense vectors and cached).
@@ -28,9 +28,13 @@ As an alternative, replace with `--encoder castorini/ance-msmarco-passage` to perform on-the-fly query encoding.
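For reference, the on-the-fly variant would look roughly like this (a sketch assuming all other flags stay the same as in the retrieval command above; only `--encoded-queries` is swapped for `--encoder`):

```bash
# Sketch: on-the-fly query encoding with --encoder in place of --encoded-queries.
# All other flags mirror the retrieval command above. Encoding queries at search
# time is slower than reading cached query vectors, but should produce effectively
# the same run (minor differences are possible).
python -m pyserini.search.faiss \
  --index msmarco-v1-passage.ance \
  --topics msmarco-passage-dev-subset \
  --encoder castorini/ance-msmarco-passage \
  --output runs/run.msmarco-passage.ance.tsv \
  --output-format msmarco \
  --batch-size 512 --threads 16
```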
To evaluate:

```bash
$ python -m pyserini.eval.msmarco_passage_eval msmarco-passage-dev-subset \
runs/run.msmarco-passage.ance.bf.tsv
python -m pyserini.eval.msmarco_passage_eval msmarco-passage-dev-subset \
runs/run.msmarco-passage.ance.tsv
```

Results:

```
#####################
MRR @10: 0.3302
QueriesRanked: 6980
@@ -41,14 +45,18 @@ We can also use the official TREC evaluation tool `trec_eval` to compute other metrics.
For that we first need to convert runs and qrels files to the TREC format:

```bash
$ python -m pyserini.eval.convert_msmarco_run_to_trec_run \
--input runs/run.msmarco-passage.ance.bf.tsv \
--output runs/run.msmarco-passage.ance.bf.trec
python -m pyserini.eval.convert_msmarco_run_to_trec_run \
--input runs/run.msmarco-passage.ance.tsv \
--output runs/run.msmarco-passage.ance.trec

python -m pyserini.eval.trec_eval -c -mrecall.1000 -mmap msmarco-passage-dev-subset \
runs/run.msmarco-passage.ance.trec
```

$ python -m pyserini.eval.trec_eval -c -mrecall.1000 -mmap msmarco-passage-dev-subset \
runs/run.msmarco-passage.ance.bf.trec
Results:

map all 0.3362
```
map all 0.3363
recall_1000 all 0.9584
```

@@ -63,7 +71,7 @@ python -m pyserini.search.faiss \
--encoded-queries ance_maxp-msmarco-doc-dev \
--output runs/run.msmarco-doc.passage.ance-maxp.txt \
--output-format msmarco \
--batch-size 36 --threads 12 \
--batch-size 512 --threads 16 \
--hits 1000 --max-passage --max-passage-hits 100
```

@@ -72,12 +80,16 @@ Same as above, replace `--encoded-queries` with `--encoder castorini/ance-msmarc
To evaluate:

```bash
$ python -m pyserini.eval.msmarco_doc_eval \
--judgments msmarco-doc-dev \
--run runs/run.msmarco-doc.passage.ance-maxp.txt
python -m pyserini.eval.msmarco_doc_eval \
--judgments msmarco-doc-dev \
--run runs/run.msmarco-doc.passage.ance-maxp.txt
```

Results:

```
#####################
MRR @100: 0.3796
MRR @100: 0.3795
QueriesRanked: 5193
#####################
```
@@ -86,14 +98,18 @@ We can also use the official TREC evaluation tool `trec_eval` to compute other metrics.
For that we first need to convert runs and qrels files to the TREC format:

```bash
$ python -m pyserini.eval.convert_msmarco_run_to_trec_run \
--input runs/run.msmarco-doc.passage.ance-maxp.txt \
--output runs/run.msmarco-doc.passage.ance-maxp.trec
python -m pyserini.eval.convert_msmarco_run_to_trec_run \
--input runs/run.msmarco-doc.passage.ance-maxp.txt \
--output runs/run.msmarco-doc.passage.ance-maxp.trec

python -m pyserini.eval.trec_eval -c -mrecall.100 -mmap msmarco-doc-dev \
runs/run.msmarco-doc.passage.ance-maxp.trec
```

$ python -m pyserini.eval.trec_eval -c -mrecall.100 -mmap msmarco-doc-dev \
runs/run.msmarco-doc.passage.ance-maxp.trec
Results:

map all 0.3796
```
map all 0.3794
recall_100 all 0.9033
```

@@ -106,25 +122,29 @@ python -m pyserini.search.faiss \
--index wikipedia-dpr-100w.ance-multi \
--topics dpr-nq-test \
--encoded-queries ance_multi-nq-test \
--output runs/run.ance.nq-test.multi.bf.trec \
--batch-size 36 --threads 12
--output runs/run.ance.nq-test.multi.trec \
--batch-size 512 --threads 16
```

Same as above, replace `--encoded-queries` with `--encoder castorini/ance-dpr-question-multi` for on-the-fly query encoding.

To evaluate, first convert the TREC output format to DPR's `json` format:

```bash
$ python -m pyserini.eval.convert_trec_run_to_dpr_retrieval_run \
--topics dpr-nq-test \
--index wikipedia-dpr-100w \
--input runs/run.ance.nq-test.multi.bf.trec \
--output runs/run.ance.nq-test.multi.bf.json
python -m pyserini.eval.convert_trec_run_to_dpr_retrieval_run \
--topics dpr-nq-test \
--index wikipedia-dpr \
--input runs/run.ance.nq-test.multi.trec \
--output runs/run.ance.nq-test.multi.json

python -m pyserini.eval.evaluate_dpr_retrieval \
--retrieval runs/run.ance.nq-test.multi.json \
--topk 20 100
```

$ python -m pyserini.eval.evaluate_dpr_retrieval \
--retrieval runs/run.ance.nq-test.multi.bf.json \
--topk 20 100
Results:

```
Top20 accuracy: 0.8224
Top100 accuracy: 0.8787
```
@@ -138,25 +158,29 @@ python -m pyserini.search.faiss \
--index wikipedia-dpr-100w.ance-multi \
--topics dpr-trivia-test \
--encoded-queries ance_multi-trivia-test \
--output runs/run.ance.trivia-test.multi.bf.trec \
--batch-size 36 --threads 12
--output runs/run.ance.trivia-test.multi.trec \
--batch-size 512 --threads 16
```

Same as above, replace `--encoded-queries` with `--encoder castorini/ance-dpr-question-multi` for on-the-fly query encoding.

To evaluate, first convert the TREC output format to DPR's `json` format:

```bash
$ python -m pyserini.eval.convert_trec_run_to_dpr_retrieval_run \
--topics dpr-trivia-test \
--index wikipedia-dpr-100w \
--input runs/run.ance.trivia-test.multi.bf.trec \
--output runs/run.ance.trivia-test.multi.bf.json
python -m pyserini.eval.convert_trec_run_to_dpr_retrieval_run \
--topics dpr-trivia-test \
--index wikipedia-dpr \
--input runs/run.ance.trivia-test.multi.trec \
--output runs/run.ance.trivia-test.multi.json

$ python -m pyserini.eval.evaluate_dpr_retrieval \
--retrieval runs/run.ance.trivia-test.multi.bf.json \
--topk 20 100
python -m pyserini.eval.evaluate_dpr_retrieval \
--retrieval runs/run.ance.trivia-test.multi.json \
--topk 20 100
```

Results:

```
Top20 accuracy: 0.8010
Top100 accuracy: 0.8522
```
24 changes: 14 additions & 10 deletions docs/experiments-bpr.md
@@ -28,7 +28,7 @@ python -m pyserini.search.faiss \
--topics dpr-nq-test \
--encoded-queries bpr_single_nq-nq-test \
--output runs/run.bpr.rerank.nq-test.nq.hash.trec \
--batch-size 36 --threads 12 \
--batch-size 512 --threads 16 \
--hits 100 --binary-hits 1000 \
--searcher bpr --rerank
```
@@ -38,18 +38,22 @@ The option `--encoded-queries` specifies the use of encoded queries (i.e., queries that have already been converted into dense vectors and cached).
To evaluate, first convert the TREC output format to DPR's `json` format:

```bash
$ python -m pyserini.eval.convert_trec_run_to_dpr_retrieval_run \
--index wikipedia-dpr-100w \
--topics dpr-nq-test \
--input runs/run.bpr.rerank.nq-test.nq.hash.trec \
--output runs/run.bpr.rerank.nq-test.nq.hash.json
python -m pyserini.eval.convert_trec_run_to_dpr_retrieval_run \
--index wikipedia-dpr \
--topics dpr-nq-test \
--input runs/run.bpr.rerank.nq-test.nq.hash.trec \
--output runs/run.bpr.rerank.nq-test.nq.hash.json

python -m pyserini.eval.evaluate_dpr_retrieval \
--retrieval runs/run.bpr.rerank.nq-test.nq.hash.json \
--topk 20 100
```

$ python -m pyserini.eval.evaluate_dpr_retrieval \
--retrieval runs/run.bpr.rerank.nq-test.nq.hash.json \
--topk 20 100
Results:

```
Top20 accuracy: 0.7792
Top100 accuracy: 0.8573
Top100 accuracy: 0.8571
```

## Reproduction Log[*](reproducibility.md)
30 changes: 19 additions & 11 deletions docs/experiments-distilbert_kd.md
@@ -15,21 +15,25 @@ python -m pyserini.search.faiss \
--index msmarco-v1-passage.distilbert-dot-margin-mse-t2 \
--topics msmarco-passage-dev-subset \
--encoded-queries distilbert_kd-msmarco-passage-dev-subset \
--output runs/run.msmarco-passage.distilbert-dot-margin_mse-t2.bf.tsv \
--output runs/run.msmarco-passage.distilbert-dot-margin_mse-t2.tsv \
--output-format msmarco \
--batch-size 36 --threads 12
--batch-size 512 --threads 16
```

Replace `--encoded-queries` with `--encoder sebastian-hofstaetter/distilbert-dot-margin_mse-T2-msmarco` for on-the-fly query encoding.
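As a sketch of the on-the-fly variant (assuming the remaining flags carry over unchanged from the retrieval command above):

```bash
# Sketch: the same DistilBERT-KD retrieval run, but with queries encoded on the fly
# via --encoder; all other flags mirror the command above.
python -m pyserini.search.faiss \
  --index msmarco-v1-passage.distilbert-dot-margin-mse-t2 \
  --topics msmarco-passage-dev-subset \
  --encoder sebastian-hofstaetter/distilbert-dot-margin_mse-T2-msmarco \
  --output runs/run.msmarco-passage.distilbert-dot-margin_mse-t2.tsv \
  --output-format msmarco \
  --batch-size 512 --threads 16
```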

To evaluate:

```bash
$ python -m pyserini.eval.msmarco_passage_eval msmarco-passage-dev-subset \
runs/run.msmarco-passage.distilbert-dot-margin_mse-t2.bf.tsv
python -m pyserini.eval.msmarco_passage_eval msmarco-passage-dev-subset \
runs/run.msmarco-passage.distilbert-dot-margin_mse-t2.tsv
```

Results:

```
#####################
MRR @10: 0.3250
MRR @10: 0.3251
QueriesRanked: 6980
#####################
```
@@ -38,14 +42,18 @@ We can also use the official TREC evaluation tool `trec_eval` to compute other metrics.
For that we first need to convert runs and qrels files to the TREC format:

```bash
$ python -m pyserini.eval.convert_msmarco_run_to_trec_run \
--input runs/run.msmarco-passage.distilbert-dot-margin_mse-t2.bf.tsv \
--output runs/run.msmarco-passage.distilbert-dot-margin_mse-t2.bf.trec
python -m pyserini.eval.convert_msmarco_run_to_trec_run \
--input runs/run.msmarco-passage.distilbert-dot-margin_mse-t2.tsv \
--output runs/run.msmarco-passage.distilbert-dot-margin_mse-t2.trec

$ python -m pyserini.eval.trec_eval -c -mrecall.1000 -mmap msmarco-passage-dev-subset \
runs/run.msmarco-passage.distilbert-dot-margin_mse-t2.bf.trec
python -m pyserini.eval.trec_eval -c -mrecall.1000 -mmap msmarco-passage-dev-subset \
runs/run.msmarco-passage.distilbert-dot-margin_mse-t2.trec
```

Results:

map all 0.3308
```
map all 0.3309
recall_1000 all 0.9553
```

31 changes: 19 additions & 12 deletions docs/experiments-distilbert_tasb.md
@@ -15,22 +15,25 @@ python -m pyserini.search.faiss \
--index msmarco-v1-passage.distilbert-dot-tas_b-b256 \
--topics msmarco-passage-dev-subset \
--encoded-queries distilbert_tas_b-msmarco-passage-dev-subset \
--output runs/run.msmarco-passage.distilbert-dot-tas_b-b256.bf.tsv \
--output runs/run.msmarco-passage.distilbert-dot-tas_b-b256.tsv \
--output-format msmarco \
--batch-size 36 --threads 12
--batch-size 512 --threads 16
```

Replace `--encoded-queries` with `--encoder sebastian-hofstaetter/distilbert-dot-tas_b-b256-msmarco` for on-the-fly query encoding.

To evaluate:


```bash
$ python -m pyserini.eval.msmarco_passage_eval msmarco-passage-dev-subset \
runs/run.msmarco-passage.distilbert-dot-tas_b-b256.bf.tsv
python -m pyserini.eval.msmarco_passage_eval msmarco-passage-dev-subset \
runs/run.msmarco-passage.distilbert-dot-tas_b-b256.tsv
```

Results:

```
#####################
MRR @10: 0.3443
MRR @10: 0.3444
QueriesRanked: 6980
#####################
```
@@ -39,14 +42,18 @@ We can also use the official TREC evaluation tool `trec_eval` to compute other metrics.
For that we first need to convert runs and qrels files to the TREC format:

```bash
$ python -m pyserini.eval.convert_msmarco_run_to_trec_run \
--input runs/run.msmarco-passage.distilbert-dot-tas_b-b256.bf.tsv \
--output runs/run.msmarco-passage.distilbert-dot-tas_b-b256.bf.trec
python -m pyserini.eval.convert_msmarco_run_to_trec_run \
--input runs/run.msmarco-passage.distilbert-dot-tas_b-b256.tsv \
--output runs/run.msmarco-passage.distilbert-dot-tas_b-b256.trec

$ python -m pyserini.eval.trec_eval -c -mrecall.1000 -mmap msmarco-passage-dev-subset \
runs/run.msmarco-passage.distilbert-dot-tas_b-b256.bf.trec
python -m pyserini.eval.trec_eval -c -mrecall.1000 -mmap msmarco-passage-dev-subset \
runs/run.msmarco-passage.distilbert-dot-tas_b-b256.trec
```

Results:

map all 0.3514
```
map all 0.3515
recall_1000 all 0.9771
```

16 changes: 8 additions & 8 deletions docs/experiments-dkrr.md
@@ -15,20 +15,20 @@ Running DKRR retrieval on `dpr-nq-dev` and `nq-test` of the Natural Questions dataset:

```bash
python -m pyserini.search.faiss \
--index wikipedia-dpr-dkrr-nq \
--index wikipedia-dpr-100w.dkrr-nq \
--topics dpr-nq-dev \
--encoded-queries dkrr-dpr-nq-retriever-dpr-nq-dev \
--output runs/run.dpr-dkrr-nq.dev.trec \
--query-prefix question: \
--batch-size 36 --threads 12
--batch-size 512 --threads 16

python -m pyserini.search.faiss \
--index wikipedia-dpr-dkrr-nq \
--index wikipedia-dpr-100w.dkrr-nq \
--topics nq-test \
--encoded-queries dkrr-dpr-nq-retriever-nq-test \
--output runs/run.dpr-dkrr-nq.test.trec \
--query-prefix question: \
--batch-size 36 --threads 12
--batch-size 512 --threads 16
```

Alternatively, replace `--encoded-queries ...` with `--encoder castorini/dkrr-dpr-nq-retriever` for on-the-fly query encoding.
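As a sketch (assuming the remaining flags, including the `question:` prefix, carry over unchanged), the on-the-fly version of the `nq-test` run would be:

```bash
# Sketch: DKRR retrieval on nq-test with on-the-fly query encoding.
# --query-prefix question: is kept on the assumption that the prefix is applied
# at query-encoding time, just as it was when the cached query vectors were built.
python -m pyserini.search.faiss \
  --index wikipedia-dpr-100w.dkrr-nq \
  --topics nq-test \
  --encoder castorini/dkrr-dpr-nq-retriever \
  --output runs/run.dpr-dkrr-nq.test.trec \
  --query-prefix question: \
  --batch-size 512 --threads 16
```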
@@ -79,20 +79,20 @@ Running DKRR retrieval on `dpr-trivia-dev` and `dpr-trivia-test` of the TriviaQA dataset:

```bash
python -m pyserini.search.faiss \
--index wikipedia-dpr-dkrr-tqa \
--index wikipedia-dpr-100w.dkrr-tqa \
--topics dpr-trivia-dev \
--encoded-queries dkrr-dpr-tqa-retriever-dpr-tqa-dev \
--output runs/run.dpr-dkrr-trivia.dev.trec \
--query-prefix question: \
--batch-size 36 --threads 12
--batch-size 512 --threads 16

python -m pyserini.search.faiss \
--index wikipedia-dpr-dkrr-tqa \
--index wikipedia-dpr-100w.dkrr-tqa \
--topics dpr-trivia-test \
--encoded-queries dkrr-dpr-tqa-retriever-dpr-tqa-test \
--output runs/run.dpr-dkrr-trivia.test.trec \
--query-prefix question: \
--batch-size 36 --threads 12
--batch-size 512 --threads 16
```
Alternatively, replace `--encoded-queries ...` with `--encoder castorini/dkrr-dpr-tqa-retriever` for on-the-fly query encoding.

