rearrange

castorini · lintool · Jul 16, 2021 · May 26, 2021 · May 27, 2021 · Jun 5, 2021
commit 9f3729f0c3d5bfe2a81de6f1f3c88c908cd643a1
diff --git a/docs/experiments-tct_colbert-v2.md b/docs/experiments-tct_colbert-v2.md
@@ -25,29 +25,27 @@ Summary of results:
 
 Here we notice slight difference between our paper (TF) and reproduction (PT). 
 
-## TCT_ColBERT-V2-HN+ Reproduction
-
+## TCT_ColBERT-V2 Reproduction
 Dense retrieval with TCT-ColBERT, brute-force index:
 
 ```bash
 $ python -m pyserini.dsearch --topics msmarco-passage-dev-subset \
-                             --index msmarco-passage-tct_colbert-v2-hnp-bf \
+                             --index msmarco-passage-tct_colbert-v2-bf \
                              --encoded-queries tct_colbert-v2-hnp-msmarco-passage-dev-subset \
                              --batch-size 36 \
                              --threads 12 \
-                             --output runs/run.msmarco-passage.tct_colbert-v2-hnp.bf.tsv \
+                             --output runs/run.msmarco-passage.tct_colbert-v2.bf.tsv \
                              --output-format msmarco
 ```
-
 Note that to ensure maximum reproducibility, by default Pyserini uses pre-computed query representations that are automatically downloaded.
 As an alternative, to perform "on-the-fly" query encoding, see additional instructions below.
 
 To evaluate:
 
 ```bash
-$ python -m pyserini.eval.msmarco_passage_eval msmarco-passage-dev-subset runs/run.msmarco-passage.tct_colbert-v2-hnp.bf.tsv
+$ python -m pyserini.eval.msmarco_passage_eval msmarco-passage-dev-subset runs/run.msmarco-passage.tct_colbert-v2.bf.tsv
 #####################
-MRR @10: 0.3584
+MRR @10: 0.3439
 QueriesRanked: 6980
 #####################
 ```
@@ -56,76 +54,79 @@ We can also use the official TREC evaluation tool `trec_eval` to compute other m
 For that we first need to convert runs and qrels files to the TREC format:
 
 ```bash
-$ python -m pyserini.eval.convert_msmarco_run_to_trec_run --input runs/run.msmarco-passage.tct_colbert-v2-hnp.bf.tsv --output runs/run.msmarco-passage.tct_colbert-v2-hnp.bf.trec
-$ python -m pyserini.eval.trec_eval -c -mrecall.1000 -mmap msmarco-passage-dev-subset runs/run.msmarco-passage.tct_colbert-v2-hnp.bf.trec
-map                     all     0.3645
-recall_1000             all     0.9695
+$ python -m pyserini.eval.convert_msmarco_run_to_trec_run --input runs/run.msmarco-passage.tct_colbert-v2.bf.tsv --output runs/run.msmarco-passage.tct_colbert-v2.bf.trec
+$ python -m pyserini.eval.trec_eval -c -mrecall.1000 -mmap msmarco-passage-dev-subset runs/run.msmarco-passage.tct_colbert-v2.bf.trec
+map                     all     0.3509
+recall_1000             all     0.9670
 ```
 
-To perform on-the-fly query encoding with our [pretrained encoder model](https://huggingface.co/castorini/tct_colbert-msmarco/tree/main) use the option `--encoder castorini/tct_colbert-v2-hnp-msmarco`.
-Query encoding will run on the CPU by default.
-To perform query encoding on the GPU, use the option `--device cuda:0`.
-
-
-Follow the same instructions above to perform on-the-fly query encoding.
-The caveat about minor differences in score applies here as well.
-
-## TCT_ColBERT-V2 Reproduction
+## TCT_ColBERT-V2-HN Reproduction
 
 ```bash
 $ python -m pyserini.dsearch --topics msmarco-passage-dev-subset \
-                             --index msmarco-passage-tct_colbert-v2-bf \
-                             --encoded-queries tct_colbert-v2-hnp-msmarco-passage-dev-subset \
+                             --index msmarco-passage-tct_colbert-v2-hn-bf \
+                             --encoded-queries tct_colbert-v2-hn-msmarco-passage-dev-subset \
                              --batch-size 36 \
                              --threads 12 \
-                             --output runs/run.msmarco-passage.tct_colbert-v2.bf.tsv \
+                             --output runs/run.msmarco-passage.tct_colbert-v2-hn.bf.tsv \
                              --output-format msmarco
 ```
+
 To evaluate:
 
 ```bash
-$ python -m pyserini.eval.msmarco_passage_eval msmarco-passage-dev-subset runs/run.msmarco-passage.tct_colbert-v2.bf.tsv
+$ python -m pyserini.eval.msmarco_passage_eval msmarco-passage-dev-subset runs/run.msmarco-passage.tct_colbert-v2-hn.bf.tsv
 #####################
-MRR @10: 0.3439
+MRR @10: 0.3542
 QueriesRanked: 6980
 #####################
 ```
 
 ```bash
-$ python -m pyserini.eval.convert_msmarco_run_to_trec_run --input runs/run.msmarco-passage.tct_colbert-v2.bf.tsv --output runs/run.msmarco-passage.tct_colbert-v2.bf.trec
-$ python -m pyserini.eval.trec_eval -c -mrecall.1000 -mmap msmarco-passage-dev-subset runs/run.msmarco-passage.tct_colbert-v2.bf.trec
-map                     all     0.3509
-recall_1000             all     0.9670
+$ python -m pyserini.eval.convert_msmarco_run_to_trec_run --input runs/run.msmarco-passage.tct_colbert-v2-hn.bf.tsv --output runs/run.msmarco-passage.tct_colbert-v2-hn.bf.trec
+$ python -m pyserini.eval.trec_eval -c -mrecall.1000 -mmap msmarco-passage-dev-subset runs/run.msmarco-passage.tct_colbert-v2-hn.bf.trec
+map                     all     0.3608
+recall_1000             all     0.9708
 ```
 
-## TCT_ColBERT-V2-HN Reproduction
+## TCT_ColBERT-V2-HN+ Reproduction
 
 ```bash
 $ python -m pyserini.dsearch --topics msmarco-passage-dev-subset \
-                             --index msmarco-passage-tct_colbert-v2-hn-bf \
-                             --encoded-queries tct_colbert-v2-hn-msmarco-passage-dev-subset \
+                             --index msmarco-passage-tct_colbert-v2-hnp-bf \
+                             --encoded-queries tct_colbert-v2-hnp-msmarco-passage-dev-subset \
                              --batch-size 36 \
                              --threads 12 \
-                             --output runs/run.msmarco-passage.tct_colbert-v2-hn.bf.tsv \
+                             --output runs/run.msmarco-passage.tct_colbert-v2-hnp.bf.tsv \
                              --output-format msmarco
 ```
+
 To evaluate:
 
 ```bash
-$ python -m pyserini.eval.msmarco_passage_eval msmarco-passage-dev-subset runs/run.msmarco-passage.tct_colbert-v2-hn.bf.tsv
+$ python -m pyserini.eval.msmarco_passage_eval msmarco-passage-dev-subset runs/run.msmarco-passage.tct_colbert-v2-hnp.bf.tsv
 #####################
-MRR @10: 0.3542
+MRR @10: 0.3584
 QueriesRanked: 6980
 #####################
 ```
 
 ```bash
-$ python -m pyserini.eval.convert_msmarco_run_to_trec_run --input runs/run.msmarco-passage.tct_colbert-v2-hn.bf.tsv --output runs/run.msmarco-passage.tct_colbert-v2-hn.bf.trec
-$ python -m pyserini.eval.trec_eval -c -mrecall.1000 -mmap msmarco-passage-dev-subset runs/run.msmarco-passage.tct_colbert-v2-hn.bf.trec
-map                     all     0.3608
-recall_1000             all     0.9708
+$ python -m pyserini.eval.convert_msmarco_run_to_trec_run --input runs/run.msmarco-passage.tct_colbert-v2-hnp.bf.tsv --output runs/run.msmarco-passage.tct_colbert-v2-hnp.bf.trec
+$ python -m pyserini.eval.trec_eval -c -mrecall.1000 -mmap msmarco-passage-dev-subset runs/run.msmarco-passage.tct_colbert-v2-hnp.bf.trec
+map                     all     0.3645
+recall_1000             all     0.9695
 ```
 
+To perform on-the-fly query encoding with our [pretrained encoder model](https://huggingface.co/castorini/tct_colbert-msmarco/tree/main) use the option `--encoder castorini/tct_colbert-v2-hnp-msmarco`.
+Query encoding will run on the CPU by default.
+To perform query encoding on the GPU, use the option `--device cuda:0`.
+
+
+Follow the same instructions above to perform on-the-fly query encoding.
+The caveat about minor differences in score applies here as well.
+
+
 
 ### Hybrid Dense-Sparse Retrieval