Add to onboarding reproduction logs (castorini#2273)
alimt1992 authored Nov 25, 2023
1 parent a66f86f commit 227d93a
Showing 21 changed files with 106 additions and 104 deletions.
8 changes: 4 additions & 4 deletions docs/elastirini.md
@@ -70,7 +70,7 @@ Run the following command to reproduce Anserini BM25 retrieval:
```bash
sh target/appassembler/bin/SearchElastic \
-topics tools/topics-and-qrels/topics.robust04.txt \
- -topicreader Trec -es.index robust04 \
+ -topicReader Trec -es.index robust04 \
-output runs/run.es.robust04.bm25.topics.robust04.txt
```

@@ -115,7 +115,7 @@ Retrieval:
```bash
sh target/appassembler/bin/SearchElastic \
-topics tools/topics-and-qrels/topics.core18.txt \
- -topicreader Trec \
+ -topicReader Trec \
-es.index core18 \
-output runs/run.es.core18.bm25.topics.core18.txt
```
@@ -161,7 +161,7 @@ Retrieval:
```bash
sh target/appassembler/bin/SearchElastic \
-topics tools/topics-and-qrels/topics.msmarco-passage.dev-subset.txt \
- -topicreader TsvString \
+ -topicReader TsvString \
-es.index msmarco-passage \
-output runs/run.es.msmacro-passage.txt
```
@@ -207,7 +207,7 @@ Retrieval:
```bash
sh target/appassembler/bin/SearchElastic \
-topics tools/topics-and-qrels/topics.msmarco-doc.dev.txt \
- -topicreader TsvInt \
+ -topicReader TsvInt \
-es.index msmarco-doc \
-output runs/run.es.msmarco-doc.txt
```
48 changes: 24 additions & 24 deletions docs/experiments-covid.md
@@ -442,12 +442,12 @@ Abstract runs:

```
target/appassembler/bin/SearchCollection -index indexes/lucene-index-cord19-abstract-2020-05-01 \
- -topicreader Covid -topics tools/topics-and-qrels/topics.covid-round2.xml -topicfield query+question \
+ -topicReader Covid -topics tools/topics-and-qrels/topics.covid-round2.xml -topicfield query+question \
-output runs/anserini.covid-r2.abstract.qq.bm25.txt -runtag anserini.covid-r2.abstract.qq.bm25.txt \
-removedups -bm25 -hits 10000
target/appassembler/bin/SearchCollection -index indexes/lucene-index-cord19-abstract-2020-05-01 \
- -topicreader Covid -topics tools/topics-and-qrels/topics.covid-round2-udel.xml -topicfield query \
+ -topicReader Covid -topics tools/topics-and-qrels/topics.covid-round2-udel.xml -topicfield query \
-output runs/anserini.covid-r2.abstract.qdel.bm25.txt -runtag anserini.covid-r2.abstract.qdel.bm25.txt \
-removedups -bm25 -hits 10000
@@ -462,12 +462,12 @@ Full-text runs:

```
target/appassembler/bin/SearchCollection -index indexes/lucene-index-cord19-full-text-2020-05-01 \
- -topicreader Covid -topics tools/topics-and-qrels/topics.covid-round2.xml -topicfield query+question \
+ -topicReader Covid -topics tools/topics-and-qrels/topics.covid-round2.xml -topicfield query+question \
-output runs/anserini.covid-r2.full-text.qq.bm25.txt -runtag anserini.covid-r2.full-text.qq.bm25.txt \
-removedups -bm25 -hits 10000
target/appassembler/bin/SearchCollection -index indexes/lucene-index-cord19-full-text-2020-05-01 \
- -topicreader Covid -topics tools/topics-and-qrels/topics.covid-round2-udel.xml -topicfield query \
+ -topicReader Covid -topics tools/topics-and-qrels/topics.covid-round2-udel.xml -topicfield query \
-output runs/anserini.covid-r2.full-text.qdel.bm25.txt -runtag anserini.covid-r2.full-text.qdel.bm25.txt \
-removedups -bm25 -hits 10000
@@ -482,12 +482,12 @@ Paragraph runs:

```
target/appassembler/bin/SearchCollection -index indexes/lucene-index-cord19-paragraph-2020-05-01 \
- -topicreader Covid -topics tools/topics-and-qrels/topics.covid-round2.xml -topicfield query+question \
+ -topicReader Covid -topics tools/topics-and-qrels/topics.covid-round2.xml -topicfield query+question \
-output runs/anserini.covid-r2.paragraph.qq.bm25.txt -runtag anserini.covid-r2.paragraph.qq.bm25.txt \
-selectMaxPassage -bm25 -hits 10000
target/appassembler/bin/SearchCollection -index indexes/lucene-index-cord19-paragraph-2020-05-01 \
- -topicreader Covid -topics tools/topics-and-qrels/topics.covid-round2-udel.xml -topicfield query \
+ -topicReader Covid -topics tools/topics-and-qrels/topics.covid-round2-udel.xml -topicfield query \
-output runs/anserini.covid-r2.paragraph.qdel.bm25.txt -runtag anserini.covid-r2.paragraph.qdel.bm25.txt \
-selectMaxPassage -bm25 -hits 10000
@@ -550,27 +550,27 @@ Here are the commands to generate the runs on the abstract index:

```bash
target/appassembler/bin/SearchCollection -index indexes/lucene-index-covid-2020-04-10 \
- -topicreader Covid -topics tools/topics-and-qrels/topics.covid-round1.xml -topicfield query -removedups \
+ -topicReader Covid -topics tools/topics-and-qrels/topics.covid-round1.xml -topicfield query -removedups \
-bm25 -output runs/run.covid-r1.abstract.query.bm25.txt

target/appassembler/bin/SearchCollection -index indexes/lucene-index-covid-2020-04-10 \
- -topicreader Covid -topics tools/topics-and-qrels/topics.covid-round1.xml -topicfield question -removedups \
+ -topicReader Covid -topics tools/topics-and-qrels/topics.covid-round1.xml -topicfield question -removedups \
-bm25 -output runs/run.covid-r1.abstract.question.bm25.txt

target/appassembler/bin/SearchCollection -index indexes/lucene-index-covid-2020-04-10 \
- -topicreader Covid -topics tools/topics-and-qrels/topics.covid-round1.xml -topicfield query+question -removedups \
+ -topicReader Covid -topics tools/topics-and-qrels/topics.covid-round1.xml -topicfield query+question -removedups \
-bm25 -output runs/run.covid-r1.abstract.query+question.bm25.txt

target/appassembler/bin/SearchCollection -index indexes/lucene-index-covid-2020-04-10 \
- -topicreader Covid -topics tools/topics-and-qrels/topics.covid-round1.xml -topicfield query+question+narrative -removedups \
+ -topicReader Covid -topics tools/topics-and-qrels/topics.covid-round1.xml -topicfield query+question+narrative -removedups \
-bm25 -output runs/run.covid-r1.abstract.query+question+narrative.bm25.txt

target/appassembler/bin/SearchCollection -index indexes/lucene-index-covid-2020-04-10 \
- -topicreader Covid -topics tools/topics-and-qrels/topics.covid-round1-udel.xml -topicfield query -removedups \
+ -topicReader Covid -topics tools/topics-and-qrels/topics.covid-round1-udel.xml -topicfield query -removedups \
-bm25 -output runs/run.covid-r1.abstract.query-udel.bm25.txt

target/appassembler/bin/SearchCollection -index indexes/lucene-index-covid-2020-04-10 \
- -topicreader Covid -topics tools/topics-and-qrels/topics.covid-round1.xml -topicfield query -querygenerator Covid19QueryGenerator -removedups \
+ -topicReader Covid -topics tools/topics-and-qrels/topics.covid-round1.xml -topicfield query -querygenerator Covid19QueryGenerator -removedups \
-bm25 -output runs/run.covid-r1.abstract.query-covid19.bm25.txt
```

@@ -596,27 +596,27 @@ Here are the commands to generate the runs on the full-text index:

```bash
target/appassembler/bin/SearchCollection -index indexes/lucene-index-covid-full-text-2020-04-10 \
- -topicreader Covid -topics tools/topics-and-qrels/topics.covid-round1.xml -topicfield query -removedups \
+ -topicReader Covid -topics tools/topics-and-qrels/topics.covid-round1.xml -topicfield query -removedups \
-bm25 -output runs/run.covid-r1.full-text.query.bm25.txt

target/appassembler/bin/SearchCollection -index indexes/lucene-index-covid-full-text-2020-04-10 \
- -topicreader Covid -topics tools/topics-and-qrels/topics.covid-round1.xml -topicfield question -removedups \
+ -topicReader Covid -topics tools/topics-and-qrels/topics.covid-round1.xml -topicfield question -removedups \
-bm25 -output runs/run.covid-r1.full-text.question.bm25.txt

target/appassembler/bin/SearchCollection -index indexes/lucene-index-covid-full-text-2020-04-10 \
- -topicreader Covid -topics tools/topics-and-qrels/topics.covid-round1.xml -topicfield query+question -removedups \
+ -topicReader Covid -topics tools/topics-and-qrels/topics.covid-round1.xml -topicfield query+question -removedups \
-bm25 -output runs/run.covid-r1.full-text.query+question.bm25.txt

target/appassembler/bin/SearchCollection -index indexes/lucene-index-covid-full-text-2020-04-10 \
- -topicreader Covid -topics tools/topics-and-qrels/topics.covid-round1.xml -topicfield query+question+narrative -removedups \
+ -topicReader Covid -topics tools/topics-and-qrels/topics.covid-round1.xml -topicfield query+question+narrative -removedups \
-bm25 -output runs/run.covid-r1.full-text.query+question+narrative.bm25.txt

target/appassembler/bin/SearchCollection -index indexes/lucene-index-covid-full-text-2020-04-10 \
- -topicreader Covid -topics tools/topics-and-qrels/topics.covid-round1-udel.xml -topicfield query -removedups \
+ -topicReader Covid -topics tools/topics-and-qrels/topics.covid-round1-udel.xml -topicfield query -removedups \
-bm25 -output runs/run.covid-r1.full-text.query-udel.bm25.txt

target/appassembler/bin/SearchCollection -index indexes/lucene-index-covid-full-text-2020-04-10 \
- -topicreader Covid -topics tools/topics-and-qrels/topics.covid-round1.xml -topicfield query -querygenerator Covid19QueryGenerator -removedups \
+ -topicReader Covid -topics tools/topics-and-qrels/topics.covid-round1.xml -topicfield query -querygenerator Covid19QueryGenerator -removedups \
-bm25 -output runs/run.covid-r1.full-text.query-covid19.bm25.txt
```

@@ -642,27 +642,27 @@ Here are the commands to generate the runs on the paragraph index:

```bash
target/appassembler/bin/SearchCollection -index indexes/lucene-index-covid-paragraph-2020-04-10 \
- -topicreader Covid -topics tools/topics-and-qrels/topics.covid-round1.xml -topicfield query \
+ -topicReader Covid -topics tools/topics-and-qrels/topics.covid-round1.xml -topicfield query \
-selectMaxPassage -bm25 -output runs/run.covid-r1.paragraph.query.bm25.txt

target/appassembler/bin/SearchCollection -index indexes/lucene-index-covid-paragraph-2020-04-10 \
- -topicreader Covid -topics tools/topics-and-qrels/topics.covid-round1.xml -topicfield question \
+ -topicReader Covid -topics tools/topics-and-qrels/topics.covid-round1.xml -topicfield question \
-selectMaxPassage -bm25 -output runs/run.covid-r1.paragraph.question.bm25.txt

target/appassembler/bin/SearchCollection -index indexes/lucene-index-covid-paragraph-2020-04-10 \
- -topicreader Covid -topics tools/topics-and-qrels/topics.covid-round1.xml -topicfield query+question \
+ -topicReader Covid -topics tools/topics-and-qrels/topics.covid-round1.xml -topicfield query+question \
-selectMaxPassage -bm25 -output runs/run.covid-r1.paragraph.query+question.bm25.txt

target/appassembler/bin/SearchCollection -index indexes/lucene-index-covid-paragraph-2020-04-10 \
- -topicreader Covid -topics tools/topics-and-qrels/topics.covid-round1.xml -topicfield query+question+narrative \
+ -topicReader Covid -topics tools/topics-and-qrels/topics.covid-round1.xml -topicfield query+question+narrative \
-selectMaxPassage -bm25 -output runs/run.covid-r1.paragraph.query+question+narrative.bm25.txt

target/appassembler/bin/SearchCollection -index indexes/lucene-index-covid-paragraph-2020-04-10 \
- -topicreader Covid -topics tools/topics-and-qrels/topics.covid-round1-udel.xml -topicfield query \
+ -topicReader Covid -topics tools/topics-and-qrels/topics.covid-round1-udel.xml -topicfield query \
-selectMaxPassage -bm25 -output runs/run.covid-r1.paragraph.query-udel.bm25.txt

target/appassembler/bin/SearchCollection -index indexes/lucene-index-covid-paragraph-2020-04-10 \
- -topicreader Covid -topics tools/topics-and-qrels/topics.covid-round1.xml -topicfield query -querygenerator Covid19QueryGenerator \
+ -topicReader Covid -topics tools/topics-and-qrels/topics.covid-round1.xml -topicfield query -querygenerator Covid19QueryGenerator \
-selectMaxPassage -bm25 -output runs/run.covid-r1.paragraph.query-covid19.bm25.txt
```

2 changes: 1 addition & 1 deletion docs/experiments-doc2query.md
@@ -171,7 +171,7 @@ sh target/appassembler/bin/IndexCollection -collection JsonCollection \
And perform retrieval on the test queries:

```
- sh target/appassembler/bin/SearchCollection -topicreader Car \
+ sh target/appassembler/bin/SearchCollection -topicReader Car \
-index indexes/trec_car/lucene-index.car17v2.0-expanded-topk10 \
-topics tools/topics-and-qrels/topics.car17v2.0.benchmarkY1test.txt \
-output runs/run.car17v2.0.bm25.expanded-topk10.txt -bm25
4 changes: 2 additions & 2 deletions docs/experiments-fever.md
@@ -72,7 +72,7 @@ We can now perform a retrieval run:
```bash
sh target/appassembler/bin/SearchCollection \
-index indexes/fever/lucene-index-fever-paragraph \
- -topicreader TsvInt -topics collections/fever/queries.paragraph.dev.tsv \
+ -topicReader TsvInt -topics collections/fever/queries.paragraph.dev.tsv \
-output runs/run.fever-paragraph.dev.txt -bm25
```

@@ -166,7 +166,7 @@ From the grid search, we observe that the parameters `k1=0.9`, `b=0.1` perform f
```bash
sh target/appassembler/bin/SearchCollection \
-index indexes/fever/lucene-index-fever-paragraph \
- -topicreader TsvInt -topics collections/fever/queries.paragraph.dev.tsv \
+ -topicReader TsvInt -topics collections/fever/queries.paragraph.dev.tsv \
-output runs/run.fever-paragraph-0.9-0.1.dev.txt -bm25 -bm25.k1 0.9 -bm25.b 0.1
```

12 changes: 6 additions & 6 deletions docs/experiments-msmarco-doc-leaderboard.md
@@ -86,14 +86,14 @@ Run with **per-passage** configuration, BM25 default parameters:
```bash
mkdir runs/bm25pbase/

- target/appassembler/bin/SearchCollection -topicreader TsvString -topics tools/topics-and-qrels/topics.msmarco-doc.dev.txt \
+ target/appassembler/bin/SearchCollection -topicReader TsvString -topics tools/topics-and-qrels/topics.msmarco-doc.dev.txt \
-index ~/.cache/pyserini/indexes/index-msmarco-doc-per-passage-20201204-f50dcc.797367406a7542b649cefa6b41cf4c33/ \
-output runs/bm25pbase/dev.trec.txt \
-bm25 -bm25.k1 0.9 -bm25.b 0.4 -hits 1000 -selectMaxPassage -selectMaxPassage.delimiter "#" -selectMaxPassage.hits 100 &

python tools/scripts/msmarco/convert_trec_to_msmarco_run.py --input runs/bm25pbase/dev.trec.txt --output runs/bm25pbase/dev.txt

- target/appassembler/bin/SearchCollection -topicreader TsvString -topics tools/topics-and-qrels/topics.msmarco-doc.test.txt \
+ target/appassembler/bin/SearchCollection -topicReader TsvString -topics tools/topics-and-qrels/topics.msmarco-doc.test.txt \
-index ~/.cache/pyserini/indexes/index-msmarco-doc-per-passage-20201204-f50dcc.797367406a7542b649cefa6b41cf4c33/ \
-output runs/bm25pbase/eval.trec.txt \
-bm25 -bm25.k1 0.9 -bm25.b 0.4 -hits 1000 -selectMaxPassage -selectMaxPassage.delimiter "#" -selectMaxPassage.hits 100 &
@@ -118,14 +118,14 @@ Run with **per-passage** configuration, BM25 tuned parameters, optimized for rec
```bash
mkdir runs/bm25ptuned/

- target/appassembler/bin/SearchCollection -topicreader TsvString -topics tools/topics-and-qrels/topics.msmarco-doc.dev.txt \
+ target/appassembler/bin/SearchCollection -topicReader TsvString -topics tools/topics-and-qrels/topics.msmarco-doc.dev.txt \
-index ~/.cache/pyserini/indexes/index-msmarco-doc-per-passage-20201204-f50dcc.797367406a7542b649cefa6b41cf4c33/ \
-output runs/bm25ptuned/dev.trec.txt \
-bm25 -bm25.k1 2.16 -bm25.b 0.61 -hits 1000 -selectMaxPassage -selectMaxPassage.delimiter "#" -selectMaxPassage.hits 100 &

python tools/scripts/msmarco/convert_trec_to_msmarco_run.py --input runs/bm25ptuned/dev.trec.txt --output runs/bm25ptuned/dev.txt

- target/appassembler/bin/SearchCollection -topicreader TsvString -topics tools/topics-and-qrels/topics.msmarco-doc.test.txt \
+ target/appassembler/bin/SearchCollection -topicReader TsvString -topics tools/topics-and-qrels/topics.msmarco-doc.test.txt \
-index ~/.cache/pyserini/indexes/index-msmarco-doc-per-passage-20201204-f50dcc.797367406a7542b649cefa6b41cf4c33/ \
-output runs/bm25ptuned/eval.trec.txt \
-bm25 -bm25.k1 2.16 -bm25.b 0.61 -hits 1000 -selectMaxPassage -selectMaxPassage.delimiter "#" -selectMaxPassage.hits 100 &
@@ -193,14 +193,14 @@ Anserini's BM25 + doc2query-T5 expansion (per passage), parameters tuned for rec
```bash
mkdir runs/doc2query-t5-per-passage/

- target/appassembler/bin/SearchCollection -topicreader TsvString -topics tools/topics-and-qrels/topics.msmarco-doc.dev.txt \
+ target/appassembler/bin/SearchCollection -topicReader TsvString -topics tools/topics-and-qrels/topics.msmarco-doc.dev.txt \
-index ~/.cache/pyserini/indexes/index-msmarco-doc-expanded-per-passage-20201126-1b4d0a.54ea30c64515edf3c3741291b785be53 \
-output runs/doc2query-t5-per-passage/dev.trec.txt \
-bm25 -bm25.k1 2.56 -bm25.b 0.59 -hits 1000 -selectMaxPassage -selectMaxPassage.delimiter "#" -selectMaxPassage.hits 100 &

python tools/scripts/msmarco/convert_trec_to_msmarco_run.py --input runs/doc2query-t5-per-passage/dev.trec.txt --output runs/doc2query-t5-per-passage/dev.txt

- target/appassembler/bin/SearchCollection -topicreader TsvString -topics tools/topics-and-qrels/topics.msmarco-doc.test.txt \
+ target/appassembler/bin/SearchCollection -topicReader TsvString -topics tools/topics-and-qrels/topics.msmarco-doc.test.txt \
-index ~/.cache/pyserini/indexes/index-msmarco-doc-expanded-per-passage-20201126-1b4d0a.54ea30c64515edf3c3741291b785be53 \
-output runs/doc2query-t5-per-passage/eval.trec.txt \
-bm25 -bm25.k1 2.56 -bm25.b 0.59 -hits 1000 -selectMaxPassage -selectMaxPassage.delimiter "#" -selectMaxPassage.hits 100 &
10 changes: 5 additions & 5 deletions docs/experiments-msmarco-doc.md
@@ -50,7 +50,7 @@ The dev queries are already stored in our repo:
target/appassembler/bin/SearchCollection \
-index indexes/msmarco-doc/lucene-index-msmarco \
-topics tools/topics-and-qrels/topics.msmarco-doc.dev.txt \
- -topicreader TsvInt \
+ -topicReader TsvInt \
-output runs/run.msmarco-doc.dev.bm25.txt \
-parallelism 4 \
-bm25 -hits 1000
@@ -101,7 +101,7 @@ A few minor details to pay attention to: the official metric is MRR@100, so we w
target/appassembler/bin/SearchCollection \
-index indexes/msmarco-doc/lucene-index-msmarco \
-topics tools/topics-and-qrels/topics.msmarco-doc.dev.txt \
- -topicreader TsvInt \
+ -topicReader TsvInt \
-output runs/run.msmarco-doc.leaderboard-dev.bm25base.txt -format msmarco \
-parallelism 4 \
-bm25 -bm25.k1 0.9 -bm25.b 0.4 -hits 100
@@ -128,7 +128,7 @@ Here's the invocation for BM25 with parameters optimized for recall@100 (`k1=4.4
target/appassembler/bin/SearchCollection \
-index indexes/msmarco-doc/lucene-index-msmarco \
-topics tools/topics-and-qrels/topics.msmarco-doc.dev.txt \
- -topicreader TsvInt \
+ -topicReader TsvInt \
-output runs/run.msmarco-doc.leaderboard-dev.bm25tuned.txt -format msmarco \
-parallelism 4 \
-bm25 -bm25.k1 4.46 -bm25.b 0.82 -hits 100
@@ -181,7 +181,7 @@ So, we need to use different search programs, for example:
$ target/appassembler/bin/SearchCollection \
-index indexes/msmarco-doc/lucene-index-msmarco \
-topics tools/topics-and-qrels/topics.msmarco-doc.dev.txt \
- -topicreader TsvInt \
+ -topicReader TsvInt \
-output runs/run.msmarco-doc.dev.opt-mrr.txt \
-parallelism 4 \
-bm25 -bm25.k1 3.8 -bm25.b 0.87 -hits 1000
@@ -194,7 +194,7 @@ recall_1000 all 0.9326
$ target/appassembler/bin/SearchCollection \
-index indexes/msmarco-doc/lucene-index-msmarco \
-topics tools/topics-and-qrels/topics.msmarco-doc.dev.txt \
- -topicreader TsvInt \
+ -topicReader TsvInt \
-output runs/run.msmarco-doc.leaderboard-dev.opt-mrr.txt -format msmarco \
-parallelism 4 \
-bm25 -bm25.k1 3.8 -bm25.b 0.87 -hits 100
2 changes: 1 addition & 1 deletion docs/experiments-msmarco-passage-openai-ada2.md
@@ -65,7 +65,7 @@ After indexing has completed, you should be able to perform retrieval as follows
target/appassembler/bin/SearchHnswDenseVectors \
-index indexes/lucene-hnsw.msmarco-passage-openai-ada2/ \
-topics tools/topics-and-qrels/topics.{SETTING}.jsonl.gz \
- -topicreader JsonIntVector \
+ -topicReader JsonIntVector \
-output runs/run.{SETTING}.txt \
-querygenerator VectorQueryGenerator -topicfield vector -threads 16 -hits 1000 -efSearch 1000 &
```
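
Every hunk in this commit makes the same one-token change, `-topicreader` to `-topicReader`, so the rename could in principle be applied or checked mechanically. The following is a minimal sketch and not part of the commit: the function names `fix_topicreader_flag` and `find_stale_topicreader` are hypothetical, and it assumes a `sed` that accepts `-E` for extended regular expressions.

```bash
#!/usr/bin/env bash
# Sketch only: these helpers are hypothetical and not part of the repository.

# Rewrite the old flag spelling to the new one, reading stdin and writing stdout.
# The flag must be followed by whitespace or end of line, so longer tokens
# that merely start with "-topicreader" are left alone.
fix_topicreader_flag() {
  sed -E 's/-topicreader([[:space:]]|$)/-topicReader\1/g'
}

# List lines under a directory (default: docs) that still use the old spelling.
# grep is case-sensitive by default, so "-topicReader" does not match.
find_stale_topicreader() {
  grep -rn -e '-topicreader' "${1:-docs}"
}
```

For example, `fix_topicreader_flag < docs/elastirini.md > /tmp/elastirini.md` would write a corrected copy without editing the file in place, and `find_stale_topicreader docs` would list any remaining occurrences of the old spelling.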