Release notes for v0.36.0 (#2475)
lintool authored Apr 29, 2024
1 parent cc45081 commit 3229fcd
Showing 5 changed files with 115 additions and 13 deletions.
14 changes: 8 additions & 6 deletions README.md
@@ -22,13 +22,13 @@ Anserini is packaged in a self-contained fatjar, which also provides the simples
Assuming you've already got Java installed, fetch the fatjar:

```bash
wget https://repo1.maven.org/maven2/io/anserini/anserini/0.35.1/anserini-0.35.1-fatjar.jar
wget https://repo1.maven.org/maven2/io/anserini/anserini/0.36.0/anserini-0.36.0-fatjar.jar
```
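The fatjar URL follows the standard Maven Central directory layout (group path `io/anserini`, artifact `anserini`, then a version directory holding `anserini-<version>-fatjar.jar`), so moving to a new release only changes the version string. A minimal sketch, using a hypothetical helper `fatjar_url` that is not part of Anserini:

```bash
# Hypothetical helper (not part of Anserini): prints the Maven Central URL
# of the fatjar for a given release, following the layout used above.
fatjar_url() {
  version="$1"
  printf 'https://repo1.maven.org/maven2/io/anserini/anserini/%s/anserini-%s-fatjar.jar\n' \
    "$version" "$version"
}

# Usage (fetches the same file as the wget command above):
# wget "$(fatjar_url 0.36.0)"
```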

The following commands generate a SPLADE++ ED run with the dev queries (encoded using ONNX) on the MS MARCO passage corpus:

```bash
java -cp anserini-0.35.1-fatjar.jar io.anserini.search.SearchCollection \
java -cp anserini-0.36.0-fatjar.jar io.anserini.search.SearchCollection \
-index msmarco-v1-passage.splade-pp-ed \
-topics msmarco-v1-passage.dev \
-encoder SpladePlusPlusEnsembleDistil \
@@ -39,16 +39,17 @@ java -cp anserini-0.35.1-fatjar.jar io.anserini.search.SearchCollection \
To evaluate:

```bash
wget https://raw.githubusercontent.com/castorini/anserini-tools/master/topics-and-qrels/qrels.msmarco-passage.dev-subset.txt
java -cp anserini-0.35.1-fatjar.jar trec_eval -c -M 10 -m recip_rank qrels.msmarco-passage.dev-subset.txt run.msmarco-v1-passage-dev.splade-pp-ed-onnx.txt
java -cp anserini-0.36.0-fatjar.jar trec_eval -c -M 10 -m recip_rank msmarco-passage.dev-subset run.msmarco-v1-passage-dev.splade-pp-ed-onnx.txt
```
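For reference, `-M 10 -m recip_rank` computes MRR@10. Each query scores 1/rank of the first relevant hit within its top 10 results, or 0 if there is none. `-c` averages over all queries in the qrels rather than only those that appear in the run. The awk sketch below reproduces that arithmetic for illustration only. It assumes local files: a four-column qrels file and a six-column TREC run, with the run already sorted by descending score within each query (trec_eval itself re-sorts by score). `mrr_at_10` is a hypothetical name:

```bash
# Hypothetical helper (not part of Anserini): mrr_at_10 QRELS RUN
# QRELS lines: "qid iter docid rel"; RUN lines: "qid Q0 docid rank score tag",
# assumed sorted by descending score within each query.
mrr_at_10() {
  awk '
    # First file (qrels): record every judged query and each relevant pair.
    FNR == NR { judged[$1] = 1; if ($4 > 0) rel[$1 SUBSEP $3] = 1; next }
    # Second file (run): position = order of appearance within the query.
    {
      pos[$1]++
      if (pos[$1] <= 10 && !($1 in rr) && (($1 SUBSEP $3) in rel))
        rr[$1] = 1 / pos[$1]
    }
    # Like -c: average over all judged queries; queries with no relevant
    # hit in the top 10 contribute 0.
    END {
      n = 0; sum = 0
      for (q in judged) { n++; if (q in rr) sum += rr[q] }
      printf "recip_rank\tall\t%.4f\n", (n ? sum / n : 0)
    }
  ' "$1" "$2"
}

# Usage: mrr_at_10 qrels.msmarco-passage.dev-subset.txt run.msmarco-v1-passage-dev.splade-pp-ed-onnx.txt
```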

See [detailed instructions](docs/fatjar-regressions/fatjar-regressions-v0.35.1.md) for the current fatjar release of Anserini (v0.35.1) to reproduce regression experiments on the MS MARCO V2.1 corpora for TREC 2024 RAG, on MS MARCO V1 Passage, and on BEIR, all directly from the fatjar!
We also have [forthcoming instructions](docs/fatjar-regressions/fatjar-regressions-v0.35.2-SNAPSHOT.md) for the next release (v0.35.2-SNAPSHOT) if you're interested.
See [detailed instructions](docs/fatjar-regressions/fatjar-regressions-v0.36.0.md) for the current fatjar release of Anserini (v0.36.0) to reproduce regression experiments on the MS MARCO V2.1 corpora for TREC 2024 RAG, on MS MARCO V1 Passage, and on BEIR, all directly from the fatjar!

<!-- We also have [forthcoming instructions](docs/fatjar-regressions/fatjar-regressions-v0.36.1-SNAPSHOT.md) for the next release (v0.36.1-SNAPSHOT) if you're interested. -->

<details>
<summary>Older instructions</summary>

+ [Anserini v0.35.1](docs/fatjar-regressions/fatjar-regressions-v0.35.1.md)
+ [Anserini v0.35.0](docs/fatjar-regressions/fatjar-regressions-v0.35.0.md)

</details>
@@ -447,6 +448,7 @@ Beyond that, there are always [open issues](https://github.com/castorini/anserin

## 📜️ Release History

+ v0.36.0: April 28, 2024 [[Release Notes](docs/release-notes/release-notes-v0.36.0.md)]
+ v0.35.1: April 24, 2024 [[Release Notes](docs/release-notes/release-notes-v0.35.1.md)]
+ v0.35.0: April 3, 2024 [[Release Notes](docs/release-notes/release-notes-v0.35.0.md)]
+ v0.25.0: March 27, 2024 [[Release Notes](docs/release-notes/release-notes-v0.25.0.md)]
3 changes: 3 additions & 0 deletions docs/fatjar-regressions/fatjar-regressions-v0.35.0.md
@@ -1,5 +1,8 @@
# Anserini Fatjar Regressions (v0.35.0)

❗Anserini v0.35.0 is no longer the latest release.
The latest release is always linked from the main [Anserini](http://anserini.io/) site.

Fetch the fatjar:

```bash
5 changes: 5 additions & 0 deletions docs/fatjar-regressions/fatjar-regressions-v0.35.1.md
@@ -1,5 +1,10 @@
# Anserini Fatjar Regressions (v0.35.1)

❗Anserini v0.35.1 is no longer the latest release.
The latest release is always linked from the main [Anserini](http://anserini.io/) site.

❗The published artifacts for Anserini v0.35.1 are problematic. See [Anserini #2468](https://github.com/castorini/anserini/pull/2468) for details.

Fetch the fatjar:

```bash
@@ -1,10 +1,9 @@
# Anserini Fatjar Regressions (v0.35.2-SNAPSHOT)
# Anserini Fatjar Regressions (v0.36.0)

Fetch the fatjar:

```bash
# Change when we publish new artifact.
wget https://repo1.maven.org/maven2/io/anserini/anserini/0.35.1/anserini-0.35.1-fatjar.jar
wget https://repo1.maven.org/maven2/io/anserini/anserini/0.36.0/anserini-0.36.0-fatjar.jar
```

Note that prebuilt indexes will be downloaded to `~/.cache/pyserini/indexes/`.
@@ -14,8 +13,8 @@ If you want to change the download location, the current workaround is to use sy
Let's start out by setting the `ANSERINI_JAR` and the `OUTPUT_DIR`:

```bash
export ANSERINI_JAR=`ls target/*-fatjar.jar`
export OUTPUT_DIR="runs"
export ANSERINI_JAR="anserini-0.36.0-fatjar.jar"
export OUTPUT_DIR="."
```
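Since the loops below run for a long time, a quick sanity check up front can save a failed batch. A minimal sketch, assuming the two variables are set as above; `check_setup` is a hypothetical helper, not part of Anserini:

```bash
# Hypothetical helper (not part of Anserini): fails fast if ANSERINI_JAR
# is not a readable regular file, and creates OUTPUT_DIR (default ".").
check_setup() {
  if [ ! -f "${ANSERINI_JAR:-}" ] || [ ! -r "${ANSERINI_JAR:-}" ]; then
    echo "ANSERINI_JAR is not a readable file: '${ANSERINI_JAR:-}'" >&2
    return 1
  fi
  mkdir -p "${OUTPUT_DIR:-.}"
}

# Usage: check_setup || exit 1
```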

## TREC 2024 RAG
@@ -27,7 +26,7 @@ The `msmarco-v2.1-doc-segmented` prebuilt index is 84 GB uncompressed.
Here are the instructions for reproducing runs on the MS MARCO V2.1 document corpus with prebuilt indexes (adjust number of threads based on available resources):

```bash
TOPICS=(msmarco-v2-doc-dev msmarco-v2-doc-dev2 trec2021-dl trec2022-dl trec2023-dl); for t in "${TOPICS[@]}"
TOPICS=(msmarco-v2-doc-dev msmarco-v2-doc-dev2 trec2021-dl trec2022-dl trec2023-dl rag24-raggy-dev); for t in "${TOPICS[@]}"
do
java -cp $ANSERINI_JAR io.anserini.search.SearchCollection -index msmarco-v2.1-doc -topics $t -output $OUTPUT_DIR/run.msmarco-v2.1-doc.bm25.${t}.txt -threads 16 -bm25
done
@@ -56,6 +55,11 @@ java -cp $ANSERINI_JAR trec_eval -c -M 100 -m map dl23-doc-msmarco-v2.1 $OUTPUT_
java -cp $ANSERINI_JAR trec_eval -c -M 100 -m recip_rank -c -m ndcg_cut.10 dl23-doc-msmarco-v2.1 $OUTPUT_DIR/run.msmarco-v2.1-doc.bm25.trec2023-dl.txt
java -cp $ANSERINI_JAR trec_eval -c -m recall.100 dl23-doc-msmarco-v2.1 $OUTPUT_DIR/run.msmarco-v2.1-doc.bm25.trec2023-dl.txt
java -cp $ANSERINI_JAR trec_eval -c -m recall.1000 dl23-doc-msmarco-v2.1 $OUTPUT_DIR/run.msmarco-v2.1-doc.bm25.trec2023-dl.txt
echo ''
java -cp $ANSERINI_JAR trec_eval -c -M 100 -m map rag24.raggy-dev $OUTPUT_DIR/run.msmarco-v2.1-doc.bm25.rag24-raggy-dev.txt
java -cp $ANSERINI_JAR trec_eval -c -M 100 -m recip_rank -c -m ndcg_cut.10 rag24.raggy-dev $OUTPUT_DIR/run.msmarco-v2.1-doc.bm25.rag24-raggy-dev.txt
java -cp $ANSERINI_JAR trec_eval -c -m recall.100 rag24.raggy-dev $OUTPUT_DIR/run.msmarco-v2.1-doc.bm25.rag24-raggy-dev.txt
java -cp $ANSERINI_JAR trec_eval -c -m recall.1000 rag24.raggy-dev $OUTPUT_DIR/run.msmarco-v2.1-doc.bm25.rag24-raggy-dev.txt
```
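Each topic set above is evaluated with the same four `trec_eval` invocations, differing only in the qrels symbol and the run file. One way to avoid repeating them is a small wrapper. This is a sketch: `evaluate_run` and the `DRY_RUN` switch are hypothetical, and the flags are copied verbatim from the commands above:

```bash
# Hypothetical wrapper (not part of Anserini): evaluate_run QRELS RUN
# Runs the four trec_eval invocations used above for one topic set.
# Set DRY_RUN=1 to print the commands instead of executing them.
evaluate_run() {
  local qrels="$1" run="$2" flags
  while IFS= read -r flags; do
    # $flags is deliberately unquoted so it splits into separate options.
    if [ "${DRY_RUN:-0}" = 1 ]; then
      echo java -cp "$ANSERINI_JAR" trec_eval $flags "$qrels" "$run"
    else
      # </dev/null keeps java from reading the here-document below.
      java -cp "$ANSERINI_JAR" trec_eval $flags "$qrels" "$run" </dev/null
    fi
  done <<'EOF'
-c -M 100 -m map
-c -M 100 -m recip_rank -c -m ndcg_cut.10
-c -m recall.100
-c -m recall.1000
EOF
}

# Usage:
# evaluate_run rag24.raggy-dev $OUTPUT_DIR/run.msmarco-v2.1-doc.bm25.rag24-raggy-dev.txt
```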

And these are the expected scores:
@@ -81,14 +85,20 @@ recip_rank all 0.5783
ndcg_cut_10 all 0.2914
recall_100 all 0.2604
recall_1000 all 0.5383
map all 0.1251
recip_rank all 0.7060
ndcg_cut_10 all 0.3631
recall_100 all 0.2433
recall_1000 all 0.5317
```

</details>
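To check a run against the expected scores without comparing them by eye, one option is a metric-by-metric comparison with a small tolerance. Save the expected lines for a single topic set to one file and the matching `trec_eval` output to another. This is a sketch under those assumptions; `check_scores` and the default tolerance of 0.0001 are hypothetical choices:

```bash
# Hypothetical helper (not part of Anserini): check_scores EXPECTED ACTUAL [TOL]
# Both files hold trec_eval-style lines: "<metric> all <value>".
# Prints each mismatch; returns non-zero if a metric differs or is missing.
check_scores() {
  awk -v tol="${3:-0.0001}" '
    # First file: expected scores.
    FNR == NR { want[$1] = $3; next }
    # Second file: actual scores; compare only metrics that are expected.
    ($1 in want) {
      got[$1] = 1
      d = $3 - want[$1]; if (d < 0) d = -d
      if (d > tol + 1e-9) { printf "%s: expected %s, got %s\n", $1, want[$1], $3; bad = 1 }
    }
    END {
      for (m in want) if (!(m in got)) { printf "%s: missing\n", m; bad = 1 }
      exit (bad ? 1 : 0)
    }
  ' "$1" "$2"
}
```

Because the expected-score blocks list several topic sets with the same metric names, each expected file should hold one topic set at a time.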

Here are the instructions for reproducing runs on the MS MARCO V2.1 segmented document corpus with prebuilt indexes (adjust number of threads based on available resources):

```bash
TOPICS=(msmarco-v2-doc-dev msmarco-v2-doc-dev2 trec2021-dl trec2022-dl trec2023-dl); for t in "${TOPICS[@]}"
TOPICS=(msmarco-v2-doc-dev msmarco-v2-doc-dev2 trec2021-dl trec2022-dl trec2023-dl rag24-raggy-dev); for t in "${TOPICS[@]}"
do
java -cp $ANSERINI_JAR io.anserini.search.SearchCollection -index msmarco-v2.1-doc-segmented -topics $t -output $OUTPUT_DIR/run.msmarco-v2.1-doc-segmented.bm25.${t}.txt -threads 16 -bm25 -hits 10000 -selectMaxPassage -selectMaxPassage.delimiter "#" -selectMaxPassage.hits 1000
done
@@ -117,6 +127,11 @@ java -cp $ANSERINI_JAR trec_eval -c -M 100 -m map dl23-doc-msmarco-v2.1 $OUTPUT_
java -cp $ANSERINI_JAR trec_eval -c -M 100 -m recip_rank -c -m ndcg_cut.10 dl23-doc-msmarco-v2.1 $OUTPUT_DIR/run.msmarco-v2.1-doc-segmented.bm25.trec2023-dl.txt
java -cp $ANSERINI_JAR trec_eval -c -m recall.100 dl23-doc-msmarco-v2.1 $OUTPUT_DIR/run.msmarco-v2.1-doc-segmented.bm25.trec2023-dl.txt
java -cp $ANSERINI_JAR trec_eval -c -m recall.1000 dl23-doc-msmarco-v2.1 $OUTPUT_DIR/run.msmarco-v2.1-doc-segmented.bm25.trec2023-dl.txt
echo ''
java -cp $ANSERINI_JAR trec_eval -c -M 100 -m map rag24.raggy-dev $OUTPUT_DIR/run.msmarco-v2.1-doc-segmented.bm25.rag24-raggy-dev.txt
java -cp $ANSERINI_JAR trec_eval -c -M 100 -m recip_rank -c -m ndcg_cut.10 rag24.raggy-dev $OUTPUT_DIR/run.msmarco-v2.1-doc-segmented.bm25.rag24-raggy-dev.txt
java -cp $ANSERINI_JAR trec_eval -c -m recall.100 rag24.raggy-dev $OUTPUT_DIR/run.msmarco-v2.1-doc-segmented.bm25.rag24-raggy-dev.txt
java -cp $ANSERINI_JAR trec_eval -c -m recall.1000 rag24.raggy-dev $OUTPUT_DIR/run.msmarco-v2.1-doc-segmented.bm25.rag24-raggy-dev.txt
```
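The segmented runs retrieve up to 10,000 segments per query and then collapse them to documents. As we understand the flags (not verified against the Anserini source), `-selectMaxPassage` does three things: it strips the segment suffix after the `-selectMaxPassage.delimiter` (`#`), scores each document by its best segment, and keeps the top `-selectMaxPassage.hits` (1,000) documents. The awk sketch below illustrates that max-passage (MaxP) idea on a TREC run file. It is not Anserini's implementation, and it assumes hits are sorted by descending score within each query. `maxp` is a hypothetical name:

```bash
# Hypothetical helper (not part of Anserini): maxp RUN [HITS]
# Collapses a segment-level TREC run ("qid Q0 docid#seg rank score tag")
# into a document-level run. Assumes hits are sorted by descending score
# within each query, so the first segment seen carries the document maximum.
maxp() {
  awk -v hits="${2:-1000}" '
    BEGIN { hits += 0 }
    {
      doc = $3
      p = index(doc, "#"); if (p) doc = substr(doc, 1, p - 1)  # drop "#<seg>"
      key = $1 SUBSEP doc
      if (key in seen) next        # a higher-scoring segment was already kept
      seen[key] = 1
      if (++rank[$1] > hits) next  # per-query document cutoff
      print $1, "Q0", doc, rank[$1], $5, $6
    }
  ' "$1"
}
```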

And these are the expected scores:
@@ -142,6 +157,12 @@ recip_rank all 0.6519
ndcg_cut_10 all 0.3356
recall_100 all 0.3049
recall_1000 all 0.5852
map all 0.1561
recip_rank all 0.7465
ndcg_cut_10 all 0.4227
recall_100 all 0.2807
recall_1000 all 0.5745
```

</details>
71 changes: 71 additions & 0 deletions docs/release-notes/release-notes-v0.36.0.md
@@ -0,0 +1,71 @@
# Anserini Release Notes (v0.36.0)

+ **Release date:** April 28, 2024
+ **Lucene version:** Lucene 9.9.1

## Summary of Changes

+ Refactored and cleaned up the POM.
+ Added bindings for TREC 2024 RAG Track "RAGgy" topics.
+ Added regressions for MS MARCO V2.1 corpora: document and segmented document.
+ Added ability to read YAML configs from fatjar.
+ Added ability to download qrels for `trec_eval` automatically based on symbol bindings.

## Contributors (This Release)

Sorted by number of commits:

+ Jimmy Lin ([lintool](https://github.com/lintool))
+ Daniel Kohn ([DanielKohn1208](https://github.com/DanielKohn1208))
+ Eric Zhang ([16BitNarwhal](https://github.com/16BitNarwhal))

## All Contributors

All contributors with five or more commits, sorted by number of commits, [according to GitHub](https://github.com/castorini/Anserini/graphs/contributors):

+ Jimmy Lin ([lintool](https://github.com/lintool))
+ Peilin Yang ([Peilin-Yang](https://github.com/Peilin-Yang))
+ Ogundepo Odunayo ([ToluClassics](https://github.com/ToluClassics))
+ Arthur Chen ([ArthurChen189](https://github.com/ArthurChen189))
+ Xueguang Ma ([MXueguang](https://github.com/MXueguang))
+ Ahmet Arslan ([iorixxx](https://github.com/iorixxx))
+ Tommaso Teofili ([tteofili](https://github.com/tteofili))
+ Edwin Zhang ([edwinzhng](https://github.com/edwinzhng))
+ Rodrigo Nogueira ([rodrigonogueira4](https://github.com/rodrigonogueira4))
+ Jheng-Hong Yang ([justram](https://github.com/justram))
+ Royal Sequiera ([rosequ](https://github.com/rosequ))
+ Emily Wang ([emmileaf](https://github.com/emmileaf))
+ Yuqi Liu ([yuki617](https://github.com/yuki617))
+ Chris Kamphuis ([Chriskamphuis](https://github.com/Chriskamphuis))
+ Victor Yang ([Victor0118](https://github.com/Victor0118))
+ Boris Lin ([borislin](https://github.com/borislin))
+ Nikhil Gupta ([nikhilro](https://github.com/nikhilro))
+ Jasper Xian ([jasper-xian](https://github.com/jasper-xian))
+ Ronak Pradeep ([ronakice](https://github.com/ronakice))
+ Stephanie Hu ([stephaniewhoo](https://github.com/stephaniewhoo))
+ Yuhao Xie ([Kytabyte](https://github.com/Kytabyte))
+ Shane Ding ([shaneding](https://github.com/shaneding))
+ Kuang Lu ([lukuang](https://github.com/lukuang))
+ Mofe Adeyemi ([Mofetoluwa](https://github.com/Mofetoluwa))
+ Joel Mackenzie ([JMMackenzie](https://github.com/JMMackenzie))
+ Xinyu (Crystina) Zhang ([crystina-z](https://github.com/crystina-z))
+ Adam Yang ([adamyy](https://github.com/adamyy))
+ Salman Mohammed ([salman1993](https://github.com/salman1993))
+ Xinyu Mavis Liu ([x389liu](https://github.com/x389liu))
+ Eric Zhang ([16BitNarwhal](https://github.com/16BitNarwhal))
+ Luchen Tan ([LuchenTan](https://github.com/LuchenTan))
+ Manveer Tamber ([manveertamber](https://github.com/manveertamber))
+ Kelvin Jiang ([kelvin-jiang](https://github.com/kelvin-jiang))
+ Hang Cui ([HangCui0510](https://github.com/HangCui0510))
+ Matt Yang ([d1shs0ap](https://github.com/d1shs0ap))
+ Zhiying Jiang ([bazingagin](https://github.com/bazingagin))
+ Johnson Han ([x65han](https://github.com/x65han))
+ Akintunde Oladipo ([theyorubayesian](https://github.com/theyorubayesian))
+ Michael Tu ([tuzhucheng](https://github.com/tuzhucheng))
+ Aileen Lin ([AileenLin](https://github.com/AileenLin))
+ Dayang Shi ([dyshi](https://github.com/dyshi))
+ Yuqing Xie ([amyxie361](https://github.com/amyxie361))
+ Nandan Thakur ([thakur-nandan](https://github.com/thakur-nandan))
+ Peng Shi ([Impavidity](https://github.com/Impavidity))
+ Zeynep Akkalyoncu Yilmaz ([zeynepakkalyoncu](https://github.com/zeynepakkalyoncu))
+ Ryan Clancy ([ryan-clancy](https://github.com/ryan-clancy))
