Skip to content

Commit

Permalink
Replicability -> Reproducibility (#1532)
Browse files Browse the repository at this point in the history
  • Loading branch information
lintool authored Apr 30, 2021
1 parent c83ae47 commit 71f3ca6
Show file tree
Hide file tree
Showing 97 changed files with 360 additions and 342 deletions.
26 changes: 13 additions & 13 deletions README.md
Original file line number Diff line number Diff line change
Expand Up @@ -9,7 +9,7 @@ Anserini
[![doi](http://img.shields.io/badge/doi-10.1145%2F3239571-blue.svg?style=flat)](https://doi.org/10.1145/3239571)

Anserini is an open-source information retrieval toolkit in Java built on Lucene that aims to bridge the gap between academic information retrieval research and the practice of building real-world search applications.
Among other goals, our effort aims to be [the opposite of this](http://phdcomics.com/comics/archive.php?comicid=1689).
Among other goals, our effort aims to be [the opposite of this](http://phdcomics.com/comics/archive.php?comicid=1689).[*](docs/reproducibility.md)
Anserini grew out of [a reproducibility study of various open-source retrieval engines in 2016](https://cs.uwaterloo.ca/~jimmylin/publications/Lin_etal_ECIR2016.pdf) (Lin et al., ECIR 2016).
See [Yang et al. (SIGIR 2017)](https://dl.acm.org/authorize?N47337) and [Yang et al. (JDIQ 2018)](https://dl.acm.org/citation.cfm?doid=3289400.3239571) for overviews.

Expand Down Expand Up @@ -41,7 +41,7 @@ With that, you should be ready to go!
## Regression Experiments

Anserini is designed to support experiments on various standard IR test collections out of the box.
The following experiments are backed by [rigorous end-to-end regression tests](docs/regressions.md) with [`run_regression.py`](src/main/python/run_regression.py) and [the Anserini replicability promise](docs/regressions.md).
The following experiments are backed by [rigorous end-to-end regression tests](docs/regressions.md) with [`run_regression.py`](src/main/python/run_regression.py) and [the Anserini reproducibility promise](docs/regressions.md).
For the most part, these runs are based on [_default_ parameter settings](https://github.com/castorini/Anserini/blob/master/src/main/java/io/anserini/search/SearchArgs.java).

+ [Regressions for Disks 1 & 2](docs/regressions-disk12.md)
Expand Down Expand Up @@ -82,10 +82,10 @@ For the most part, these runs are based on [_default_ parameter settings](https:
+ [Regressions for FIRE 2012 Monolingual Hindi](docs/regressions-fire12-hi.md)
+ [Regressions for FIRE 2012 Monolingual English](docs/regressions-fire12-en.md)

## Replication Guides
## Reproduction Guides

The experiments described below are not associated with rigorous end-to-end regression testing and thus provide a lower standard of replicability.
For the most part, manual copying and pasting of commands into a shell is required to replicate our results.
The experiments described below are not associated with rigorous end-to-end regression testing and thus provide a lower standard of reproducibility.
For the most part, manual copying and pasting of commands into a shell is required to reproduce our results.

### TREC-COVID and CORD-19

Expand All @@ -98,15 +98,15 @@ For the most part, manual copying and pasting of commands into a shell is requir

+ [Guide to BM25 baselines for the MS MARCO Passage Ranking Task](docs/experiments-msmarco-passage.md)
+ [Guide to BM25 baselines for the MS MARCO Document Ranking Task](docs/experiments-msmarco-doc.md)
+ [Guide to replicating baselines MS MARCO Document Ranking Leaderboard](docs/experiments-msmarco-doc-leaderboard.md)
+ [Guide to replicating doc2query results](docs/experiments-doc2query.md) (MS MARCO passage ranking and TREC-CAR)
+ [Guide to replicating docTTTTTquery results](docs/experiments-docTTTTTquery.md) (MS MARCO passage and document ranking)
+ [Guide to reproducing baselines MS MARCO Document Ranking Leaderboard](docs/experiments-msmarco-doc-leaderboard.md)
+ [Guide to reproducing doc2query results](docs/experiments-doc2query.md) (MS MARCO passage ranking and TREC-CAR)
+ [Guide to reproducing docTTTTTquery results](docs/experiments-docTTTTTquery.md) (MS MARCO passage and document ranking)

### Other Experiments

+ [Guide to BM25 baselines for the FEVER Fact Verification Task](docs/experiments-fever.md)
+ [Working with the 20 Newsgroups Dataset](docs/experiments-20newsgroups.md)
+ [Replicating "Neural Hype" Experiments](docs/experiments-forum2018.md)
+ [Guide to BM25 baselines for the FEVER Fact Verification Task](docs/experiments-fever.md)
+ [Guide to reproducing "Neural Hype" Experiments](docs/experiments-forum2018.md)
+ [Guide to running experiments on the AI2 Open Research Corpus](docs/experiments-openresearch.md)
+ [Experiments from Yang et al. (JDIQ 2018)](docs/experiments-jdiq2018.md)
+ Runbooks for TREC 2018: [[Anserini group](docs/runbook-trec2018-anserini.md)] [[h2oloo group](docs/runbook-trec2018-h2oloo.md)]
Expand All @@ -123,8 +123,8 @@ For the most part, manual copying and pasting of commands into a shell is requir
## How Can I Contribute?

If you've found Anserini to be helpful, we have a simple request for you to contribute back.
In the course of replicating baseline results on standard test collections, please let us know if you're successful by sending us a pull request with a simple note, like what appears at the bottom of [the Robust04 page](docs/regressions-robust04.md).
Replicability is important to us, and we'd like to know about successes as well as failures.
In the course of [reproducing](docs/reproducibility.md) baseline results on standard test collections, please let us know if you're successful by sending us a pull request with a simple note, like what appears at the bottom of [the Robust04 page](docs/regressions-robust04.md).
Reproducibility is important to us, and we'd like to know about successes as well as failures.
Since the regression documentation is auto-generated, pull requests should be sent against the [raw templates](https://github.com/castorini/anserini/tree/master/src/main/resources/docgen/templates).
In turn, you'll be recognized as a [contributor](https://github.com/castorini/anserini/graphs/contributors).

Expand Down Expand Up @@ -161,7 +161,7 @@ Maven 3.3+ is also required.
+ Anserini was upgraded to Lucene 8.0 as of commit [`75e36f9`](https://github.com/castorini/anserini/commit/75e36f97f7037d1ceb20fa9c91582eac5e974131) (6/12/2019); prior to that, the toolkit uses Lucene 7.6.
Based on [preliminary experiments](docs/lucene7-vs-lucene8.md), query evaluation latency has been much improved in Lucene 8.
As a result of this upgrade, results of all regressions have changed slightly.
To replicate old results from Lucene 7.6, use [v0.5.1](https://github.com/castorini/anserini/releases).
To reproducible old results from Lucene 7.6, use [v0.5.1](https://github.com/castorini/anserini/releases).

## References

Expand Down
32 changes: 16 additions & 16 deletions docs/elastirini.md
Original file line number Diff line number Diff line change
Expand Up @@ -23,7 +23,7 @@ If you want to install Kibana, it's just another distribution to unpack and a si
## Indexing and Retrieval: Robust04

Once we have a local instance of Elasticsearch up and running, we can index using Elasticsearch through Elastirini.
In this example, we replicate experiments on [Robust04](regressions-robust04.md).
In this example, we reproduce experiments on [Robust04](regressions-robust04.md).

First, let's create the index in Elasticsearch.
We define the schema and the ranking function (BM25) using [this config](../src/main/resources/elasticsearch/index-config.robust04.json):
Expand All @@ -44,7 +44,7 @@ sh target/appassembler/bin/IndexCollection -collection TrecCollection -generator
```

We may need to wait a few minutes after indexing for the index to "catch up" before performing retrieval, otherwise the evaluation metrics may be off.
Run the following command to replicate Anserini BM25 retrieval:
Run the following command to reproduce Anserini BM25 retrieval:

```bash
sh target/appassembler/bin/SearchElastic -topicreader Trec -es.index robust04 \
Expand All @@ -62,7 +62,7 @@ P_30 all 0.3102

## Indexing and Retrieval: Core18

We can replicate the [TREC Washington Post Corpus](regressions-core18.md) results in a similar way.
We can reproduce the [TREC Washington Post Corpus](regressions-core18.md) results in a similar way.
First, set up the proper schema using [this config](../src/main/resources/elasticsearch/index-config.core18.json):

```bash
Expand Down Expand Up @@ -97,7 +97,7 @@ P_30 all 0.3567

## Indexing and Retrieval: MS MARCO Passage

We can replicate the [BM25 Baselines on MS MARCO (Passage)](experiments-msmarco-passage.md) results in a similar way.
We can reproduce the [BM25 Baselines on MS MARCO (Passage)](experiments-msmarco-passage.md) results in a similar way.
First, set up the proper schema using [this config](../src/main/resources/elasticsearch/index-config.msmarco-passage.json):

```bash
Expand Down Expand Up @@ -131,7 +131,7 @@ recall_1000 all 0.8573

## Indexing and Retrieval: MS MARCO Document

We can replicate the [BM25 Baselines on MS MARCO (Doc)](experiments-msmarco-doc.md) results in a similar way.
We can reproduce the [BM25 Baselines on MS MARCO (Doc)](experiments-msmarco-doc.md) results in a similar way.
First, set up the proper schema using [this config](../src/main/resources/elasticsearch/index-config.msmarco-doc.json):

```bash
Expand Down Expand Up @@ -189,15 +189,15 @@ python src/main/python/run_es_regression.py --regression [collection] --input [d

For the `collection` meta-parameter, use `robust04`, `core18`, `msmarco-passage`, or `msmarco-doc`, for each of the collections above, respectively.

## Replication Log

+ Results replicated by [@nikhilro](https://github.com/nikhilro) on 2020-01-26 (commit [`d5ee069`](https://github.com/castorini/anserini/commit/d5ee069399e6a306d7685bda756c1f19db721156)) for both [MS MARCO Passage](experiments-msmarco-passage.md) and [Robust04](regressions-robust04.md)
+ Results replicated by [@edwinzhng](https://github.com/edwinzhng) on 2020-01-26 (commit [`7b76dfb`](https://github.com/castorini/anserini/commit/7b76dfbea7e0c01a3a5dc13e74f54852c780ec9b)) for both [MS MARCO Passage](experiments-msmarco-passage.md) and [Robust04](regressions-robust04.md)
+ Results replicated by [@HangCui0510](https://github.com/HangCui0510) on 2020-04-29 (commit [`07a9b05`](https://github.com/castorini/anserini/commit/07a9b053173637e15be79de4e7fce4d5a93d04fe)) for [MS Marco Passage](regressions-msmarco-passage.md), [Robust04](regressions-robust04.md) and [Core18](regressions-core18.md) using end-to-end [`run_es_regression`](../src/main/python/run_es_regression.py)
+ Results replicated by [@shaneding](https://github.com/shaneding) on 2020-05-25 (commit [`1de3274`](https://github.com/castorini/anserini/commit/1de3274b057a63382534c5277ffcd772c3fc0d43)) for [MS Marco Passage](regressions-msmarco-passage.md)
+ Results replicated by [@adamyy](https://github.com/adamyy) on 2020-05-29 (commit [`94893f1`](https://github.com/castorini/anserini/commit/94893f170e047d77c3ef5b8b995d7fbdd13f4298)) for [MS MARCO Passage](regressions-msmarco-passage.md), [MS MARCO Document](experiments-msmarco-doc.md)
+ Results replicated by [@YimingDou](https://github.com/YimingDou) on 2020-05-29 (commit [`2947a16`](https://github.com/castorini/anserini/commit/2947a1622efae35637b83e321aba8e6fccd43489)) for [MS MARCO Passage](regressions-msmarco-passage.md)
+ Results replicated by [@yxzhu16](https://github.com/yxzhu16) on 2020-07-17 (commit [`fad12be`](https://github.com/castorini/anserini/commit/fad12be2e37a075100707c3a674eb67bc0aa57ef)) for [Robust04](regressions-robust04.md), [Core18](regressions-core18.md), and [MS MARCO Passage](regressions-msmarco-passage.md)
+ Results replicated by [@lintool](https://github.com/lintool) on 2020-11-10 (commit [`e19755`](https://github.com/castorini/anserini/commit/e19755b5fa976127830597bc9fbca203b9f5ad24)), all commands and end-to-end regression script for all four collections
+ Results replicated by [@jrzhang12](https://github.com/jrzhang12) on 2021-01-02 (commit [`be4e44d`](https://github.com/castorini/anserini/commit/02c52ee606ba0ebe32c130af1e26d24d8f10566a)) for [MS MARCO Passage](regressions-msmarco-passage.md)
## Reproduction Log[*](reproducibility.md)

+ Results reproduced by [@nikhilro](https://github.com/nikhilro) on 2020-01-26 (commit [`d5ee069`](https://github.com/castorini/anserini/commit/d5ee069399e6a306d7685bda756c1f19db721156)) for both [MS MARCO Passage](experiments-msmarco-passage.md) and [Robust04](regressions-robust04.md)
+ Results reproduced by [@edwinzhng](https://github.com/edwinzhng) on 2020-01-26 (commit [`7b76dfb`](https://github.com/castorini/anserini/commit/7b76dfbea7e0c01a3a5dc13e74f54852c780ec9b)) for both [MS MARCO Passage](experiments-msmarco-passage.md) and [Robust04](regressions-robust04.md)
+ Results reproduced by [@HangCui0510](https://github.com/HangCui0510) on 2020-04-29 (commit [`07a9b05`](https://github.com/castorini/anserini/commit/07a9b053173637e15be79de4e7fce4d5a93d04fe)) for [MS Marco Passage](regressions-msmarco-passage.md), [Robust04](regressions-robust04.md) and [Core18](regressions-core18.md) using end-to-end [`run_es_regression`](../src/main/python/run_es_regression.py)
+ Results reproduced by [@shaneding](https://github.com/shaneding) on 2020-05-25 (commit [`1de3274`](https://github.com/castorini/anserini/commit/1de3274b057a63382534c5277ffcd772c3fc0d43)) for [MS Marco Passage](regressions-msmarco-passage.md)
+ Results reproduced by [@adamyy](https://github.com/adamyy) on 2020-05-29 (commit [`94893f1`](https://github.com/castorini/anserini/commit/94893f170e047d77c3ef5b8b995d7fbdd13f4298)) for [MS MARCO Passage](regressions-msmarco-passage.md), [MS MARCO Document](experiments-msmarco-doc.md)
+ Results reproduced by [@YimingDou](https://github.com/YimingDou) on 2020-05-29 (commit [`2947a16`](https://github.com/castorini/anserini/commit/2947a1622efae35637b83e321aba8e6fccd43489)) for [MS MARCO Passage](regressions-msmarco-passage.md)
+ Results reproduced by [@yxzhu16](https://github.com/yxzhu16) on 2020-07-17 (commit [`fad12be`](https://github.com/castorini/anserini/commit/fad12be2e37a075100707c3a674eb67bc0aa57ef)) for [Robust04](regressions-robust04.md), [Core18](regressions-core18.md), and [MS MARCO Passage](regressions-msmarco-passage.md)
+ Results reproduced by [@lintool](https://github.com/lintool) on 2020-11-10 (commit [`e19755`](https://github.com/castorini/anserini/commit/e19755b5fa976127830597bc9fbca203b9f5ad24)), all commands and end-to-end regression script for all four collections
+ Results reproduced by [@jrzhang12](https://github.com/jrzhang12) on 2021-01-02 (commit [`be4e44d`](https://github.com/castorini/anserini/commit/02c52ee606ba0ebe32c130af1e26d24d8f10566a)) for [MS MARCO Passage](regressions-msmarco-passage.md)

5 changes: 3 additions & 2 deletions docs/experiments-20newsgroups.md
Original file line number Diff line number Diff line change
Expand Up @@ -79,6 +79,7 @@ For reference, the number of docs indexed should be exactly as follows:

For convenience, we also provide pre-built indexes above.

## Replication Log
+ Results replicated by [@stephaniewhoo](http://github.com/stephaniewhoo) on 2020-11-24 (commit [`b7f1f08`](https://github.com/castorini/anserini/commit/b7f1f08689014159c1d5b2c9b9905b363af1cbbf))
## Reproduction Log[*](reproducibility.md)

+ Results reproduced by [@stephaniewhoo](http://github.com/stephaniewhoo) on 2020-11-24 (commit [`b7f1f08`](https://github.com/castorini/anserini/commit/b7f1f08689014159c1d5b2c9b9905b363af1cbbf))

10 changes: 5 additions & 5 deletions docs/experiments-cord19-extras.md
Original file line number Diff line number Diff line change
Expand Up @@ -120,9 +120,9 @@ Set the index pattern to `cord19`, and use `publish_time` as the time filter.
Then navigate to "Discover" in Kibana to run a search.
If you're not getting any results, be sure you've expanded the date range, next to the search bar.

## Replication Log
## Reproduction Log[*](reproducibility.md)

+ Replicated by [@adamyy](https://github.com/adamyy) on 2020-05-29 (commit [`2947a16`](https://github.com/castorini/anserini/commit/2947a1622efae35637b83e321aba8e6fccd43489)) on CORD-19 release of 2020/05/26.
+ Replicated by [@yxzhu16](https://github.com/yxzhu16) on 2020-07-17 (commit [`fad12be`](https://github.com/castorini/anserini/commit/fad12be2e37a075100707c3a674eb67bc0aa57ef)) on CORD-19 release of 2020/06/19.
+ Replicated by [@LizzyZhang-tutu](https://github.com/LizzyZhang-tutu) on 2020-07-26 (commit [`fad12be`](https://github.com/castorini/anserini/commit/539f7d43a0183454a633f34aa20b46d2eeec1a19)) on CORD-19 release of 2020/07/25.
+ Replicated by [@lintool](https://github.com/lintool) on 2020-11-23 (commit [`746447`](https://github.com/castorini/anserini/commit/746447af47db5bb032eb551623c11219467c961e)) on CORD-19 release of 2020/07/16 with Solr v8.3.0 and ES/Kibana v7.10.0.
+ Reproduced by [@adamyy](https://github.com/adamyy) on 2020-05-29 (commit [`2947a16`](https://github.com/castorini/anserini/commit/2947a1622efae35637b83e321aba8e6fccd43489)) on CORD-19 release of 2020/05/26.
+ Reproduced by [@yxzhu16](https://github.com/yxzhu16) on 2020-07-17 (commit [`fad12be`](https://github.com/castorini/anserini/commit/fad12be2e37a075100707c3a674eb67bc0aa57ef)) on CORD-19 release of 2020/06/19.
+ Reproduced by [@LizzyZhang-tutu](https://github.com/LizzyZhang-tutu) on 2020-07-26 (commit [`fad12be`](https://github.com/castorini/anserini/commit/539f7d43a0183454a633f34aa20b46d2eeec1a19)) on CORD-19 release of 2020/07/25.
+ Reproduced by [@lintool](https://github.com/lintool) on 2020-11-23 (commit [`746447`](https://github.com/castorini/anserini/commit/746447af47db5bb032eb551623c11219467c961e)) on CORD-19 release of 2020/07/16 with Solr v8.3.0 and ES/Kibana v7.10.0.
8 changes: 4 additions & 4 deletions docs/experiments-covid-doc2query.md
Original file line number Diff line number Diff line change
Expand Up @@ -19,7 +19,7 @@ As an alternative to downloading each run separately, clone the repo and you'll

## Round 5

These are runs that can be easily replicated with Anserini, from pre-built doc2query expanded CORD-19 indexes we have provided (version from 2020/07/16, the official corpus used in round 5).
These are runs that can be easily reproduced with Anserini, from pre-built doc2query expanded CORD-19 indexes we have provided (version from 2020/07/16, the official corpus used in round 5).
They were prepared _for_ round 5 (for participants who wish to have a baseline run to rerank); to provide a sense of effectiveness, we present evaluation results with the cumulative qrels from rounds 1, 2, 3, and 4 ([`qrels_covid_d4_j0.5-4.txt`](https://ir.nist.gov/covidSubmit/data/qrels-covid_d4_j0.5-4.txt) provided by NIST, stored in our repo as [`qrels.covid-round4-cumulative.txt`](../src/main/resources/topics-and-qrels/qrels.covid-round4-cumulative.txt)).

| | index | field(s) | nDCG@10 | J@10 | R@1k | run file | checksum |
Expand Down Expand Up @@ -50,7 +50,7 @@ The final runs after removing judgments from 1, 2, 3, and 4 (cumulatively), are
| `r5.fusion2` = Row 8 | [[download](https://www.dropbox.com/s/j1qdqr88cbsybae/expanded.anserini.final-r5.fusion2.txt?dl=1)] | `a65fabe7b5b7bc4216be632296269ce6` |
| `r5.rf` = Row 9 | [[download](https://www.dropbox.com/s/5bm4pdngh5bx3px/expanded.anserini.final-r5.rf.txt?dl=1)] | `24f0b75a25273b7b00d3e65065e98147` |

We have written scripts that automate the replication of these baselines:
We have written scripts that automate the reproduction of these baselines:

```
$ python src/main/python/trec-covid/download_doc2query_indexes.py --date 2020-07-16
Expand Down Expand Up @@ -101,7 +101,7 @@ This qrels file, provided by NIST as [`qrels-covid_d5_j0.5-5.txt`](https://ir.ni
| 8 | - | reciprocal rank fusion(2, 4, 6) | 0.7131 | 1.0000 | 0.6755 | 0.9910 | 0.3036 | 0.5166 | 0.4518
| 9 | abstract | UDel qgen + RF | 0.8160 | 1.0000 | 0.7787 | 0.9960 | 0.3421 | 0.5249 | 0.4107

Note that all of the results above can be replicated with the following script:
Note that all of the results above can be reproduced with the following script:

```bash
$ python src/main/python/trec-covid/download_doc2query_indexes.py --date 2020-07-16
Expand Down Expand Up @@ -136,7 +136,7 @@ The final runs, after removing judgments from 1, 2, and 3 (cumulatively), are as
| `r4.fusion2` = Row 8 | [[download](https://www.dropbox.com/s/5epunmkexqtupe6/expanded.anserini.final-r4.fusion2.txt?dl=1)] | `590400c12b72ce8ed3b5af2f4c45f039` |
| `r4.rf` = Row 9 | [[download](https://www.dropbox.com/s/kqbu3cui214ijyh/expanded.anserini.final-r4.rf.txt?dl=1)] | `b9e7bb80fd8dc97f93908d895fb07f7f` |

We have written scripts that automate the replication of these baselines:
We have written scripts that automate the reproduction of these baselines:

```
$ python src/main/python/trec-covid/download_doc2query_indexes.py --date 2020-06-19
Expand Down
Loading

0 comments on commit 71f3ca6

Please sign in to comment.