Replicability -> Reproducibility (#1532)

castorini · Apr 30, 2021 · 71f3ca6
1 parent c83ae47
commit 71f3ca6

Showing 97 changed files with 360 additions and 342 deletions.
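
The hunks below all apply the same terminology change. As a rough illustration only, a rename of this kind could be scripted along the following lines. This is a minimal sketch, not the script (if any) used for this commit: the word mapping `RENAMES` is inferred from the hunks shown here, and the helper name `rename_terms` is hypothetical.

```python
import re

# Word mapping inferred from the hunks in this commit
# (an assumption for illustration, not taken from the repository).
RENAMES = {
    "replicability": "reproducibility",
    "replication": "reproduction",
    "replicating": "reproducing",
    "replicated": "reproduced",
    "replicate": "reproduce",
}

# Match whole words only, ignoring case, so that "replicated" is never
# half-rewritten by the shorter "replicate" entry.
_PATTERN = re.compile(r"\b(" + "|".join(RENAMES) + r")\b", re.IGNORECASE)


def rename_terms(text: str) -> str:
    """Replace each mapped word, keeping a leading capital if the original had one."""

    def _swap(match: re.Match) -> str:
        word = match.group(0)
        new = RENAMES[word.lower()]
        return new.capitalize() if word[0].isupper() else new

    return _PATTERN.sub(_swap, text)


if __name__ == "__main__":
    print(rename_terms("## Replication Log"))
    print(rename_terms("To replicate old results from Lucene 7.6, use v0.5.1."))
```

Mapping each word form explicitly, rather than substituting a shared prefix, keeps the grammar of each sentence intact: a verb maps to a verb and a noun to a noun.
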
diff --git a/README.md b/README.md
@@ -9,7 +9,7 @@ Anserini
 [![doi](http://img.shields.io/badge/doi-10.1145%2F3239571-blue.svg?style=flat)](https://doi.org/10.1145/3239571)
 
 Anserini is an open-source information retrieval toolkit in Java built on Lucene that aims to bridge the gap between academic information retrieval research and the practice of building real-world search applications.
-Among other goals, our effort aims to be [the opposite of this](http://phdcomics.com/comics/archive.php?comicid=1689).
+Among other goals, our effort aims to be [the opposite of this](http://phdcomics.com/comics/archive.php?comicid=1689).[*](docs/reproducibility.md)
 Anserini grew out of [a reproducibility study of various open-source retrieval engines in 2016](https://cs.uwaterloo.ca/~jimmylin/publications/Lin_etal_ECIR2016.pdf) (Lin et al., ECIR 2016). 
 See [Yang et al. (SIGIR 2017)](https://dl.acm.org/authorize?N47337) and [Yang et al. (JDIQ 2018)](https://dl.acm.org/citation.cfm?doid=3289400.3239571) for overviews.
 
@@ -41,7 +41,7 @@ With that, you should be ready to go!
 ## Regression Experiments
 
 Anserini is designed to support experiments on various standard IR test collections out of the box.
-The following experiments are backed by [rigorous end-to-end regression tests](docs/regressions.md) with [`run_regression.py`](src/main/python/run_regression.py) and [the Anserini replicability promise](docs/regressions.md).
+The following experiments are backed by [rigorous end-to-end regression tests](docs/regressions.md) with [`run_regression.py`](src/main/python/run_regression.py) and [the Anserini reproducibility promise](docs/regressions.md).
 For the most part, these runs are based on [_default_ parameter settings](https://github.com/castorini/Anserini/blob/master/src/main/java/io/anserini/search/SearchArgs.java).
 
 + [Regressions for Disks 1 &amp; 2](docs/regressions-disk12.md)
@@ -82,10 +82,10 @@ For the most part, these runs are based on [_default_ parameter settings](https:
 + [Regressions for FIRE 2012 Monolingual Hindi](docs/regressions-fire12-hi.md)
 + [Regressions for FIRE 2012 Monolingual English](docs/regressions-fire12-en.md)
 
-## Replication Guides
+## Reproduction Guides
 
-The experiments described below are not associated with rigorous end-to-end regression testing and thus provide a lower standard of replicability.
-For the most part, manual copying and pasting of commands into a shell is required to replicate our results.
+The experiments described below are not associated with rigorous end-to-end regression testing and thus provide a lower standard of reproducibility.
+For the most part, manual copying and pasting of commands into a shell is required to reproduce our results.
 
 ### TREC-COVID and CORD-19
 
@@ -98,15 +98,15 @@ For the most part, manual copying and pasting of commands into a shell is requir
 
 + [Guide to BM25 baselines for the MS MARCO Passage Ranking Task](docs/experiments-msmarco-passage.md)
 + [Guide to BM25 baselines for the MS MARCO Document Ranking Task](docs/experiments-msmarco-doc.md)
-+ [Guide to replicating baselines MS MARCO Document Ranking Leaderboard](docs/experiments-msmarco-doc-leaderboard.md)
-+ [Guide to replicating doc2query results](docs/experiments-doc2query.md) (MS MARCO passage ranking and TREC-CAR)
-+ [Guide to replicating docTTTTTquery results](docs/experiments-docTTTTTquery.md) (MS MARCO passage and document ranking)
++ [Guide to reproducing baselines for the MS MARCO Document Ranking Leaderboard](docs/experiments-msmarco-doc-leaderboard.md)
++ [Guide to reproducing doc2query results](docs/experiments-doc2query.md) (MS MARCO passage ranking and TREC-CAR)
++ [Guide to reproducing docTTTTTquery results](docs/experiments-docTTTTTquery.md) (MS MARCO passage and document ranking)
 
 ### Other Experiments
 
-+ [Guide to BM25 baselines for the FEVER Fact Verification Task](docs/experiments-fever.md)
 + [Working with the 20 Newsgroups Dataset](docs/experiments-20newsgroups.md)
-+ [Replicating "Neural Hype" Experiments](docs/experiments-forum2018.md)
++ [Guide to BM25 baselines for the FEVER Fact Verification Task](docs/experiments-fever.md)
++ [Guide to reproducing "Neural Hype" Experiments](docs/experiments-forum2018.md)
 + [Guide to running experiments on the AI2 Open Research Corpus](docs/experiments-openresearch.md)
 + [Experiments from Yang et al. (JDIQ 2018)](docs/experiments-jdiq2018.md)
 + Runbooks for TREC 2018: [[Anserini group](docs/runbook-trec2018-anserini.md)] [[h2oloo group](docs/runbook-trec2018-h2oloo.md)]
@@ -123,8 +123,8 @@ For the most part, manual copying and pasting of commands into a shell is requir
 ## How Can I Contribute?
 
 If you've found Anserini to be helpful, we have a simple request for you to contribute back.
-In the course of replicating baseline results on standard test collections, please let us know if you're successful by sending us a pull request with a simple note, like what appears at the bottom of [the Robust04 page](docs/regressions-robust04.md).
-Replicability is important to us, and we'd like to know about successes as well as failures.
+In the course of [reproducing](docs/reproducibility.md) baseline results on standard test collections, please let us know if you're successful by sending us a pull request with a simple note, like what appears at the bottom of [the Robust04 page](docs/regressions-robust04.md).
+Reproducibility is important to us, and we'd like to know about successes as well as failures.
 Since the regression documentation is auto-generated, pull requests should be sent against the [raw templates](https://github.com/castorini/anserini/tree/master/src/main/resources/docgen/templates).
 In turn, you'll be recognized as a [contributor](https://github.com/castorini/anserini/graphs/contributors).
 
@@ -161,7 +161,7 @@ Maven 3.3+ is also required.
 + Anserini was upgraded to Lucene 8.0 as of commit [`75e36f9`](https://github.com/castorini/anserini/commit/75e36f97f7037d1ceb20fa9c91582eac5e974131) (6/12/2019); prior to that, the toolkit uses Lucene 7.6.
 Based on [preliminary experiments](docs/lucene7-vs-lucene8.md), query evaluation latency has been much improved in Lucene 8.
 As a result of this upgrade, results of all regressions have changed slightly.
-To replicate old results from Lucene 7.6, use [v0.5.1](https://github.com/castorini/anserini/releases).
+To reproduce old results from Lucene 7.6, use [v0.5.1](https://github.com/castorini/anserini/releases).
 
 ## References
 

diff --git a/docs/elastirini.md b/docs/elastirini.md
@@ -23,7 +23,7 @@ If you want to install Kibana, it's just another distribution to unpack and a si
 ## Indexing and Retrieval: Robust04
 
 Once we have a local instance of Elasticsearch up and running, we can index using Elasticsearch through Elastirini.
-In this example, we replicate experiments on [Robust04](regressions-robust04.md).
+In this example, we reproduce experiments on [Robust04](regressions-robust04.md).
 
 First, let's create the index in Elasticsearch.
 We define the schema and the ranking function (BM25) using [this config](../src/main/resources/elasticsearch/index-config.robust04.json):
@@ -44,7 +44,7 @@ sh target/appassembler/bin/IndexCollection -collection TrecCollection -generator
 ```
 
 We may need to wait a few minutes after indexing for the index to "catch up" before performing retrieval, otherwise the evaluation metrics may be off.
-Run the following command to replicate Anserini BM25 retrieval:
+Run the following command to reproduce Anserini BM25 retrieval:
 
 ```bash
 sh target/appassembler/bin/SearchElastic -topicreader Trec -es.index robust04 \
@@ -62,7 +62,7 @@ P_30                  	all	0.3102
 
 ## Indexing and Retrieval: Core18
 
-We can replicate the [TREC Washington Post Corpus](regressions-core18.md) results in a similar way.
+We can reproduce the [TREC Washington Post Corpus](regressions-core18.md) results in a similar way.
 First, set up the proper schema using [this config](../src/main/resources/elasticsearch/index-config.core18.json):
 
 ```bash
@@ -97,7 +97,7 @@ P_30                  	all	0.3567
 
 ## Indexing and Retrieval: MS MARCO Passage
 
-We can replicate the [BM25 Baselines on MS MARCO (Passage)](experiments-msmarco-passage.md) results in a similar way.
+We can reproduce the [BM25 Baselines on MS MARCO (Passage)](experiments-msmarco-passage.md) results in a similar way.
 First, set up the proper schema using [this config](../src/main/resources/elasticsearch/index-config.msmarco-passage.json):
 
 ```bash
@@ -131,7 +131,7 @@ recall_1000           	all	0.8573
 
 ## Indexing and Retrieval: MS MARCO Document
 
-We can replicate the [BM25 Baselines on MS MARCO (Doc)](experiments-msmarco-doc.md) results in a similar way.
+We can reproduce the [BM25 Baselines on MS MARCO (Doc)](experiments-msmarco-doc.md) results in a similar way.
 First, set up the proper schema using [this config](../src/main/resources/elasticsearch/index-config.msmarco-doc.json):
 
 ```bash
@@ -189,15 +189,15 @@ python src/main/python/run_es_regression.py --regression [collection] --input [d
 
 For the `collection` meta-parameter, use `robust04`, `core18`, `msmarco-passage`, or `msmarco-doc`, for each of the collections above, respectively.
 
-## Replication Log
-
-+ Results replicated by [@nikhilro](https://github.com/nikhilro) on 2020-01-26 (commit [`d5ee069`](https://github.com/castorini/anserini/commit/d5ee069399e6a306d7685bda756c1f19db721156)) for both [MS MARCO Passage](experiments-msmarco-passage.md) and [Robust04](regressions-robust04.md)
-+ Results replicated by [@edwinzhng](https://github.com/edwinzhng) on 2020-01-26 (commit [`7b76dfb`](https://github.com/castorini/anserini/commit/7b76dfbea7e0c01a3a5dc13e74f54852c780ec9b)) for both [MS MARCO Passage](experiments-msmarco-passage.md) and [Robust04](regressions-robust04.md)
-+ Results replicated by [@HangCui0510](https://github.com/HangCui0510) on 2020-04-29 (commit [`07a9b05`](https://github.com/castorini/anserini/commit/07a9b053173637e15be79de4e7fce4d5a93d04fe)) for [MS Marco Passage](regressions-msmarco-passage.md), [Robust04](regressions-robust04.md) and [Core18](regressions-core18.md) using end-to-end [`run_es_regression`](../src/main/python/run_es_regression.py)
-+ Results replicated by [@shaneding](https://github.com/shaneding) on 2020-05-25 (commit [`1de3274`](https://github.com/castorini/anserini/commit/1de3274b057a63382534c5277ffcd772c3fc0d43)) for [MS Marco Passage](regressions-msmarco-passage.md)
-+ Results replicated by [@adamyy](https://github.com/adamyy) on 2020-05-29 (commit [`94893f1`](https://github.com/castorini/anserini/commit/94893f170e047d77c3ef5b8b995d7fbdd13f4298)) for [MS MARCO Passage](regressions-msmarco-passage.md), [MS MARCO Document](experiments-msmarco-doc.md)
-+ Results replicated by [@YimingDou](https://github.com/YimingDou) on 2020-05-29 (commit [`2947a16`](https://github.com/castorini/anserini/commit/2947a1622efae35637b83e321aba8e6fccd43489)) for [MS MARCO Passage](regressions-msmarco-passage.md)
-+ Results replicated by [@yxzhu16](https://github.com/yxzhu16) on 2020-07-17 (commit [`fad12be`](https://github.com/castorini/anserini/commit/fad12be2e37a075100707c3a674eb67bc0aa57ef)) for [Robust04](regressions-robust04.md), [Core18](regressions-core18.md), and [MS MARCO Passage](regressions-msmarco-passage.md)
-+ Results replicated by [@lintool](https://github.com/lintool) on 2020-11-10 (commit [`e19755`](https://github.com/castorini/anserini/commit/e19755b5fa976127830597bc9fbca203b9f5ad24)), all commands and end-to-end regression script for all four collections
-+ Results replicated by [@jrzhang12](https://github.com/jrzhang12) on 2021-01-02 (commit [`be4e44d`](https://github.com/castorini/anserini/commit/02c52ee606ba0ebe32c130af1e26d24d8f10566a)) for [MS MARCO Passage](regressions-msmarco-passage.md)
+## Reproduction Log[*](reproducibility.md)
+
++ Results reproduced by [@nikhilro](https://github.com/nikhilro) on 2020-01-26 (commit [`d5ee069`](https://github.com/castorini/anserini/commit/d5ee069399e6a306d7685bda756c1f19db721156)) for both [MS MARCO Passage](experiments-msmarco-passage.md) and [Robust04](regressions-robust04.md)
++ Results reproduced by [@edwinzhng](https://github.com/edwinzhng) on 2020-01-26 (commit [`7b76dfb`](https://github.com/castorini/anserini/commit/7b76dfbea7e0c01a3a5dc13e74f54852c780ec9b)) for both [MS MARCO Passage](experiments-msmarco-passage.md) and [Robust04](regressions-robust04.md)
++ Results reproduced by [@HangCui0510](https://github.com/HangCui0510) on 2020-04-29 (commit [`07a9b05`](https://github.com/castorini/anserini/commit/07a9b053173637e15be79de4e7fce4d5a93d04fe)) for [MS Marco Passage](regressions-msmarco-passage.md), [Robust04](regressions-robust04.md) and [Core18](regressions-core18.md) using end-to-end [`run_es_regression`](../src/main/python/run_es_regression.py)
++ Results reproduced by [@shaneding](https://github.com/shaneding) on 2020-05-25 (commit [`1de3274`](https://github.com/castorini/anserini/commit/1de3274b057a63382534c5277ffcd772c3fc0d43)) for [MS Marco Passage](regressions-msmarco-passage.md)
++ Results reproduced by [@adamyy](https://github.com/adamyy) on 2020-05-29 (commit [`94893f1`](https://github.com/castorini/anserini/commit/94893f170e047d77c3ef5b8b995d7fbdd13f4298)) for [MS MARCO Passage](regressions-msmarco-passage.md), [MS MARCO Document](experiments-msmarco-doc.md)
++ Results reproduced by [@YimingDou](https://github.com/YimingDou) on 2020-05-29 (commit [`2947a16`](https://github.com/castorini/anserini/commit/2947a1622efae35637b83e321aba8e6fccd43489)) for [MS MARCO Passage](regressions-msmarco-passage.md)
++ Results reproduced by [@yxzhu16](https://github.com/yxzhu16) on 2020-07-17 (commit [`fad12be`](https://github.com/castorini/anserini/commit/fad12be2e37a075100707c3a674eb67bc0aa57ef)) for [Robust04](regressions-robust04.md), [Core18](regressions-core18.md), and [MS MARCO Passage](regressions-msmarco-passage.md)
++ Results reproduced by [@lintool](https://github.com/lintool) on 2020-11-10 (commit [`e19755`](https://github.com/castorini/anserini/commit/e19755b5fa976127830597bc9fbca203b9f5ad24)), all commands and end-to-end regression script for all four collections
++ Results reproduced by [@jrzhang12](https://github.com/jrzhang12) on 2021-01-02 (commit [`be4e44d`](https://github.com/castorini/anserini/commit/02c52ee606ba0ebe32c130af1e26d24d8f10566a)) for [MS MARCO Passage](regressions-msmarco-passage.md)
 
diff --git a/docs/experiments-20newsgroups.md b/docs/experiments-20newsgroups.md
@@ -79,6 +79,7 @@ For reference, the number of docs indexed should be exactly as follows:
 
 For convenience, we also provide pre-built indexes above.
 
-## Replication Log
-+ Results replicated by [@stephaniewhoo](http://github.com/stephaniewhoo) on 2020-11-24 (commit [`b7f1f08`](https://github.com/castorini/anserini/commit/b7f1f08689014159c1d5b2c9b9905b363af1cbbf))
+## Reproduction Log[*](reproducibility.md)
+
++ Results reproduced by [@stephaniewhoo](http://github.com/stephaniewhoo) on 2020-11-24 (commit [`b7f1f08`](https://github.com/castorini/anserini/commit/b7f1f08689014159c1d5b2c9b9905b363af1cbbf))
 
diff --git a/docs/experiments-cord19-extras.md b/docs/experiments-cord19-extras.md
@@ -120,9 +120,9 @@ Set the index pattern to `cord19`, and use `publish_time` as the time filter.
 Then navigate to "Discover" in Kibana to run a search.
 If you're not getting any results, be sure you've expanded the date range, next to the search bar.
 
-## Replication Log
+## Reproduction Log[*](reproducibility.md)
 
-+ Replicated by [@adamyy](https://github.com/adamyy) on 2020-05-29 (commit [`2947a16`](https://github.com/castorini/anserini/commit/2947a1622efae35637b83e321aba8e6fccd43489)) on CORD-19 release of 2020/05/26.
-+ Replicated by [@yxzhu16](https://github.com/yxzhu16) on 2020-07-17 (commit [`fad12be`](https://github.com/castorini/anserini/commit/fad12be2e37a075100707c3a674eb67bc0aa57ef)) on CORD-19 release of 2020/06/19.
-+ Replicated by [@LizzyZhang-tutu](https://github.com/LizzyZhang-tutu) on 2020-07-26 (commit [`fad12be`](https://github.com/castorini/anserini/commit/539f7d43a0183454a633f34aa20b46d2eeec1a19)) on CORD-19 release of 2020/07/25.
-+ Replicated by [@lintool](https://github.com/lintool) on 2020-11-23 (commit [`746447`](https://github.com/castorini/anserini/commit/746447af47db5bb032eb551623c11219467c961e)) on CORD-19 release of 2020/07/16 with Solr v8.3.0 and ES/Kibana v7.10.0.
++ Reproduced by [@adamyy](https://github.com/adamyy) on 2020-05-29 (commit [`2947a16`](https://github.com/castorini/anserini/commit/2947a1622efae35637b83e321aba8e6fccd43489)) on CORD-19 release of 2020/05/26.
++ Reproduced by [@yxzhu16](https://github.com/yxzhu16) on 2020-07-17 (commit [`fad12be`](https://github.com/castorini/anserini/commit/fad12be2e37a075100707c3a674eb67bc0aa57ef)) on CORD-19 release of 2020/06/19.
++ Reproduced by [@LizzyZhang-tutu](https://github.com/LizzyZhang-tutu) on 2020-07-26 (commit [`fad12be`](https://github.com/castorini/anserini/commit/539f7d43a0183454a633f34aa20b46d2eeec1a19)) on CORD-19 release of 2020/07/25.
++ Reproduced by [@lintool](https://github.com/lintool) on 2020-11-23 (commit [`746447`](https://github.com/castorini/anserini/commit/746447af47db5bb032eb551623c11219467c961e)) on CORD-19 release of 2020/07/16 with Solr v8.3.0 and ES/Kibana v7.10.0.
diff --git a/docs/experiments-covid-doc2query.md b/docs/experiments-covid-doc2query.md
@@ -19,7 +19,7 @@ As an alternative to downloading each run separately, clone the repo and you'll
 
 ## Round 5
 
-These are runs that can be easily replicated with Anserini, from pre-built doc2query expanded CORD-19 indexes we have provided (version from 2020/07/16, the official corpus used in round 5).
+These are runs that can be easily reproduced with Anserini, from pre-built doc2query expanded CORD-19 indexes we have provided (version from 2020/07/16, the official corpus used in round 5).
 They were prepared _for_ round 5 (for participants who wish to have a baseline run to rerank); to provide a sense of effectiveness, we present evaluation results with the cumulative qrels from rounds 1, 2, 3, and 4 ([`qrels_covid_d4_j0.5-4.txt`](https://ir.nist.gov/covidSubmit/data/qrels-covid_d4_j0.5-4.txt) provided by NIST, stored in our repo as [`qrels.covid-round4-cumulative.txt`](../src/main/resources/topics-and-qrels/qrels.covid-round4-cumulative.txt)).
 
 |    | index     | field(s)                        | nDCG@10 | J@10 | R@1k | run file | checksum |
@@ -50,7 +50,7 @@ The final runs after removing judgments from 1, 2, 3, and 4 (cumulatively), are
 | `r5.fusion2` = Row 8 | [[download](https://www.dropbox.com/s/j1qdqr88cbsybae/expanded.anserini.final-r5.fusion2.txt?dl=1)] | `a65fabe7b5b7bc4216be632296269ce6` |
 | `r5.rf` = Row 9      | [[download](https://www.dropbox.com/s/5bm4pdngh5bx3px/expanded.anserini.final-r5.rf.txt?dl=1)]      | `24f0b75a25273b7b00d3e65065e98147` |
 
-We have written scripts that automate the replication of these baselines:
+We have written scripts that automate the reproduction of these baselines:
 
 ```
 $ python src/main/python/trec-covid/download_doc2query_indexes.py --date 2020-07-16
@@ -101,7 +101,7 @@ This qrels file, provided by NIST as [`qrels-covid_d5_j0.5-5.txt`](https://ir.ni
 |  8 | -         | reciprocal rank fusion(2, 4, 6) | 0.7131 | 1.0000 | 0.6755 | 0.9910 | 0.3036 | 0.5166 | 0.4518
 |  9 | abstract  | UDel qgen + RF                  | 0.8160 | 1.0000 | 0.7787 | 0.9960 | 0.3421 | 0.5249 | 0.4107
 
-Note that all of the results above can be replicated with the following script:
+Note that all of the results above can be reproduced with the following script:
 
 ```bash
 $ python src/main/python/trec-covid/download_doc2query_indexes.py --date 2020-07-16
@@ -136,7 +136,7 @@ The final runs, after removing judgments from 1, 2, and 3 (cumulatively), are as
 | `r4.fusion2` = Row 8 | [[download](https://www.dropbox.com/s/5epunmkexqtupe6/expanded.anserini.final-r4.fusion2.txt?dl=1)] | `590400c12b72ce8ed3b5af2f4c45f039` |
 | `r4.rf` = Row 9      | [[download](https://www.dropbox.com/s/kqbu3cui214ijyh/expanded.anserini.final-r4.rf.txt?dl=1)]      | `b9e7bb80fd8dc97f93908d895fb07f7f` |
 
-We have written scripts that automate the replication of these baselines:
+We have written scripts that automate the reproduction of these baselines:
 
 ```
 $ python src/main/python/trec-covid/download_doc2query_indexes.py --date 2020-06-19