Showing 3 changed files with 40 additions and 20 deletions.
14 changes: 10 additions & 4 deletions README.md
@@ -4,7 +4,7 @@ Anserini
[![Maven Central](https://maven-badges.herokuapp.com/maven-central/io.anserini/anserini/badge.svg)](https://maven-badges.herokuapp.com/maven-central/io.anserini/anserini)
[![LICENSE](https://img.shields.io/badge/license-Apache-blue.svg?style=flat-square)](./LICENSE)

Anserini is an open-source information retrieval toolkit built on Lucene that aims to bridge the gap between academic information retrieval research and the practice of building real-world search applications. This effort grew out of [a reproducibility study of various open-source retrieval engines in 2016](https://cs.uwaterloo.ca/~jimmylin/publications/Lin_etal_ECIR2016.pdf) (Lin et al., ECIR 2016) and the initial vision of our system is described [in a short paper](https://dl.acm.org/authorize?N47337) (Yang et al., SIGIR 2017).
Anserini is an open-source information retrieval toolkit built on Lucene that aims to bridge the gap between academic information retrieval research and the practice of building real-world search applications. This effort grew out of [a reproducibility study of various open-source retrieval engines in 2016](https://cs.uwaterloo.ca/~jimmylin/publications/Lin_etal_ECIR2016.pdf) (Lin et al., ECIR 2016). Additional details can be found [in a short paper](https://dl.acm.org/authorize?N47337) (Yang et al., SIGIR 2017) and a [journal article](https://dl.acm.org/citation.cfm?doid=3289400.3239571) (Yang et al., JDIQ 2018).

## Getting Started

@@ -32,7 +32,8 @@ cd ndeval && make

## Running Standard IR Experiments

Anserini is designed to support experiments on various standard TREC collections out of the box:
Anserini is designed to support experiments on various standard TREC collections out of the box.
Each collection is associated with [regression tests](docs/regressions.md) for replicability.

+ [Experiments on Disks 1 & 2](docs/experiments-disk12.md)
+ [Experiments on Disks 4 & 5 (Robust04)](docs/experiments-robust04.md)
@@ -47,7 +48,10 @@ Anserini is designed to support experiments on various standard TREC collections
+ [Experiments on Tweets2013 (MB13 & MB14)](docs/experiments-mb13.md)
+ [Experiments on CAR17](docs/experiments-car17.md)

All of the above results are associated with [regression tests](docs/regressions.md) for replicability.
Additional experiments:

+ [Experiments for JDIQ 2018 article](docs/experiments-jdiq2018.md)


## Additional Documentation

@@ -67,7 +71,7 @@ Anserini was designed with Python integration in mind, for connecting with popul

```
import jnius_config
jnius_config.set_classpath("target/anserini-0.1.1-SNAPSHOT-fatjar.jar")
jnius_config.set_classpath("target/anserini-0.2.1-SNAPSHOT-fatjar.jar")
from jnius import autoclass
JString = autoclass('java.lang.String')
@@ -100,6 +104,8 @@ hits[0].content

+ Peilin Yang, Hui Fang, and Jimmy Lin. [Anserini: Enabling the Use of Lucene for Information Retrieval Research.](https://dl.acm.org/authorize?N47337) _Proceedings of the 40th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR 2017)_, pages 1253-1256, August 2017, Tokyo, Japan.

+ Peilin Yang, Hui Fang, and Jimmy Lin. [Anserini: Reproducible Ranking Baselines Using Lucene.](https://dl.acm.org/citation.cfm?doid=3289400.3239571) _Journal of Data and Information Quality_, 10(4), Article 16, 2018.

## Acknowledgments

This research has been supported in part by the Natural Sciences and Engineering Research Council (NSERC) of Canada and the U.S. National Science Foundation under IIS-1423002 and CNS-1405688. Any opinions, findings, and conclusions or recommendations expressed do not necessarily reflect the views of the sponsors.
23 changes: 15 additions & 8 deletions docs/experiments-jdiq2018.md
@@ -1,14 +1,19 @@
### JDIQ2018 Effectiveness Scripts
# Anserini: JDIQ 2018 Experiments

The scripts calculate the optimal performances of all supported ranking models (by grid-searching all possible model parameters).
The main purpose is to reproduce what we have in our JDIQ2018 paper:
This page documents the script used in the following article to compute optimal retrieval effectiveness by grid search over model parameters:

+ Peilin Yang, Hui Fang, and Jimmy Lin. [Anserini: Reproducible Ranking Baselines Using Lucene.](https://dl.acm.org/citation.cfm?doid=3289400.3239571) Journal of Data and Information Quality, 10(4), Article 16, 2018.

_NOTICE: The query topics used in JDIQ paper are combined topics per collection while the numbers generated
by the scripts here are separated_
Note that the values produced by these scripts are _slightly_ different from those reported in the article.
These differences stem from the fact that Anserini evolved throughout the peer review process; the values reported in the article were those generated when the manuscript was submitted.
By the time the article was published, the implementation of Anserini had progressed.
As Anserini continues to improve, we will update these scripts, which will lead to further divergences from the published values.
Unfortunately, this is an unavoidable aspect of empirical research on software artifacts.

## Parameter Tuning

Invoke the tuning script on various collections as follows, on `tuna`:

### Run
On tuna:
```
nohup python src/main/python/jdiq2018_effectiveness/run_batch.py --collection disk12 >& disk12.jdiq2018.log &
nohup python src/main/python/jdiq2018_effectiveness/run_batch.py --collection robust04 >& robust04.jdiq2018.log &
@@ -19,7 +24,9 @@ nohup python src/main/python/jdiq2018_effectiveness/run_batch.py --collection cw
nohup python src/main/python/jdiq2018_effectiveness/run_batch.py --collection cw12b13 --metrics map ndcg20 err20 >& cw12b13.jdiq2018.log &
```

### Results
The script assumes hard-coded index directories; modify as appropriate.
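
Conceptually, the tuning script sweeps the parameter grid exhaustively and keeps the best-scoring setting. A minimal self-contained sketch of that loop, using BM25's `k1` and `b` as illustrative parameters, with a hypothetical `evaluate` function standing in for indexing, retrieval, and `trec_eval` scoring:

```python
import itertools

def evaluate(k1, b):
    # Hypothetical stand-in for running retrieval and scoring with trec_eval;
    # this toy objective peaks at k1=0.9, b=0.4.
    return -((k1 - 0.9) ** 2 + (b - 0.4) ** 2)

def grid_search(k1_values, b_values):
    """Exhaustively score every (k1, b) pair and return the best setting."""
    best_params, best_score = None, float("-inf")
    for k1, b in itertools.product(k1_values, b_values):
        score = evaluate(k1, b)
        if score > best_score:
            best_params, best_score = (k1, b), score
    return best_params, best_score

k1_grid = [round(0.1 * i, 1) for i in range(1, 21)]  # 0.1 .. 2.0
b_grid = [round(0.1 * i, 1) for i in range(0, 11)]   # 0.0 .. 1.0
best, score = grid_search(k1_grid, b_grid)
```

The actual scripts do the same sweep per ranking model and per metric, with each `evaluate` call being a full retrieval run.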

## Effectiveness

#### disk12
MAP | BM25 | F2EXP | PL2 | QL | F2LOG | SPL |
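
For reference, the MAP column reports mean average precision: the mean over topics of per-topic average precision. A simplified sketch of average precision for a single ranked list (a toy version of what `trec_eval` computes):

```python
def average_precision(ranking, relevant):
    """AP: mean of precision@k over the ranks k where a relevant doc appears,
    normalized by the total number of relevant docs."""
    hits, precision_sum = 0, 0.0
    for rank, docid in enumerate(ranking, start=1):
        if docid in relevant:
            hits += 1
            precision_sum += hits / rank
    return precision_sum / len(relevant) if relevant else 0.0
```

For example, relevant documents at ranks 1 and 3 (out of 2 relevant total) give AP = (1/1 + 2/3) / 2 ≈ 0.833.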
23 changes: 15 additions & 8 deletions src/main/resources/jdiq2018/doc.template
@@ -1,14 +1,19 @@
### JDIQ2018 Effectiveness Scripts
# Anserini: JDIQ 2018 Experiments

The scripts calculate the optimal performances of all supported ranking models (by grid-searching all possible model parameters).
The main purpose is to reproduce what we have in our JDIQ2018 paper:
This page documents the script used in the following article to compute optimal retrieval effectiveness by grid search over model parameters:

+ Peilin Yang, Hui Fang, and Jimmy Lin. [Anserini: Reproducible Ranking Baselines Using Lucene.](https://dl.acm.org/citation.cfm?doid=3289400.3239571) Journal of Data and Information Quality, 10(4), Article 16, 2018.

_NOTICE: The query topics used in JDIQ paper are combined topics per collection while the numbers generated
by the scripts here are separated_
Note that the values produced by these scripts are _slightly_ different from those reported in the article.
These differences stem from the fact that Anserini evolved throughout the peer review process; the values reported in the article were those generated when the manuscript was submitted.
By the time the article was published, the implementation of Anserini had progressed.
As Anserini continues to improve, we will update these scripts, which will lead to further divergences from the published values.
Unfortunately, this is an unavoidable aspect of empirical research on software artifacts.

## Parameter Tuning

Invoke the tuning script on various collections as follows, on `tuna`:

### Run
On tuna:
```
nohup python src/main/python/jdiq2018_effectiveness/run_batch.py --collection disk12 >& disk12.jdiq2018.log &
nohup python src/main/python/jdiq2018_effectiveness/run_batch.py --collection robust04 >& robust04.jdiq2018.log &
@@ -19,6 +24,8 @@ nohup python src/main/python/jdiq2018_effectiveness/run_batch.py --collection cw
nohup python src/main/python/jdiq2018_effectiveness/run_batch.py --collection cw12b13 --metrics map ndcg20 err20 >& cw12b13.jdiq2018.log &
```

### Results
The script assumes hard-coded index directories; modify as appropriate.

## Effectiveness

${results}
