Robust05 results added (castorini#108)
+ add the results of robust05
+ update the results of robust04 using the new topic reader
+ will upload the topics and qrels files once this is merged
Peilin-Yang authored and lintool committed Nov 19, 2016
1 parent 7bd1d9d commit 6705dc7
Showing 4 changed files with 54 additions and 8 deletions.
3 changes: 3 additions & 0 deletions README.md
@@ -18,11 +18,14 @@ mvn clean package appassembler:assemble

* [Disk12](docs/experiments-disk12.md)
* [Robust04](docs/experiments-robust04.md)
* [Robust05](docs/experiments-robust05.md)
* [WT2G & WT10G](docs/experiments-wt.md)
* [Gov2](docs/experiments-gov2.md)
* [ClueWeb09b](docs/experiments-clueweb09b.md)
* [ClueWeb12-B13](docs/experiments-clueweb12-b13.md)

* [Reference to all Topics and Qrels](src/main/resources/topics-and-qrels/README.md)

* Other features

* [Twitter (Near) Real-Time Search](docs/twitter-nrts.md)
12 changes: 4 additions & 8 deletions docs/experiments-robust04.md
@@ -35,11 +35,7 @@ eval/trec_eval.9.0/trec_eval src/main/resources/topics-and-qrels/qrels.robust200

**Effectiveness Reference**:

-MAP | BM25 | QL
--------------------------------------------------|--------|--------
-TREC 2004 Robust Track: Topics 301-450&601-700 | 0.2415 | 0.2370
-
-
-P30 | BM25 | QL
--------------------------------------------------|--------|--------
-TREC 2004 Robust Track: Topics 301-450&601-700 | 0.3115 | 0.3053
+Metric | BM25 | QL
+-------|--------|--------
+MAP | 0.2500 | 0.2465
+P30 | 0.3120 | 0.3078
41 changes: 41 additions & 0 deletions docs/experiments-robust05.md
@@ -0,0 +1,41 @@
# Anserini Experiments on Robust05

See http://trec.nist.gov/data/t14_robust.html

**Indexing**:

```
nohup sh target/appassembler/bin/IndexCollection -collection Trec -input /path/to/aquaint/ \
-index lucene-index.aquaint.pos -threads 32 -positions -optimize \
2> log.aquaint.pos.emptyDocids.txt 1> log.aquaint.pos.recordCounts.txt &
```


The directory `/path/to/aquaint/` should be the root directory of the AQUAINT collection, i.e., `ls /path/to/aquaint/disk1/` should bring up the subdirectory `NYT` and `ls /path/to/aquaint/disk2/` should bring up the subdirectories `APW` and `XIE`. The command above builds a standard positional index (`-positions`) that's optimized into a single segment (`-optimize`). If you also want to store document vectors (e.g., for query expansion), add the `-docvectors` option.
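Before launching the indexing job, it may help to confirm the directory layout. The following is a minimal POSIX `sh` sketch, assuming the layout `disk1/NYT`, `disk2/APW`, and `disk2/XIE` under the AQUAINT root; `check_aquaint_layout` is a hypothetical helper, not part of Anserini:

```shell
# Hypothetical helper (not part of Anserini): check that an AQUAINT root
# directory contains the assumed subdirectories disk1/NYT, disk2/APW, disk2/XIE.
# Reports each missing subdirectory on stderr; returns 1 if any is missing.
check_aquaint_layout() {
  root="$1"
  status=0
  for sub in disk1/NYT disk2/APW disk2/XIE; do
    if [ ! -d "$root/$sub" ]; then
      echo "missing: $root/$sub" >&2
      status=1
    fi
  done
  return "$status"
}
```

For example, `check_aquaint_layout /path/to/aquaint/ && echo "layout OK"` prints `layout OK` only when all three subdirectories are present.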

_Hint:_ Anserini ignores the `cr` folder when indexing Disks 4 and 5, but you can also remove the `cr` folder yourself.

_Hint:_ You can use the `DumpIndex` utility to print out statistics of the index. Please refer to [DumpIndex References](dumpindex-reference.md) for details.


**Search**:

After indexing is done, you should be able to perform a retrieval run:

```
sh target/appassembler/bin/SearchWebCollection -topicreader Trec -index lucene-index.aquaint.pos -bm25 \
-topics src/main/resources/topics-and-qrels/topics.robust05.txt -output run.aquaint.robust05.bm25.txt
```
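As a quick sanity check on the run file, the sketch below counts the distinct topics and the total number of result lines. It assumes the standard whitespace-separated TREC run format (`qid Q0 docid rank score tag`) and only reads the first column; `summarize_run` is a hypothetical helper:

```shell
# Hypothetical helper: summarize a TREC-format run file
# (assumed columns: qid Q0 docid rank score tag).
# Prints the number of distinct topic ids and the total number of lines.
summarize_run() {
  awk '{ lines++; if (!($1 in seen)) { seen[$1] = 1; topics++ } }
       END { printf "topics=%d lines=%d\n", topics, lines }' "$1"
}
```

For example, `summarize_run run.aquaint.robust05.bm25.txt`. Since Robust05 has 50 topics, a `topics=` value below 50 would suggest that some topics were not parsed or retrieved nothing.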

**Evaluate**:

Evaluation can be done using `trec_eval`:
```
eval/trec_eval.9.0/trec_eval src/main/resources/topics-and-qrels/qrels.robust2005.txt run.aquaint.robust05.bm25.txt
```
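By default `trec_eval` prints many measures. To pull out just the two reported in the effectiveness table, a small filter over its output can be used. This is a sketch that assumes the default output layout of one `measure all value` line per measure, with P30 printed as `P_30`; `extract_map_p30` is a hypothetical helper:

```shell
# Hypothetical helper: read trec_eval output on stdin and keep only the
# aggregate ("all") MAP and P30 lines, printed as "measure value".
extract_map_p30() {
  awk '$2 == "all" && ($1 == "map" || $1 == "P_30") { print $1, $3 }'
}
```

For example: `eval/trec_eval.9.0/trec_eval src/main/resources/topics-and-qrels/qrels.robust2005.txt run.aquaint.robust05.bm25.txt | extract_map_p30`.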

**Effectiveness Reference**:

Metric | BM25 | QL
-------|--------|--------
MAP | 0.2004 | 0.2025
P30 | 0.3667 | 0.3707
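To check a reproduction against the reference values above, scores can be compared within a small tolerance. Here is a minimal sketch that uses `awk` for the floating-point arithmetic. `matches_reference` is a hypothetical helper, and the tolerance is chosen for illustration only; Anserini does not prescribe it:

```shell
# Hypothetical helper: succeed (exit 0) if |measured - reference| <= tolerance.
# Arguments: measured score, reference score, tolerance (decimal strings).
matches_reference() {
  awk -v got="$1" -v ref="$2" -v tol="$3" \
    'BEGIN { d = got - ref; if (d < 0) d = -d; exit !(d <= tol) }'
}
```

For example, `matches_reference 0.2001 0.2004 0.0005` succeeds. By contrast, a BM25 MAP of 0.1900 would fail against the reference 0.2004 at that tolerance.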
6 changes: 6 additions & 0 deletions src/main/resources/topics-and-qrels/README.md
@@ -14,6 +14,12 @@ Robust04
+ topics.robust04.301-450.601-700.txt: [Topics 301-450&601-700 (TREC 2004 Robust Track)](http://trec.nist.gov/data/robust/04.testset.gz)
+ qrels.robust2004.txt: [qrels for Topics 301-450&601-700 (TREC 2004 Robust Track)](http://trec.nist.gov/data/robust/qrels.robust2004.txt)

Robust05
========

+ topics.robust05.txt: [Hard Topics of ROBUST04 (TREC 2005 Robust Track)](http://trec.nist.gov/data/robust/05/05.50.topics.txt)
+ qrels.robust2005.txt: [qrels (TREC 2005 Robust Track)](http://trec.nist.gov/data/robust/05/TREC2005.qrels.txt)
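To get a feel for the judgments, the sketch below counts judged and relevant documents per topic. It assumes the standard four-column qrels format (`qid iteration docid relevance`) and treats any relevance value above 0 as relevant. `summarize_qrels` is a hypothetical helper:

```shell
# Hypothetical helper: summarize a qrels file (assumed columns:
# qid iteration docid relevance). Prints "qid judged relevant" per topic,
# counting relevance > 0 as relevant, sorted by topic id.
summarize_qrels() {
  awk '{ judged[$1]++; if ($4 > 0) rel[$1]++ }
       END { for (q in judged) printf "%s %d %d\n", q, judged[q], rel[q] + 0 }' "$1" | sort -n
}
```

For example, `summarize_qrels src/main/resources/topics-and-qrels/qrels.robust2005.txt` prints one line per topic.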


Gov2
====