forked from castorini/anserini
Robust05 results added (castorini#108)
+ add the results of robust05
+ update the results of robust04 using the new topic reader
+ will upload the topics and qrels files once this is merged
1 parent 7bd1d9d, commit 6705dc7
Showing 4 changed files with 54 additions and 8 deletions.
@@ -0,0 +1,41 @@
# Anserini Experiments on Robust05

See http://trec.nist.gov/data/t14_robust.html

**Indexing**:

```
nohup sh target/appassembler/bin/IndexCollection -collection Trec -input /path/to/aquaint/ \
-index lucene-index.aquaint.pos -threads 32 -positions -optimize \
2> log.aquaint.pos.emptyDocids.txt 1> log.aquaint.pos.recordCounts.txt &
```

The directory `/path/to/aquaint/` should be the root directory of the AQUAINT collection, i.e., `ls /path/to/aquaint/disk1/` should bring up the subdirectory `NYT` and `ls /path/to/aquaint/disk2/` should bring up the subdirectories `APW` and `XIE`. The command above builds a standard positional index (`-positions`) that's optimized into a single segment (`-optimize`). If you also want to store document vectors (e.g., for query expansion), add the `-docvectors` option.

_Hint:_ Anserini ignores the `cr` folder when indexing disk45, but you can also remove the `cr` folder yourself.

_Hint:_ You can use the `DumpIndex` utility to print out the statistics of the index. Please refer to [DumpIndex References](dumpindex-reference.md) for the statistics of the index.

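Before starting a long indexing job, it may be worth confirming that the input directory has the expected layout. The following is a minimal sketch, not part of Anserini: the function name `check_aquaint_layout` is hypothetical, and the expected subdirectories (`disk1/NYT`, `disk2/APW`, `disk2/XIE`) are an assumption about how the AQUAINT distribution is organized on disk.

```shell
# Hypothetical helper (not an Anserini tool): checks that an AQUAINT root
# directory contains the subdirectories assumed by the indexing step.
check_aquaint_layout() {
  root="${1%/}"   # strip one trailing slash, if present
  rc=0
  for sub in disk1/NYT disk2/APW disk2/XIE; do
    if [ ! -d "$root/$sub" ]; then
      echo "missing: $root/$sub" >&2
      rc=1
    fi
  done
  return "$rc"
}

# Example:
# check_aquaint_layout /path/to/aquaint/ && echo "layout looks OK"
```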
**Search**:

After indexing is done, you should be able to perform a retrieval run:

```
sh target/appassembler/bin/SearchWebCollection -topicreader Trec -index lucene-index.aquaint.pos -bm25 \
-topics src/main/resources/topics-and-qrels/topics.robust05.txt -output run.aquaint.robust05.bm25.txt
```

**Evaluate**:

Evaluation can be done using `trec_eval`:

```
eval/trec_eval.9.0/trec_eval src/main/resources/topics-and-qrels/qrels.robust2005.txt run.aquaint.robust05.bm25.txt
```

**Effectiveness Reference**:

Metric | BM25 | QL
-------|--------|--------
MAP | 0.2004 | 0.2025
P30 | 0.3667 | 0.3707
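
To compare a run against the reference values above, one option is to filter the full `trec_eval` output down to the two reported metrics. The sketch below assumes the default `trec_eval` output format (measure name, then a topic id or `all`, then the value, in whitespace-separated columns) and that the table's MAP and P30 correspond to the `map` and `P_30` rows. The function name `extract_reference_metrics` is hypothetical.

```shell
# Hypothetical filter: reads trec_eval output on stdin and keeps only the
# summary ("all") rows for map and P_30, printing "<measure> <value>".
extract_reference_metrics() {
  awk '$2 == "all" && ($1 == "map" || $1 == "P_30") { print $1, $3 }'
}

# Example:
# eval/trec_eval.9.0/trec_eval src/main/resources/topics-and-qrels/qrels.robust2005.txt \
#   run.aquaint.robust05.bm25.txt | extract_reference_metrics
```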