Build using Maven:
mvn clean package appassembler:assemble
The eval/ directory contains evaluation tools and scripts, including trec_eval, gdeval.pl, and ndeval.
Before using trec_eval, unpack and compile it as follows:
tar xvfz trec_eval.9.0.tar.gz && cd trec_eval.9.0 && make
Before using ndeval, compile it as follows:
cd ndeval && make
Anserini is designed to support experiments on various standard TREC collections out of the box:
- Experiments on Disks 1 & 2
- Experiments on Disks 4 & 5 (Robust04)
- Experiments on AQUAINT (Robust05)
- Experiments on New York Times (Core17)
- Experiments on Wt10g
- Experiments on Gov2
- Experiments on ClueWeb09 (Category B)
- Experiments on ClueWeb12-B13
- Experiments on ClueWeb12
- IndexUtils is a utility for interacting with an index from the command line, e.g., to print index statistics. Run target/appassembler/bin/IndexUtils -h for more details.
- MapCollections is a generic mapper framework that processes each file segment in parallel. Developers can build their own mapper by extending DocumentMapper. One example is our CountDocumentMapper, which counts the number of documents in the whole collection:
nohup target/appassembler/bin/MapCollections -collection TrecCollection -threads 16 -input /tuna1/collections/newswire/disk12/ -mapper CountDocumentMapper &> log.disk12.count &
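The idea behind a counting mapper can be illustrated outside of Anserini. The following is a minimal Python sketch, not Anserini's Java API: it assumes uncompressed TREC-style files in which every document begins with a line starting with <DOC>, treats each file as one segment, and sums per-segment counts computed in parallel. The function names count_docs_in_segment and count_docs_in_collection are hypothetical.

```python
import os
from concurrent.futures import ThreadPoolExecutor


def count_docs_in_segment(path):
    """Count documents in one file segment (assumes each starts with a <DOC> line)."""
    count = 0
    # latin-1 maps every byte to a character, so decoding never fails
    with open(path, encoding="latin-1") as f:
        for line in f:
            if line.startswith("<DOC>"):
                count += 1
    return count


def count_docs_in_collection(input_dir, threads=16):
    """Apply count_docs_in_segment to every file under input_dir and sum the results."""
    segments = [
        os.path.join(root, name)
        for root, _, names in os.walk(input_dir)
        for name in names
    ]
    with ThreadPoolExecutor(max_workers=threads) as pool:
        return sum(pool.map(count_docs_in_segment, segments))
```

Because each segment is counted independently, the per-segment function needs no shared state; only the final sum combines results.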
Anserini was designed with Python integration in mind, for connecting with popular deep learning toolkits such as PyTorch. This is accomplished via pyjnius. The SimpleSearcher class provides a simple Python/Java bridge, shown below:
import jnius_config
jnius_config.set_classpath("target/anserini-0.1.1-SNAPSHOT-fatjar.jar")
from jnius import autoclass
JString = autoclass('java.lang.String')
JSearcher = autoclass('io.anserini.search.SimpleSearcher')
searcher = JSearcher(JString('lucene-index.robust04.pos+docvectors+rawdocs'))
hits = searcher.search(JString('hubble space telescope'))
# the docid of the 1st hit
hits[0].docid
# the internal Lucene docid of the 1st hit
hits[0].ldocid
# the score of the 1st hit
hits[0].score
# the full document of the 1st hit
hits[0].content
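Results retrieved this way can be scored with the tools in eval/ once they are written in the standard TREC run format (qid Q0 docid rank score tag). The sketch below is one possible way to do that, assuming only that each hit exposes the docid and score fields shown above. The helper name to_trec_run, the run tag, and the example values are hypothetical, and a namedtuple stands in for real hits so the snippet runs without an index.

```python
from collections import namedtuple


def to_trec_run(qid, hits, tag="anserini"):
    """Format hits as TREC run lines: qid Q0 docid rank score tag."""
    lines = []
    for rank, hit in enumerate(hits, start=1):  # ranks are 1-based
        lines.append(f"{qid} Q0 {hit.docid} {rank} {hit.score:.4f} {tag}")
    return lines


# Stand-in for the objects returned by searcher.search(); the values are made up.
Hit = namedtuple("Hit", ["docid", "score"])
example_hits = [Hit("DOC-A", 12.5), Hit("DOC-B", 9.25)]

for line in to_trec_run("301", example_hits):
    print(line)
```

With a real searcher, the same function could be called on the result of searcher.search(...) and the lines written to a file for trec_eval.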
- v0.1.0: July 4, 2018 [Release Notes]