Major refactoring of search code paths (castorini#2310)
+ Improved alignment between SearchCollection and dense vector search classes.
+ Aligned ScoredDoc and ScoredDocs (was previously ScoredDocuments) as container objects for Lucene results.
+ Searchers now use ScoredDoc instead of class-specific Result objects.
+ Tweaked SearchCollection args to use proper camelCasing.
+ Consolidated BaseSearcher class for basic ranked list post-processing functionality.
+ Increased test coverage.
lintool authored Dec 24, 2023
1 parent ff16957 commit 825148a
Showing 59 changed files with 3,142 additions and 2,776 deletions.
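The camelCase renames of the `SearchCollection` flags that appear in the regression docs of this commit can be summarised as a small lookup table. The sketch below is only an illustration for updating old command lines: the class `FlagMigration` and its `migrate` method are hypothetical and not part of Anserini, and the table lists only the four renames visible in the hunks shown on this page (the commit may rename others).

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper (not part of Anserini): rewrites old lowercase flags
// to the camelCase spellings seen in this commit's regression docs.
public class FlagMigration {
  // Old -> new spellings, copied from the documentation hunks on this page.
  private static final Map<String, String> RENAMED = new LinkedHashMap<>();
  static {
    RENAMED.put("-backgroundlinking", "-backgroundLinking");
    RENAMED.put("-backgroundlinking.k", "-backgroundLinking.k");
    RENAMED.put("-backgroundlinking.datefilter", "-backgroundLinking.dateFilter");
    RENAMED.put("-searchtweets", "-searchTweets");
  }

  // Returns a copy of args with every known old flag replaced;
  // values and unknown flags pass through unchanged.
  public static List<String> migrate(List<String> args) {
    List<String> out = new ArrayList<>(args.size());
    for (String arg : args) {
      out.add(RENAMED.getOrDefault(arg, arg));
    }
    return out;
  }

  public static void main(String[] args) {
    List<String> old = List.of(
        "-backgroundlinking", "-backgroundlinking.datefilter",
        "-backgroundlinking.k", "100", "-bm25", "-rm3", "-hits", "100");
    System.out.println(String.join(" ", migrate(old)));
  }
}
```

Matching whole tokens rather than substrings keeps `-backgroundlinking.k` from being half-rewritten by the shorter `-backgroundlinking` entry.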
6 changes: 3 additions & 3 deletions docs/regressions/regressions-backgroundlinking18.md
@@ -47,21 +47,21 @@ target/appassembler/bin/SearchCollection \
-topics tools/topics-and-qrels/topics.backgroundlinking18.txt \
-topicReader BackgroundLinking \
-output runs/run.wapo.v2.bm25.topics.backgroundlinking18.txt \
-backgroundlinking -backgroundlinking.k 100 -bm25 -hits 100 &
-backgroundLinking -backgroundLinking.k 100 -bm25 -hits 100 &
target/appassembler/bin/SearchCollection \
-index indexes/lucene-index.wapo.v2/ \
-topics tools/topics-and-qrels/topics.backgroundlinking18.txt \
-topicReader BackgroundLinking \
-output runs/run.wapo.v2.bm25+rm3.topics.backgroundlinking18.txt \
-backgroundlinking -backgroundlinking.k 100 -bm25 -rm3 -hits 100 &
-backgroundLinking -backgroundLinking.k 100 -bm25 -rm3 -hits 100 &
target/appassembler/bin/SearchCollection \
-index indexes/lucene-index.wapo.v2/ \
-topics tools/topics-and-qrels/topics.backgroundlinking18.txt \
-topicReader BackgroundLinking \
-output runs/run.wapo.v2.bm25+rm3+df.topics.backgroundlinking18.txt \
-backgroundlinking -backgroundlinking.datefilter -backgroundlinking.k 100 -bm25 -rm3 -hits 100 &
-backgroundLinking -backgroundLinking.dateFilter -backgroundLinking.k 100 -bm25 -rm3 -hits 100 &
```

Evaluation can be performed using `trec_eval`:
6 changes: 3 additions & 3 deletions docs/regressions/regressions-backgroundlinking19.md
@@ -47,21 +47,21 @@ target/appassembler/bin/SearchCollection \
-topics tools/topics-and-qrels/topics.backgroundlinking19.txt \
-topicReader BackgroundLinking \
-output runs/run.wapo.v2.bm25.topics.backgroundlinking19.txt \
-backgroundlinking -backgroundlinking.k 100 -bm25 -hits 100 &
-backgroundLinking -backgroundLinking.k 100 -bm25 -hits 100 &
target/appassembler/bin/SearchCollection \
-index indexes/lucene-index.wapo.v2/ \
-topics tools/topics-and-qrels/topics.backgroundlinking19.txt \
-topicReader BackgroundLinking \
-output runs/run.wapo.v2.bm25+rm3.topics.backgroundlinking19.txt \
-backgroundlinking -backgroundlinking.k 100 -bm25 -rm3 -hits 100 &
-backgroundLinking -backgroundLinking.k 100 -bm25 -rm3 -hits 100 &
target/appassembler/bin/SearchCollection \
-index indexes/lucene-index.wapo.v2/ \
-topics tools/topics-and-qrels/topics.backgroundlinking19.txt \
-topicReader BackgroundLinking \
-output runs/run.wapo.v2.bm25+rm3+df.topics.backgroundlinking19.txt \
-backgroundlinking -backgroundlinking.datefilter -backgroundlinking.k 100 -bm25 -rm3 -hits 100 &
-backgroundLinking -backgroundLinking.dateFilter -backgroundLinking.k 100 -bm25 -rm3 -hits 100 &
```

Evaluation can be performed using `trec_eval`:
6 changes: 3 additions & 3 deletions docs/regressions/regressions-backgroundlinking20.md
@@ -47,21 +47,21 @@ target/appassembler/bin/SearchCollection \
-topics tools/topics-and-qrels/topics.backgroundlinking20.txt \
-topicReader BackgroundLinking \
-output runs/run.wapo.v3.bm25.topics.backgroundlinking20.txt \
-backgroundlinking -backgroundlinking.k 100 -bm25 -hits 100 &
-backgroundLinking -backgroundLinking.k 100 -bm25 -hits 100 &
target/appassembler/bin/SearchCollection \
-index indexes/lucene-index.wapo.v3/ \
-topics tools/topics-and-qrels/topics.backgroundlinking20.txt \
-topicReader BackgroundLinking \
-output runs/run.wapo.v3.bm25+rm3.topics.backgroundlinking20.txt \
-backgroundlinking -backgroundlinking.k 100 -bm25 -rm3 -hits 100 &
-backgroundLinking -backgroundLinking.k 100 -bm25 -rm3 -hits 100 &
target/appassembler/bin/SearchCollection \
-index indexes/lucene-index.wapo.v3/ \
-topics tools/topics-and-qrels/topics.backgroundlinking20.txt \
-topicReader BackgroundLinking \
-output runs/run.wapo.v3.bm25+rm3+df.topics.backgroundlinking20.txt \
-backgroundlinking -backgroundlinking.datefilter -backgroundlinking.k 100 -bm25 -rm3 -hits 100 &
-backgroundLinking -backgroundLinking.dateFilter -backgroundLinking.k 100 -bm25 -rm3 -hits 100 &
```

Evaluation can be performed using `trec_eval`:
24 changes: 12 additions & 12 deletions docs/regressions/regressions-mb11.md
@@ -56,78 +56,78 @@ target/appassembler/bin/SearchCollection \
-topics tools/topics-and-qrels/topics.microblog2011.txt \
-topicReader Microblog \
-output runs/run.mb11.bm25.topics.microblog2011.txt \
-searchtweets -bm25 &
-searchTweets -bm25 &
target/appassembler/bin/SearchCollection \
-index indexes/lucene-index.mb11/ \
-topics tools/topics-and-qrels/topics.microblog2012.txt \
-topicReader Microblog \
-output runs/run.mb11.bm25.topics.microblog2012.txt \
-searchtweets -bm25 &
-searchTweets -bm25 &
target/appassembler/bin/SearchCollection \
-index indexes/lucene-index.mb11/ \
-topics tools/topics-and-qrels/topics.microblog2011.txt \
-topicReader Microblog \
-output runs/run.mb11.bm25+rm3.topics.microblog2011.txt \
-searchtweets -bm25 -rm3 &
-searchTweets -bm25 -rm3 &
target/appassembler/bin/SearchCollection \
-index indexes/lucene-index.mb11/ \
-topics tools/topics-and-qrels/topics.microblog2012.txt \
-topicReader Microblog \
-output runs/run.mb11.bm25+rm3.topics.microblog2012.txt \
-searchtweets -bm25 -rm3 &
-searchTweets -bm25 -rm3 &
target/appassembler/bin/SearchCollection \
-index indexes/lucene-index.mb11/ \
-topics tools/topics-and-qrels/topics.microblog2011.txt \
-topicReader Microblog \
-output runs/run.mb11.bm25+ax.topics.microblog2011.txt \
-searchtweets -bm25 -axiom -axiom.beta 1.0 -axiom.deterministic -rerankCutoff 20 &
-searchTweets -bm25 -axiom -axiom.beta 1.0 -axiom.deterministic -rerankCutoff 20 &
target/appassembler/bin/SearchCollection \
-index indexes/lucene-index.mb11/ \
-topics tools/topics-and-qrels/topics.microblog2012.txt \
-topicReader Microblog \
-output runs/run.mb11.bm25+ax.topics.microblog2012.txt \
-searchtweets -bm25 -axiom -axiom.beta 1.0 -axiom.deterministic -rerankCutoff 20 &
-searchTweets -bm25 -axiom -axiom.beta 1.0 -axiom.deterministic -rerankCutoff 20 &
target/appassembler/bin/SearchCollection \
-index indexes/lucene-index.mb11/ \
-topics tools/topics-and-qrels/topics.microblog2011.txt \
-topicReader Microblog \
-output runs/run.mb11.ql.topics.microblog2011.txt \
-searchtweets -qld &
-searchTweets -qld &
target/appassembler/bin/SearchCollection \
-index indexes/lucene-index.mb11/ \
-topics tools/topics-and-qrels/topics.microblog2012.txt \
-topicReader Microblog \
-output runs/run.mb11.ql.topics.microblog2012.txt \
-searchtweets -qld &
-searchTweets -qld &
target/appassembler/bin/SearchCollection \
-index indexes/lucene-index.mb11/ \
-topics tools/topics-and-qrels/topics.microblog2011.txt \
-topicReader Microblog \
-output runs/run.mb11.ql+rm3.topics.microblog2011.txt \
-searchtweets -qld -rm3 &
-searchTweets -qld -rm3 &
target/appassembler/bin/SearchCollection \
-index indexes/lucene-index.mb11/ \
-topics tools/topics-and-qrels/topics.microblog2012.txt \
-topicReader Microblog \
-output runs/run.mb11.ql+rm3.topics.microblog2012.txt \
-searchtweets -qld -rm3 &
-searchTweets -qld -rm3 &
target/appassembler/bin/SearchCollection \
-index indexes/lucene-index.mb11/ \
-topics tools/topics-and-qrels/topics.microblog2011.txt \
-topicReader Microblog \
-output runs/run.mb11.ql+ax.topics.microblog2011.txt \
-searchtweets -qld -axiom -axiom.beta 1.0 -axiom.deterministic -rerankCutoff 20 &
-searchTweets -qld -axiom -axiom.beta 1.0 -axiom.deterministic -rerankCutoff 20 &
target/appassembler/bin/SearchCollection \
-index indexes/lucene-index.mb11/ \
-topics tools/topics-and-qrels/topics.microblog2012.txt \
-topicReader Microblog \
-output runs/run.mb11.ql+ax.topics.microblog2012.txt \
-searchtweets -qld -axiom -axiom.beta 1.0 -axiom.deterministic -rerankCutoff 20 &
-searchTweets -qld -axiom -axiom.beta 1.0 -axiom.deterministic -rerankCutoff 20 &
```

Evaluation can be performed using `trec_eval`:
24 changes: 12 additions & 12 deletions docs/regressions/regressions-mb13.md
@@ -56,78 +56,78 @@ target/appassembler/bin/SearchCollection \
-topics tools/topics-and-qrels/topics.microblog2013.txt \
-topicReader Microblog \
-output runs/run.mb13.bm25.topics.microblog2013.txt \
-searchtweets -bm25 &
-searchTweets -bm25 &
target/appassembler/bin/SearchCollection \
-index indexes/lucene-index.mb13/ \
-topics tools/topics-and-qrels/topics.microblog2014.txt \
-topicReader Microblog \
-output runs/run.mb13.bm25.topics.microblog2014.txt \
-searchtweets -bm25 &
-searchTweets -bm25 &
target/appassembler/bin/SearchCollection \
-index indexes/lucene-index.mb13/ \
-topics tools/topics-and-qrels/topics.microblog2013.txt \
-topicReader Microblog \
-output runs/run.mb13.bm25+rm3.topics.microblog2013.txt \
-searchtweets -bm25 -rm3 &
-searchTweets -bm25 -rm3 &
target/appassembler/bin/SearchCollection \
-index indexes/lucene-index.mb13/ \
-topics tools/topics-and-qrels/topics.microblog2014.txt \
-topicReader Microblog \
-output runs/run.mb13.bm25+rm3.topics.microblog2014.txt \
-searchtweets -bm25 -rm3 &
-searchTweets -bm25 -rm3 &
target/appassembler/bin/SearchCollection \
-index indexes/lucene-index.mb13/ \
-topics tools/topics-and-qrels/topics.microblog2013.txt \
-topicReader Microblog \
-output runs/run.mb13.bm25+ax.topics.microblog2013.txt \
-searchtweets -bm25 -axiom -axiom.beta 1.0 -axiom.deterministic -rerankCutoff 20 &
-searchTweets -bm25 -axiom -axiom.beta 1.0 -axiom.deterministic -rerankCutoff 20 &
target/appassembler/bin/SearchCollection \
-index indexes/lucene-index.mb13/ \
-topics tools/topics-and-qrels/topics.microblog2014.txt \
-topicReader Microblog \
-output runs/run.mb13.bm25+ax.topics.microblog2014.txt \
-searchtweets -bm25 -axiom -axiom.beta 1.0 -axiom.deterministic -rerankCutoff 20 &
-searchTweets -bm25 -axiom -axiom.beta 1.0 -axiom.deterministic -rerankCutoff 20 &
target/appassembler/bin/SearchCollection \
-index indexes/lucene-index.mb13/ \
-topics tools/topics-and-qrels/topics.microblog2013.txt \
-topicReader Microblog \
-output runs/run.mb13.ql.topics.microblog2013.txt \
-searchtweets -qld &
-searchTweets -qld &
target/appassembler/bin/SearchCollection \
-index indexes/lucene-index.mb13/ \
-topics tools/topics-and-qrels/topics.microblog2014.txt \
-topicReader Microblog \
-output runs/run.mb13.ql.topics.microblog2014.txt \
-searchtweets -qld &
-searchTweets -qld &
target/appassembler/bin/SearchCollection \
-index indexes/lucene-index.mb13/ \
-topics tools/topics-and-qrels/topics.microblog2013.txt \
-topicReader Microblog \
-output runs/run.mb13.ql+rm3.topics.microblog2013.txt \
-searchtweets -qld -rm3 &
-searchTweets -qld -rm3 &
target/appassembler/bin/SearchCollection \
-index indexes/lucene-index.mb13/ \
-topics tools/topics-and-qrels/topics.microblog2014.txt \
-topicReader Microblog \
-output runs/run.mb13.ql+rm3.topics.microblog2014.txt \
-searchtweets -qld -rm3 &
-searchTweets -qld -rm3 &
target/appassembler/bin/SearchCollection \
-index indexes/lucene-index.mb13/ \
-topics tools/topics-and-qrels/topics.microblog2013.txt \
-topicReader Microblog \
-output runs/run.mb13.ql+ax.topics.microblog2013.txt \
-searchtweets -qld -axiom -axiom.beta 1.0 -axiom.deterministic -rerankCutoff 20 &
-searchTweets -qld -axiom -axiom.beta 1.0 -axiom.deterministic -rerankCutoff 20 &
target/appassembler/bin/SearchCollection \
-index indexes/lucene-index.mb13/ \
-topics tools/topics-and-qrels/topics.microblog2014.txt \
-topicReader Microblog \
-output runs/run.mb13.ql+ax.topics.microblog2014.txt \
-searchtweets -qld -axiom -axiom.beta 1.0 -axiom.deterministic -rerankCutoff 20 &
-searchTweets -qld -axiom -axiom.beta 1.0 -axiom.deterministic -rerankCutoff 20 &
```

Evaluation can be performed using `trec_eval`:
16 changes: 8 additions & 8 deletions docs/regressions/regressions-msmarco-passage-ca.md
@@ -21,11 +21,11 @@ Typical indexing command:
```
target/appassembler/bin/IndexCollection \
-collection JsonCollection \
-input /path/to/msmarco-wp \
-input /path/to/msmarco-passage \
-generator DefaultLuceneDocumentGenerator \
-index indexes/lucene-index.msmarco-passage-ca/ \
-threads 9 -storePositions -storeDocvectors -storeRaw -analyzeWithHuggingFaceTokenizer bert-base-uncased -useCompositeAnalyzer \
>& logs/log.msmarco-wp &
>& logs/log.msmarco-passage &
```

The directory `/path/to/msmarco-passage-wp/` should be a directory containing the corpus in Anserini's jsonl format.
@@ -44,17 +44,17 @@ target/appassembler/bin/SearchCollection \
-index indexes/lucene-index.msmarco-passage-ca/ \
-topics tools/topics-and-qrels/topics.msmarco-passage.dev-subset.txt \
-topicReader TsvInt \
-output runs/run.msmarco-wp.bm25-default.topics.msmarco-passage.dev-subset.txt \
-bm25 -analyzeWithHuggingFaceTokenizer bert-base-uncased -useCompositeAnalyzer &
-output runs/run.msmarco-passage.bm25-default.topics.msmarco-passage.dev-subset.txt \
-bm25 -analyzeWithHuggingFaceTokenizer bert-base-uncased -useCompositeAnalyzer &
```

Evaluation can be performed using `trec_eval`:

```
tools/eval/trec_eval.9.0.4/trec_eval -c -m map tools/topics-and-qrels/qrels.msmarco-passage.dev-subset.txt runs/run.msmarco-wp.bm25-default.topics.msmarco-passage.dev-subset.txt
tools/eval/trec_eval.9.0.4/trec_eval -c -M 10 -m recip_rank tools/topics-and-qrels/qrels.msmarco-passage.dev-subset.txt runs/run.msmarco-wp.bm25-default.topics.msmarco-passage.dev-subset.txt
tools/eval/trec_eval.9.0.4/trec_eval -c -m recall.100 tools/topics-and-qrels/qrels.msmarco-passage.dev-subset.txt runs/run.msmarco-wp.bm25-default.topics.msmarco-passage.dev-subset.txt
tools/eval/trec_eval.9.0.4/trec_eval -c -m recall.1000 tools/topics-and-qrels/qrels.msmarco-passage.dev-subset.txt runs/run.msmarco-wp.bm25-default.topics.msmarco-passage.dev-subset.txt
tools/eval/trec_eval.9.0.4/trec_eval -c -m map tools/topics-and-qrels/qrels.msmarco-passage.dev-subset.txt runs/run.msmarco-passage.bm25-default.topics.msmarco-passage.dev-subset.txt
tools/eval/trec_eval.9.0.4/trec_eval -c -M 10 -m recip_rank tools/topics-and-qrels/qrels.msmarco-passage.dev-subset.txt runs/run.msmarco-passage.bm25-default.topics.msmarco-passage.dev-subset.txt
tools/eval/trec_eval.9.0.4/trec_eval -c -m recall.100 tools/topics-and-qrels/qrels.msmarco-passage.dev-subset.txt runs/run.msmarco-passage.bm25-default.topics.msmarco-passage.dev-subset.txt
tools/eval/trec_eval.9.0.4/trec_eval -c -m recall.1000 tools/topics-and-qrels/qrels.msmarco-passage.dev-subset.txt runs/run.msmarco-passage.bm25-default.topics.msmarco-passage.dev-subset.txt
```

## Effectiveness
16 changes: 8 additions & 8 deletions docs/regressions/regressions-msmarco-passage-hgf-wp.md
@@ -23,11 +23,11 @@ Typical indexing command:
```
target/appassembler/bin/IndexCollection \
-collection JsonCollection \
-input /path/to/msmarco-wp \
-input /path/to/msmarco-passage \
-generator DefaultLuceneDocumentGenerator \
-index indexes/lucene-index.msmarco-passage-hgf-wp/ \
-threads 9 -storePositions -storeDocvectors -storeRaw -analyzeWithHuggingFaceTokenizer bert-base-uncased \
>& logs/log.msmarco-wp &
>& logs/log.msmarco-passage &
```

The directory `/path/to/msmarco-passage-wp/` should be a directory containing the corpus in Anserini's jsonl format.
@@ -46,17 +46,17 @@ target/appassembler/bin/SearchCollection \
-index indexes/lucene-index.msmarco-passage-hgf-wp/ \
-topics tools/topics-and-qrels/topics.msmarco-passage.dev-subset.txt \
-topicReader TsvInt \
-output runs/run.msmarco-wp.bm25-default.topics.msmarco-passage.dev-subset.txt \
-bm25 -analyzeWithHuggingFaceTokenizer bert-base-uncased &
-output runs/run.msmarco-passage.bm25-default.topics.msmarco-passage.dev-subset.txt \
-bm25 -analyzeWithHuggingFaceTokenizer bert-base-uncased &
```

Evaluation can be performed using `trec_eval`:

```
tools/eval/trec_eval.9.0.4/trec_eval -c -m map tools/topics-and-qrels/qrels.msmarco-passage.dev-subset.txt runs/run.msmarco-wp.bm25-default.topics.msmarco-passage.dev-subset.txt
tools/eval/trec_eval.9.0.4/trec_eval -c -M 10 -m recip_rank tools/topics-and-qrels/qrels.msmarco-passage.dev-subset.txt runs/run.msmarco-wp.bm25-default.topics.msmarco-passage.dev-subset.txt
tools/eval/trec_eval.9.0.4/trec_eval -c -m recall.100 tools/topics-and-qrels/qrels.msmarco-passage.dev-subset.txt runs/run.msmarco-wp.bm25-default.topics.msmarco-passage.dev-subset.txt
tools/eval/trec_eval.9.0.4/trec_eval -c -m recall.1000 tools/topics-and-qrels/qrels.msmarco-passage.dev-subset.txt runs/run.msmarco-wp.bm25-default.topics.msmarco-passage.dev-subset.txt
tools/eval/trec_eval.9.0.4/trec_eval -c -m map tools/topics-and-qrels/qrels.msmarco-passage.dev-subset.txt runs/run.msmarco-passage.bm25-default.topics.msmarco-passage.dev-subset.txt
tools/eval/trec_eval.9.0.4/trec_eval -c -M 10 -m recip_rank tools/topics-and-qrels/qrels.msmarco-passage.dev-subset.txt runs/run.msmarco-passage.bm25-default.topics.msmarco-passage.dev-subset.txt
tools/eval/trec_eval.9.0.4/trec_eval -c -m recall.100 tools/topics-and-qrels/qrels.msmarco-passage.dev-subset.txt runs/run.msmarco-passage.bm25-default.topics.msmarco-passage.dev-subset.txt
tools/eval/trec_eval.9.0.4/trec_eval -c -m recall.1000 tools/topics-and-qrels/qrels.msmarco-passage.dev-subset.txt runs/run.msmarco-passage.bm25-default.topics.msmarco-passage.dev-subset.txt
```

## Effectiveness
4 changes: 3 additions & 1 deletion src/main/java/io/anserini/rerank/Reranker.java
@@ -16,7 +16,9 @@

package io.anserini.rerank;

import io.anserini.search.ScoredDocs;

public interface Reranker<T> {
ScoredDocuments rerank(ScoredDocuments docs, RerankerContext<T> context);
ScoredDocs rerank(ScoredDocs docs, RerankerContext<T> context);
String tag();
}
6 changes: 4 additions & 2 deletions src/main/java/io/anserini/rerank/RerankerCascade.java
@@ -16,6 +16,8 @@

package io.anserini.rerank;

import io.anserini.search.ScoredDocs;

import java.util.ArrayList;
import java.util.List;

@@ -57,8 +59,8 @@ public RerankerCascade add(Reranker reranker) {
* @return reranked results
*/
@SuppressWarnings("unchecked")
public ScoredDocuments run(ScoredDocuments docs, RerankerContext context) {
ScoredDocuments results = docs;
public ScoredDocs run(ScoredDocs docs, RerankerContext context) {
ScoredDocs results = docs;

for (Reranker reranker : rerankers) {
results = reranker.rerank(results, context);
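The cascade loop in the hunk above threads one `ScoredDocs` value through each reranker in turn. The following self-contained sketch mimics that pattern so it can be run outside Anserini. Everything in it is an assumption made for illustration: `ScoredDocs` and `RerankerContext` are minimal stand-ins (the real classes' fields are not shown in this diff), `TruncateReranker` is a hypothetical reranker, and `run` is written as a static method rather than as a method of `RerankerCascade`. Only the `Reranker<T>` interface shape (`rerank` and `tag`) is copied from the diff.

```java
import java.util.Arrays;
import java.util.List;

public class CascadeSketch {
  // Stand-in for io.anserini.search.ScoredDocs; the real fields may differ.
  static final class ScoredDocs {
    final String[] docids;
    final float[] scores;

    ScoredDocs(String[] docids, float[] scores) {
      this.docids = docids;
      this.scores = scores;
    }
  }

  // Empty stand-in for io.anserini.rerank.RerankerContext.
  static final class RerankerContext<T> { }

  // Same method shape as the Reranker interface after this commit.
  interface Reranker<T> {
    ScoredDocs rerank(ScoredDocs docs, RerankerContext<T> context);
    String tag();
  }

  // Hypothetical reranker: keeps only the first k results.
  static final class TruncateReranker<T> implements Reranker<T> {
    private final int k;

    TruncateReranker(int k) {
      this.k = k;
    }

    @Override
    public ScoredDocs rerank(ScoredDocs docs, RerankerContext<T> context) {
      int n = Math.min(k, docs.docids.length);
      return new ScoredDocs(Arrays.copyOf(docs.docids, n), Arrays.copyOf(docs.scores, n));
    }

    @Override
    public String tag() {
      return "truncate(" + k + ")";
    }
  }

  // Mirrors the loop in the hunk: each reranker consumes the previous output.
  static <T> ScoredDocs run(List<Reranker<T>> rerankers, ScoredDocs docs, RerankerContext<T> context) {
    ScoredDocs results = docs;
    for (Reranker<T> reranker : rerankers) {
      results = reranker.rerank(results, context);
    }
    return results;
  }

  public static void main(String[] args) {
    ScoredDocs initial = new ScoredDocs(
        new String[] {"d1", "d2", "d3", "d4"}, new float[] {4f, 3f, 2f, 1f});
    List<Reranker<String>> cascade =
        List.of(new TruncateReranker<String>(3), new TruncateReranker<String>(2));
    ScoredDocs out = run(cascade, initial, new RerankerContext<String>());
    System.out.println(Arrays.toString(out.docids)); // prints [d1, d2]
  }
}
```

Because every stage takes and returns the same container type, stages can be reordered or dropped without changing the loop, which is presumably what the `ScoredDocuments` to `ScoredDocs` rename preserves.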