Refactoring of SearchCollection #951

lintool · 2020-01-15T01:36:00Z

better logging in SearchCollection
made parameter naming more consistent
updated fine-tuning experiments

codecov · 2020-01-15T17:07:21Z

Codecov Report

❗ No coverage uploaded for pull request base (master@f399439).
The diff coverage is 0%.

@@            Coverage Diff            @@
##             master     #951   +/-   ##
=========================================
  Coverage          ?   39.32%           
  Complexity        ?      506           
=========================================
  Files             ?      120           
  Lines             ?     7249           
  Branches          ?     1087           
=========================================
  Hits              ?     2851           
  Misses            ?     4110           
  Partials          ?      288

| Impacted Files | Coverage Δ | Complexity Δ |
|---|---|---|
| ...rc/main/java/io/anserini/search/SearchMsmarco.java | `0% <0%> (ø)` | `0 <0> (?)` |

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Last update f399439...439755d.

Peilin-Yang · 2020-01-15T17:22:44Z

src/main/java/io/anserini/search/SearchArgs.java

+  public String[] bm25_k1 = new String[] {"0.9"};
+
+  @Option(name = "-bm25.b", handler = StringArrayOptionHandler.class, usage = "BM25: b parameter")
+  public String[] bm25_b = new String[] {"0.4"};


I found that some of the Java parameters are snake_case (e.g. bm25_b) while others are camelCase (e.g. bm25Accurate).

Shall we normalize them?

I think the convention we adopted is that an underscore groups related parameters together, e.g., bm25_foo, where foo ranges over the BM25 parameters. Everything else is camel case?

Okay, I am fine with it -- they just look weird imo.

Oh, I found bm25Accurate
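
The convention discussed in this thread can be sketched in plain Java. This is a minimal illustration, not the actual SearchArgs class: the names `NamingSketch`, `ExampleArgs`, `groupOf`, and `groupFields` are hypothetical, the types and defaults of `bm25Accurate` and `rerankCutoff` are assumed, and the args4j `@Option` annotations are left out so the snippet depends only on the standard library.

```java
import java.lang.reflect.Field;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class NamingSketch {
  // Hypothetical stand-in for an args class; NOT Anserini's SearchArgs.
  public static class ExampleArgs {
    public String[] bm25_k1 = new String[] {"0.9"}; // group "bm25", parameter "k1"
    public String[] bm25_b = new String[] {"0.4"};  // group "bm25", parameter "b"
    public boolean bm25Accurate = false;            // camelCase, no group (type assumed)
    public int rerankCutoff = 50;                   // camelCase, no group (default assumed)
  }

  // Returns the prefix before the first underscore, or "" when there is none.
  public static String groupOf(String fieldName) {
    int i = fieldName.indexOf('_');
    return i < 0 ? "" : fieldName.substring(0, i);
  }

  // Buckets the declared field names of a class by their underscore prefix.
  public static Map<String, List<String>> groupFields(Class<?> cls) {
    Map<String, List<String>> groups = new TreeMap<>();
    for (Field f : cls.getDeclaredFields()) {
      if (f.isSynthetic()) {
        continue; // skip compiler- or tool-generated fields
      }
      groups.computeIfAbsent(groupOf(f.getName()), k -> new ArrayList<>()).add(f.getName());
    }
    // Reflection does not guarantee field order, so sort for stable output.
    for (List<String> names : groups.values()) {
      Collections.sort(names);
    }
    return groups;
  }

  public static void main(String[] args) {
    // Fields under the "" key are the ungrouped camelCase ones.
    System.out.println(groupFields(ExampleArgs.class));
  }
}
```

Under this reading, `bm25Accurate` lands outside the `bm25` group even though it is a BM25 setting, which is the inconsistency pointed out above.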

Peilin-Yang · 2020-01-15T17:30:51Z

docs/regressions-car17v1.5.md

@@ -40,19 +40,19 @@ nohup target/appassembler/bin/SearchCollection -index lucene-index.car17v1.5.pos

 nohup target/appassembler/bin/SearchCollection -index lucene-index.car17v1.5.pos+docvectors+rawdocs \
 -topicreader Car -topics src/main/resources/topics-and-qrels/topics.car17v1.5.benchmarkY1test.txt \
- -bm25 -axiom -rerankCutoff 20 -axiom.deterministic -output run.car17v1.5.bm25+ax.topics.car17v1.5.benchmarkY1test.txt &
+ -bm25 -rerankCutoff 20 -axiom -axiom.deterministic -output run.car17v1.5.bm25+ax.topics.car17v1.5.benchmarkY1test.txt &


As a more general question: the -rerankCutoff parameter only applies when isRerank is true. That means we should see args.rm3 || args.axiom || args.bm25prf before we see -rerankCutoff. So I guess we should put rm3, axiom, and bm25prf before -rerankCutoff (the original order)?

Should we specify rerankCutoff explicitly for each model? E.g., rm3.rerankCutoff?

My original motivation was to keep all the "groups" together.

Or I could move the rerankCutoff parameter to the end?

I don't think a per-model rerankCutoff is necessary, as the models will probably not be used at the same time. (What would the semantics of that be?)

Putting rerankCutoff at the end is fine.
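
The dependency raised in this thread can be sketched as follows. This is a minimal illustration under stated assumptions, not the SearchCollection code: the class `RerankSketch`, its fields, the default cutoff of 50, and the method `effectiveCutoff` are hypothetical. Only the condition `rm3 || axiom || bm25prf` is taken from the comment above.

```java
import java.util.OptionalInt;

public class RerankSketch {
  // Hypothetical flags mirroring the reranking models named in the thread.
  public boolean rm3;
  public boolean axiom;
  public boolean bm25prf;
  public int rerankCutoff = 50; // assumed default, for illustration only

  // Reranking is active when at least one reranking model is enabled.
  public boolean isRerank() {
    return rm3 || axiom || bm25prf;
  }

  // The cutoff only has an effect when reranking is active.
  public OptionalInt effectiveCutoff() {
    return isRerank() ? OptionalInt.of(rerankCutoff) : OptionalInt.empty();
  }

  public static void main(String[] args) {
    RerankSketch s = new RerankSketch();
    s.rerankCutoff = 20;
    // No reranking model enabled yet: the cutoff is ignored.
    System.out.println(s.effectiveCutoff().isPresent());
    // Enabling one model makes the shared cutoff take effect.
    s.axiom = true;
    System.out.println(s.effectiveCutoff().getAsInt());
  }
}
```

With a single shared field as in this sketch, where -rerankCutoff appears relative to the model flags is a readability choice, which fits the outcome of placing it at the end.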

Peilin-Yang

LGTM

Peilin-Yang · 2020-01-17T21:00:53Z

src/main/java/io/anserini/search/SearchArgs.java

+  public String[] bm25_k1 = new String[] {"0.9"};
+
+  @Option(name = "-bm25.b", handler = StringArrayOptionHandler.class, usage = "BM25: b parameter")
+  public String[] bm25_b = new String[] {"0.4"};


Oh, I found bm25Accurate


lintool added 9 commits December 9, 2019 22:41

SearchCollection cleanup

ad0c5ab

Merge branch 'master' into search-refactoring

c074018

Changed parameter names.

66562a5

Merge branch 'master' into search-refactoring

8e461eb

Merge branch 'master' into search-refactoring

0863e42

Updated YAML for core17, core18, and disk12

1a13088

Merge branch 'master' into search-refactoring

73a552b

Fixed params.

295ef69

Fixed fine-tuning yaml.

5e8dfc1

lintool requested a review from Peilin-Yang January 15, 2020 01:36

lintool added 3 commits January 14, 2020 20:36

Merge branch 'master' into search-refactoring

f4ad457

Merge branch 'master' into search-refactoring

badb3a5

Updated parameters.

4225a1a

Peilin-Yang reviewed Jan 15, 2020


lintool added 3 commits January 17, 2020 15:44

Addressed CR re: order of rankCutoff parameter.

a2b78aa

Tweaked params in docs.

f399439

Merge branch 'master' into search-refactoring

439755d

Peilin-Yang approved these changes Jan 17, 2020


lintool merged commit 90c5be8 into master Jan 18, 2020

lintool deleted the search-refactoring branch January 18, 2020 02:02

crystina-z pushed a commit to crystina-z/anserini that referenced this pull request Oct 28, 2022

Fix test failure: TestLtrMsmarcoDocument and TestLtrMsmarcoPassage (c…

bd9c179

…astorini#955) Closes castorini#951
