Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Refactoring of SearchCollection #951

Merged
merged 15 commits into from
Jan 18, 2020
Merged

Refactoring of SearchCollection #951

merged 15 commits into from
Jan 18, 2020

Conversation

lintool
Copy link
Member

@lintool lintool commented Jan 15, 2020

  • better logging in SearchCollection
  • made parameter naming more consistent
  • updated fine-tuning experiments

@lintool lintool requested a review from Peilin-Yang January 15, 2020 01:36
@codecov
Copy link

codecov bot commented Jan 15, 2020

Codecov Report

❗ No coverage uploaded for pull request base (master@f399439). Click here to learn what that means.
The diff coverage is 0%.

Impacted file tree graph

@@            Coverage Diff            @@
##             master     #951   +/-   ##
=========================================
  Coverage          ?   39.32%           
  Complexity        ?      506           
=========================================
  Files             ?      120           
  Lines             ?     7249           
  Branches          ?     1087           
=========================================
  Hits              ?     2851           
  Misses            ?     4110           
  Partials          ?      288
Impacted Files Coverage Δ Complexity Δ
...rc/main/java/io/anserini/search/SearchMsmarco.java 0% <0%> (ø) 0 <0> (?)

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update f399439...439755d. Read the comment docs.

public String[] bm25_k1 = new String[] {"0.9"};

@Option(name = "-bm25.b", handler = StringArrayOptionHandler.class, usage = "BM25: b parameter")
public String[] bm25_b = new String[] {"0.4"};
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I found some of the Java parameters are snake_case (e.g. bm25_b) while others are camelCase (e.g. bm25Accurate).

Shall we normalize them?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the convention we adopted is that underscore is used for grouping parameters together, e.g., bm25_foo, so foo are all the BM25 parameters. Otherwise, everything else is camel case?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

okay, I am fine with it -- they just look weird imo.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh, I found bm25Accurate

@@ -40,19 +40,19 @@ nohup target/appassembler/bin/SearchCollection -index lucene-index.car17v1.5.pos

nohup target/appassembler/bin/SearchCollection -index lucene-index.car17v1.5.pos+docvectors+rawdocs \
-topicreader Car -topics src/main/resources/topics-and-qrels/topics.car17v1.5.benchmarkY1test.txt \
-bm25 -axiom -rerankCutoff 20 -axiom.deterministic -output run.car17v1.5.bm25+ax.topics.car17v1.5.benchmarkY1test.txt &
-bm25 -rerankCutoff 20 -axiom -axiom.deterministic -output run.car17v1.5.bm25+ax.topics.car17v1.5.benchmarkY1test.txt &
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As a more general question, the parameter -rerankCutoff only applies when we specify isRerank to true. That means we should first see args.rm3 || args.axiom || args.bm25prf before seeing -rerankCutoff. So I guess we should put rm3, axiom, bm25prf before -rerankCutoff (the original way)?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we do rankCutoff explicitly for each model? E.g., rm3.rerankCutoff?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My original motivation was to keep all the "groups" together.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Or I could make the rankCutoff parameter to the end?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think rerankCutoff for each model is necessary as they will probably not present at the same time? (what is the semantic for that?)

Putting rerankCutoff at the end is fine

Copy link
Collaborator

@Peilin-Yang Peilin-Yang left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

public String[] bm25_k1 = new String[] {"0.9"};

@Option(name = "-bm25.b", handler = StringArrayOptionHandler.class, usage = "BM25: b parameter")
public String[] bm25_b = new String[] {"0.4"};
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh, I found bm25Accurate

@lintool lintool merged commit 90c5be8 into master Jan 18, 2020
@lintool lintool deleted the search-refactoring branch January 18, 2020 02:02
crystina-z pushed a commit to crystina-z/anserini that referenced this pull request Oct 28, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants