-
Notifications
You must be signed in to change notification settings - Fork 471
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Refactoring of SearchCollection #951
Conversation
lintool
commented
Jan 15, 2020
- better logging in SearchCollection
- made parameter naming more consistent
- updated fine-tuning experiments
Codecov Report
@@ Coverage Diff @@
## master #951 +/- ##
=========================================
Coverage ? 39.32%
Complexity ? 506
=========================================
Files ? 120
Lines ? 7249
Branches ? 1087
=========================================
Hits ? 2851
Misses ? 4110
Partials ? 288
Continue to review full report at Codecov.
|
public String[] bm25_k1 = new String[] {"0.9"}; | ||
|
||
@Option(name = "-bm25.b", handler = StringArrayOptionHandler.class, usage = "BM25: b parameter") | ||
public String[] bm25_b = new String[] {"0.4"}; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I found some of the Java parameters are snake_case (e.g. bm25_b
) while others are camelCase (e.g. bm25Accurate
).
Shall we normalize them?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think the convention we adopted is that underscore is used for grouping parameters together, e.g., bm25_foo
, so foo
are all the BM25 parameters. Otherwise, everything else is camel case?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
okay, I am fine with it -- they just look weird imo.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh, I found bm25Accurate
docs/regressions-car17v1.5.md
Outdated
@@ -40,19 +40,19 @@ nohup target/appassembler/bin/SearchCollection -index lucene-index.car17v1.5.pos | |||
|
|||
nohup target/appassembler/bin/SearchCollection -index lucene-index.car17v1.5.pos+docvectors+rawdocs \ | |||
-topicreader Car -topics src/main/resources/topics-and-qrels/topics.car17v1.5.benchmarkY1test.txt \ | |||
-bm25 -axiom -rerankCutoff 20 -axiom.deterministic -output run.car17v1.5.bm25+ax.topics.car17v1.5.benchmarkY1test.txt & | |||
-bm25 -rerankCutoff 20 -axiom -axiom.deterministic -output run.car17v1.5.bm25+ax.topics.car17v1.5.benchmarkY1test.txt & |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As a more general question, the parameter -rerankCutoff
only applies when we specify isRerank
to true. That means we should first see args.rm3 || args.axiom || args.bm25prf
before seeing -rerankCutoff
. So I guess we should put rm3, axiom, bm25prf
before -rerankCutoff
(the original way)?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should we do rankCutoff
explicitly for each model? E.g., rm3.rerankCutoff
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
My original motivation was to keep all the "groups" together.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Or I could make the rankCutoff
parameter to the end?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think rerankCutoff
for each model is necessary as they will probably not present at the same time? (what is the semantic for that?)
Putting rerankCutoff
at the end is fine
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
public String[] bm25_k1 = new String[] {"0.9"}; | ||
|
||
@Option(name = "-bm25.b", handler = StringArrayOptionHandler.class, usage = "BM25: b parameter") | ||
public String[] bm25_b = new String[] {"0.4"}; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh, I found bm25Accurate