Overhaul of regressions for MS MARCO {passage, doc} and DL {19, 20} #1559

Merged: lintool merged 36 commits into master from the regression branch on Jun 14, 2021

@lintool lintool commented Jun 9, 2021

Covers

  • MS MARCO passage + {doc2query, docTTTTTquery}
  • MS MARCO doc {per-doc, per-passage} x {doc2query, docTTTTTquery}
  • {DL19, DL20} passage + {doc2query, docTTTTTquery}
  • {DL19, DL20} doc {per-doc, per-passage} x {doc2query, docTTTTTquery}

codecov bot commented Jun 9, 2021

Codecov Report

Merging #1559 (cff9f7e) into master (9d82567) will not change coverage.
The diff coverage is n/a.

@@            Coverage Diff            @@
##             master    #1559   +/-   ##
=========================================
  Coverage     57.31%   57.31%           
  Complexity      998      998           
=========================================
  Files           167      167           
  Lines          9274     9274           
  Branches       1281     1281           
=========================================
  Hits           5315     5315           
  Misses         3522     3522           
  Partials        437      437           
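
As a quick sanity check on the report above, the percentage can be reproduced from the counts. This is a minimal sketch that assumes coverage is computed as hits divided by the sum of hits, misses, and partials; the variable names are illustrative only.

```python
# Counts copied from the Codecov table above.
hits, misses, partials = 5315, 3522, 437

# Assumption: every tracked line is exactly one of hit, miss, or partial.
lines = hits + misses + partials

# Assumption: coverage = hits / lines, reported to two decimal places.
coverage = round(100 * hits / lines, 2)

print(lines, coverage)
```

The total matches the `Lines` row and the ratio matches the `Coverage` row, which is consistent with the report's claim that the PR does not change coverage.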


@ronakice ronakice left a comment

LGTM! Few minor comments.

@@ -62,12 +62,18 @@ nohup python src/main/python/run_regression.py --collection msmarco-doc-docTTTTT
nohup python src/main/python/run_regression.py --collection msmarco-doc-docTTTTTquery-per-passage >& logs/log.msmarco-doc-docTTTTTquery-per-passage &

nohup python src/main/python/run_regression.py --collection dl19-passage >& logs/log.dl19-passage &
nohup python src/main/python/run_regression.py --collection dl19-passage-docTTTTTquery >& logs/dl19-passage-docTTTTTquery &

Suggested change
- nohup python src/main/python/run_regression.py --collection dl19-passage-docTTTTTquery >& logs/dl19-passage-docTTTTTquery &
+ nohup python src/main/python/run_regression.py --collection dl19-passage-docTTTTTquery >& logs/log.dl19-passage-docTTTTTquery &

nohup python src/main/python/run_regression.py --index --collection msmarco-doc-docTTTTTquery-per-doc >& logs/log.msmarco-doc-docTTTTTquery-per-doc &
nohup python src/main/python/run_regression.py --index --collection msmarco-doc-docTTTTTquery-per-passage >& logs/log.msmarco-doc-docTTTTTquery-per-passage &

nohup python src/main/python/run_regression.py --index --collection dl19-passage >& logs/log.dl19-passage &
nohup python src/main/python/run_regression.py --index --collection dl19-passage-docTTTTTquery >& logs/dl19-passage-docTTTTTquery &

Suggested change
- nohup python src/main/python/run_regression.py --index --collection dl19-passage-docTTTTTquery >& logs/dl19-passage-docTTTTTquery &
+ nohup python src/main/python/run_regression.py --index --collection dl19-passage-docTTTTTquery >& logs/log.dl19-passage-docTTTTTquery &
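
Both suggestions fix the same slip: a log file missing the `log.` prefix. One way to avoid this kind of drift is to generate the command lines from the collection name. The following is only a sketch: `regression_command` is a hypothetical helper, not part of the repository, and it uses only the script path and the `--index` and `--collection` flags that appear in the diff above.

```python
import shlex

# Script path as it appears in the commands under review.
SCRIPT = "src/main/python/run_regression.py"


def regression_command(collection, index=False, log_dir="logs"):
    """Build one nohup line whose log file is always named log.<collection>.

    Hypothetical helper for illustration; it only formats a string and
    does not run anything.
    """
    args = ["nohup", "python", SCRIPT]
    if index:
        args.append("--index")
    args += ["--collection", collection]
    log_path = f"{log_dir}/log.{collection}"
    quoted = " ".join(shlex.quote(a) for a in args)
    return f"{quoted} >& {shlex.quote(log_path)} &"


for name in ["dl19-passage", "dl19-passage-docTTTTTquery"]:
    print(regression_command(name, index=True))
```

Because the log path is derived from the collection name, the prefix cannot be forgotten for an individual entry.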

@@ -6,6 +6,7 @@ Note that there are four different regression conditions for this task, and this
+ **Indexing Condition:** each MS MARCO document is first segmented into passages, each passage is treated as a unit of indexing
+ **Expansion Condition:** doc2query-T5

In the passage indexing condition, we select the highest-scoring passage from a document as the score for that document to produce a document ranking; this is known as the MaxP technique.

Suggested change
- In the passage indexing condition, we select the highest-scoring passage from a document as the score for that document to produce a document ranking; this is known as the MaxP technique.
+ In the passage indexing condition, we select the score of the highest-scoring passage from a document as the score for that document to produce a document ranking; this is known as the MaxP technique.
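
For readers unfamiliar with MaxP, the aggregation described in the suggested wording can be sketched in a few lines. This is an illustration only, not the project's implementation: the `maxp` function is hypothetical, and the passage id format `<docid>#<n>` is an assumption made for the example.

```python
def maxp(passage_hits):
    """Collapse passage-level hits into a document ranking.

    passage_hits: iterable of (passage_id, score) pairs, where passage_id
    is assumed to look like '<docid>#<n>' (an assumption for this sketch).
    Each document receives the score of its highest-scoring passage.
    """
    best = {}
    for passage_id, score in passage_hits:
        docid = passage_id.rsplit("#", 1)[0]
        if docid not in best or score > best[docid]:
            best[docid] = score
    # Rank documents by their best passage score, highest first.
    return sorted(best.items(), key=lambda item: item[1], reverse=True)


hits = [("D1#0", 2.5), ("D2#0", 3.1), ("D1#1", 4.0), ("D2#1", 1.2)]
ranking = maxp(hits)
print(ranking)
```

Here `D1` outranks `D2` because its best passage scores 4.0, even though `D2` has the higher-scoring first passage.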

@lintool lintool merged commit b58c855 into master Jun 14, 2021
@lintool lintool deleted the regression branch June 14, 2021 13:51

2 participants