Overhaul of regressions for MS MARCO {passage, doc} and DL {19, 20} #1559

Merged: 36 commits, Jun 14, 2021
updates
lintool committed Jun 14, 2021
commit cff9f7e1610ecf974925ce59aaa96910178ba0e4
2 changes: 1 addition & 1 deletion docs/regressions-dl19-doc-docTTTTTquery-per-passage.md
@@ -10,7 +10,7 @@ Note that there are four different regression conditions for this task, and this
 + **Indexing Condition:** each MS MARCO document is first segmented into passages, each passage is treated as a unit of indexing
 + **Expansion Condition:** doc2query-T5

-In the passage indexing condition, we select the highest-scoring passage from a document as the score for that document to produce a document ranking; this is known as the MaxP technique.
+In the passage indexing condition, we select the score of the highest-scoring passage from a document as the score for that document to produce a document ranking; this is known as the MaxP technique.
 All four conditions are described in detail [here](https://github.com/castorini/docTTTTTquery#reproducing-ms-marco-document-ranking-results-with-anserini), in the context of doc2query-T5.

 The exact configurations for these regressions are stored in [this YAML file](../src/main/resources/regression/dl19-doc-docTTTTTquery-per-passage.yaml).
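The corrected sentence describes MaxP as taking the score of a document's best passage as the document's score. A minimal sketch of that aggregation follows. It assumes passage identifiers of the form `<docid>#<segment>`, which is an illustrative convention here; the function name `maxp` and the sample scores are hypothetical and not taken from Anserini.

```python
def maxp(passage_hits):
    """Collapse passage-level hits into a document ranking using MaxP.

    passage_hits: iterable of (passage_id, score) pairs. The passage_id is
    assumed (for illustration only) to look like "<docid>#<segment index>".
    """
    best = {}
    for passage_id, score in passage_hits:
        # Everything before the first '#' is treated as the document id.
        docid = passage_id.split("#", 1)[0]
        # Keep only the highest passage score seen for each document.
        if docid not in best or score > best[docid]:
            best[docid] = score
    # Rank documents by score (descending); break ties by docid.
    return sorted(best.items(), key=lambda kv: (-kv[1], kv[0]))


# Made-up passage scores for three documents.
hits = [("D1#0", 3.2), ("D2#0", 5.1), ("D1#1", 6.4), ("D2#1", 2.0), ("D3#0", 4.7)]
ranking = maxp(hits)
print(ranking)
```

With these sample scores, D1 ranks first because its second passage (6.4) outscores every other passage, even though its first passage scores lower than D2's best.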
2 changes: 1 addition & 1 deletion docs/regressions-dl19-doc-per-passage.md
@@ -10,7 +10,7 @@ Note that there are four different regression conditions for this task, and this
 + **Indexing Condition:** each MS MARCO document is first segmented into passages, each passage is treated as a unit of indexing
 + **Expansion Condition:** none

-In the passage indexing condition, we select the highest-scoring passage from a document as the score for that document to produce a document ranking; this is known as the MaxP technique.
+In the passage indexing condition, we select the score of the highest-scoring passage from a document as the score for that document to produce a document ranking; this is known as the MaxP technique.
 All four conditions are described in detail [here](https://github.com/castorini/docTTTTTquery#reproducing-ms-marco-document-ranking-results-with-anserini), in the context of doc2query-T5.

 The exact configurations for these regressions are stored in [this YAML file](../src/main/resources/regression/dl19-doc-per-passage.yaml).
2 changes: 1 addition & 1 deletion docs/regressions-dl20-doc-docTTTTTquery-per-passage.md
@@ -10,7 +10,7 @@ Note that there are four different regression conditions for this task, and this
 + **Indexing Condition:** each MS MARCO document is first segmented into passages, each passage is treated as a unit of indexing
 + **Expansion Condition:** doc2query-T5

-In the passage indexing condition, we select the highest-scoring passage from a document as the score for that document to produce a document ranking; this is known as the MaxP technique.
+In the passage indexing condition, we select the score of the highest-scoring passage from a document as the score for that document to produce a document ranking; this is known as the MaxP technique.
 All four conditions are described in detail [here](https://github.com/castorini/docTTTTTquery#reproducing-ms-marco-document-ranking-results-with-anserini), in the context of doc2query-T5.

 The exact configurations for these regressions are stored in [this YAML file](../src/main/resources/regression/dl19-doc-docTTTTTquery-per-passage.yaml).
2 changes: 1 addition & 1 deletion docs/regressions-dl20-doc-per-passage.md
@@ -10,7 +10,7 @@ Note that there are four different regression conditions for this task, and this
 + **Indexing Condition:** each MS MARCO document is first segmented into passages, each passage is treated as a unit of indexing
 + **Expansion Condition:** none

-In the passage indexing condition, we select the highest-scoring passage from a document as the score for that document to produce a document ranking; this is known as the MaxP technique.
+In the passage indexing condition, we select the score of the highest-scoring passage from a document as the score for that document to produce a document ranking; this is known as the MaxP technique.
 All four conditions are described in detail [here](https://github.com/castorini/docTTTTTquery#reproducing-ms-marco-document-ranking-results-with-anserini), in the context of doc2query-T5.

 The exact configurations for these regressions are stored in [this YAML file](../src/main/resources/regression/dl20-doc-per-passage.yaml).
2 changes: 1 addition & 1 deletion docs/regressions-msmarco-doc-docTTTTTquery-per-passage.md
@@ -6,7 +6,7 @@ Note that there are four different regression conditions for this task, and this
 + **Indexing Condition:** each MS MARCO document is first segmented into passages, each passage is treated as a unit of indexing
 + **Expansion Condition:** doc2query-T5

-In the passage indexing condition, we select the highest-scoring passage from a document as the score for that document to produce a document ranking; this is known as the MaxP technique.
+In the passage indexing condition, we select the score of the highest-scoring passage from a document as the score for that document to produce a document ranking; this is known as the MaxP technique.
 All four conditions are described in detail [here](https://github.com/castorini/docTTTTTquery#reproducing-ms-marco-document-ranking-results-with-anserini), in the context of doc2query-T5.

 The exact configurations for these regressions are stored in [this YAML file](../src/main/resources/regression/msmarco-doc-docTTTTTquery-per-passage.yaml).
2 changes: 1 addition & 1 deletion docs/regressions-msmarco-doc-per-passage.md
@@ -6,7 +6,7 @@ Note that there are four different regression conditions for this task, and this
 + **Indexing Condition:** each MS MARCO document is first segmented into passages, each passage is treated as a unit of indexing
 + **Expansion Condition:** none

-In the passage indexing condition, we select the highest-scoring passage from a document as the score for that document to produce a document ranking; this is known as the MaxP technique.
+In the passage indexing condition, we select the score of the highest-scoring passage from a document as the score for that document to produce a document ranking; this is known as the MaxP technique.
 All four conditions are described in detail [here](https://github.com/castorini/docTTTTTquery#reproducing-ms-marco-document-ranking-results-with-anserini), in the context of doc2query-T5.

 The exact configurations for these regressions are stored in [this YAML file](../src/main/resources/regression/msmarco-doc-per-passage.yaml).
4 changes: 2 additions & 2 deletions docs/regressions.md
@@ -62,7 +62,7 @@ nohup python src/main/python/run_regression.py --collection msmarco-doc-docTTTTT
 nohup python src/main/python/run_regression.py --collection msmarco-doc-docTTTTTquery-per-passage >& logs/log.msmarco-doc-docTTTTTquery-per-passage &

 nohup python src/main/python/run_regression.py --collection dl19-passage >& logs/log.dl19-passage &
-nohup python src/main/python/run_regression.py --collection dl19-passage-docTTTTTquery >& logs/dl19-passage-docTTTTTquery &
+nohup python src/main/python/run_regression.py --collection dl19-passage-docTTTTTquery >& logs/log.dl19-passage-docTTTTTquery &
 nohup python src/main/python/run_regression.py --collection dl19-doc >& logs/log.dl19-doc &
 nohup python src/main/python/run_regression.py --collection dl19-doc-per-passage >& logs/log.dl19-doc-per-passage &
 nohup python src/main/python/run_regression.py --collection dl19-doc-docTTTTTquery-per-doc >& logs/log.dl19-doc-docTTTTTquery-per-doc &
@@ -121,7 +121,7 @@ nohup python src/main/python/run_regression.py --index --collection msmarco-doc-
 nohup python src/main/python/run_regression.py --index --collection msmarco-doc-docTTTTTquery-per-passage >& logs/log.msmarco-doc-docTTTTTquery-per-passage &

 nohup python src/main/python/run_regression.py --index --collection dl19-passage >& logs/log.dl19-passage &
-nohup python src/main/python/run_regression.py --index --collection dl19-passage-docTTTTTquery >& logs/dl19-passage-docTTTTTquery &
+nohup python src/main/python/run_regression.py --index --collection dl19-passage-docTTTTTquery >& logs/log.dl19-passage-docTTTTTquery &
 nohup python src/main/python/run_regression.py --index --collection dl19-doc >& logs/log.dl19-doc &
 nohup python src/main/python/run_regression.py --index --collection dl19-doc-per-passage >& logs/log.dl19-doc-per-passage &
 nohup python src/main/python/run_regression.py --index --collection dl19-doc-docTTTTTquery-per-doc >& logs/log.dl19-doc-docTTTTTquery-per-doc &
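Both fixes in `docs/regressions.md` restore the `logs/log.<collection>` naming that every other command line uses. One way to keep such lists consistent is to generate them from the collection names, as sketched below. The helper `regression_command` is hypothetical and not part of Anserini; it only reproduces the script path, the `--index` and `--collection` flags, and the redirection that appear in the commands above.

```python
def regression_command(collection, index=False):
    """Build the shell command line for one regression run.

    Hypothetical helper: the script path, flags, and log naming are copied
    from the documented commands, so the log file always matches the
    collection name.
    """
    parts = ["nohup", "python", "src/main/python/run_regression.py"]
    if index:
        parts.append("--index")
    parts += ["--collection", collection]
    # Redirect output to logs/log.<collection> and run in the background.
    return " ".join(parts) + f" >& logs/log.{collection} &"


collections = ["dl19-passage", "dl19-passage-docTTTTTquery", "dl19-doc"]
commands = [regression_command(c) for c in collections]
for cmd in commands:
    print(cmd)
```

Because the log path is derived from the collection name, a mismatch like the one corrected in this commit cannot be introduced by hand-editing a single line.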