Fix broken link for qrel.dev.small.tsv #191

Merged 4 commits on May 18, 2021

Changes from 1 commit
Update experiments-monot5-tpu.md
larryli1999 authored May 18, 2021
commit e20e09c0e9db93ab6d730fc242e45f99a52b7aeb
12 changes: 6 additions & 6 deletions docs/experiments-monot5-tpu.md
@@ -59,8 +59,8 @@ We download the query, qrels, run and corpus files corresponding to the MS MARCO
The run file is generated by following Anserini's [BM25 ranking instructions](https://github.com/castorini/anserini/blob/master/docs/experiments-msmarco-passage.md).

In short, the files are:
-- `queries.dev.small.tsv`: 6,980 queries from the MS MARCO dev set.
-- `qrels.dev.small.tsv`: 7,437 pairs of query relevant passage ids from the MS MARCO dev set.
+- `topics.msmarco-passage.dev-subset.txt`: 6,980 queries from the MS MARCO dev set.
+- `qrels.msmarco-passage.dev-subset.txt`: 7,437 pairs of query relevant passage ids from the MS MARCO dev set.
- `run.dev.small.tsv`: Approximately 6,980,000 pairs of dev set queries and retrieved passages using Anserini's BM25.
- `collection.tar.gz`: All passages (8,841,823) in the MS MARCO passage corpus. In this tsv file, the first column is the passage id, and the second is the passage text.
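For reference, both the old and new qrels files follow the standard TREC qrels layout of whitespace-separated columns (query id, iteration, passage id, relevance label). A minimal parser sketch under that assumption (`load_qrels` is a hypothetical helper, not part of the repo):

```python
from collections import defaultdict

def load_qrels(path):
    """Map each query id to its set of relevant passage ids.

    Assumes TREC-style rows: qid, iteration, pid, relevance label.
    """
    qrels = defaultdict(set)
    with open(path) as f:
        for line in f:
            qid, _, pid, rel = line.split()
            if int(rel) > 0:  # keep only positively judged passages
                qrels[qid].add(pid)
    return qrels
```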

@@ -70,8 +70,8 @@ Let's start.
```
cd ${DATA_DIR}
wget https://storage.googleapis.com/duobert_git/run.bm25.dev.small.tsv
-wget https://www.dropbox.com/s/hq6xjhswiz60siu/queries.dev.small.tsv
-wget https://www.dropbox.com/s/ie27l0mzcjb5fbc/qrels.dev.small.tsv
+wget https://raw.githubusercontent.com/castorini/anserini/master/src/main/resources/topics-and-qrels/topics.msmarco-passage.dev-subset.txt
+wget https://raw.githubusercontent.com/castorini/anserini/master/src/main/resources/topics-and-qrels/qrels.msmarco-passage.dev-subset.txt
wget https://www.dropbox.com/s/m1n2wf80l1lb9j1/collection.tar.gz
tar -xvf collection.tar.gz
rm collection.tar.gz
@@ -81,7 +81,7 @@ cd ../../

As a sanity check, we can evaluate the first-stage retrieved documents using the official MS MARCO evaluation script.
```
-python tools/scripts/msmarco/msmarco_passage_eval.py ${DATA_DIR}/qrels.dev.small.tsv ${DATA_DIR}/run.dev.small.tsv
+python tools/scripts/msmarco/msmarco_passage_eval.py ${DATA_DIR}/qrels.msmarco-passage.dev-subset.txt ${DATA_DIR}/run.dev.small.tsv
```

The output should be:
@@ -94,7 +94,7 @@ QueriesRanked: 6980
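The official script's headline number is MRR@10 over the 6,980 dev queries. A minimal sketch of that metric, assuming `qrels` maps a query id to its relevant passage ids and `run` maps a query id to a ranked list of retrieved passage ids (both hypothetical in-memory structures, not the repo's own code):

```python
def mrr_at_10(qrels, run):
    """Mean reciprocal rank, cut off at depth 10.

    qrels: qid -> set of relevant pids
    run:   qid -> list of pids in ranked order
    """
    total = 0.0
    for qid, ranking in run.items():
        for rank, pid in enumerate(ranking[:10], start=1):
            if pid in qrels.get(qid, set()):
                total += 1.0 / rank  # credit only the first hit
                break
    return total / len(run)
```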

Then, we prepare the query-doc pairs in the monoT5 input format.
```
-python pygaggle/data/create_msmarco_monot5_input.py --queries ${DATA_DIR}/queries.dev.small.tsv \
+python pygaggle/data/create_msmarco_monot5_input.py --queries ${DATA_DIR}/topics.msmarco-passage.dev-subset.txt \
--run ${DATA_DIR}/run.dev.small.tsv \
--corpus ${DATA_DIR}/collection.tsv \
--t5_input ${DATA_DIR}/query_doc_pairs.dev.small.txt \
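The script above emits one text line per query-passage pair for the reranker. Assuming the monoT5 prompt shape used by pygaggle (`Query: … Document: … Relevant:`, with `true`/`false` as the decoder targets), a hypothetical sketch of the per-pair formatting:

```python
def to_monot5_input(query: str, passage: str) -> str:
    # monoT5 receives this prompt and scores the pair by the probability
    # it assigns to "true" versus "false" as the next decoded token.
    return f"Query: {query} Document: {passage} Relevant:"
```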