
Baseline run description

The baseline system is a three-step retrieval pipeline. Given a query, (1) the top-n documents are retrieved from the index, (2) the retrieved documents are chunked into passages using spaCy's SentenceRecognizer pipeline, and (3) the passages are re-ranked using a neural re-ranker.

[Figure: Baseline architecture]

For (1), we use BM25 (k1=4.46, b=0.82); for (3), we use a T5 re-ranker trained on the MS MARCO passage dataset.
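To show how the two BM25 parameters enter the score, here is a minimal pure-Python sketch of BM25 scoring with the values above. It is illustrative only and is not the baseline's actual retrieval code: the function name `bm25_scores`, the whitespace tokenization, the toy documents, and the Lucene-style IDF variant are all assumptions.

```python
import math
from collections import Counter

K1 = 4.46  # term-frequency saturation (value from the baseline description)
B = 0.82   # document-length normalization (value from the baseline description)


def bm25_scores(query_terms, docs, k1=K1, b=B):
    """Score each tokenized document in `docs` against `query_terms`."""
    n_docs = len(docs)
    avg_len = sum(len(d) for d in docs) / n_docs
    # Document frequency: number of documents containing each term.
    df = Counter(term for d in docs for term in set(d))
    scores = []
    for d in docs:
        tf = Counter(d)
        score = 0.0
        for term in query_terms:
            if tf[term] == 0:
                continue
            # Assumed IDF variant (Lucene-style, always non-negative).
            idf = math.log(1 + (n_docs - df[term] + 0.5) / (df[term] + 0.5))
            norm = tf[term] + k1 * (1 - b + b * len(d) / avg_len)
            score += idf * tf[term] * (k1 + 1) / norm
        scores.append(score)
    return scores


# Toy corpus (hypothetical), tokenized by whitespace.
docs = [
    "the cat sat on the mat".split(),
    "dogs chase cats in the park".split(),
    "the cat chased the cat toy".split(),
]
scores = bm25_scores(["cat"], docs)
```

With a large k1 such as 4.46, repeated occurrences of a term keep adding to the score for longer before saturating, and b=0.82 applies fairly strong length normalization.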

org_automatic_results_1000.v1.0.run is the results file generated by retrieving and re-ranking the passages from the top 1000 documents using the automatic rewrites.

org_manual_results_1000.v1.0.run is the results file generated by retrieving and re-ranking the passages from the top 1000 documents using the manual rewrites.

We also provide a document-level version of each run, in which passage ids have been converted to document ids and duplicates removed (document_runs).
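The passage-to-document conversion could look roughly like the sketch below. It assumes the standard six-column TREC run format, input lines already sorted by rank within each query, and passage ids of the form `<document id>-<passage index>`. The function name and the example ids are hypothetical, and the actual conversion script may differ.

```python
def passages_to_documents(run_lines):
    """Collapse a passage-level TREC run into a deduplicated document-level run."""
    seen = {}  # query id -> set of document ids already emitted
    out = []
    for line in run_lines:
        qid, q0, pid, _rank, score, tag = line.split()
        # Assumed id layout: "<document id>-<passage index>".
        doc_id = pid.rsplit("-", 1)[0]
        docs = seen.setdefault(qid, set())
        if doc_id in docs:
            continue  # keep only the best-ranked passage per document
        docs.add(doc_id)
        # The new rank is the document's position among unique documents.
        out.append(f"{qid} {q0} {doc_id} {len(docs)} {score} {tag}")
    return out


# Hypothetical passage-level run lines for a single query.
passage_run = [
    "107_1 Q0 DOC_A-2 1 9.1 demo",
    "107_1 Q0 DOC_B-1 2 8.7 demo",
    "107_1 Q0 DOC_A-5 3 8.2 demo",
]
document_run = passages_to_documents(passage_run)
```

Each document keeps the score of its highest-ranked passage, and ranks are renumbered so the output stays a valid run file.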

Baseline Rewriter Policy

We use a T5 model trained on the CANARD dataset to generate the baseline automatic rewrites.

For the query at turn n of a topic, the rewrite context consists of all queries from previous turns (turn 1 to turn n-1) and the passages from the last three turns (turn n-3 to turn n-1).

For example, the automatic rewrite for topic 107, turn 8 (identified as 107-8) was generated by passing the following as context:

query 107-1 ||| query 107-2 ||| query 107-3 ||| query 107-4 ||| query 107-5 ||| passage 107-5 ||| query 107-6 ||| passage 107-6 ||| query 107-7 ||| passage 107-7
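The context construction described above can be sketched as follows. This is a minimal illustration, not the baseline's own code: the function name `build_rewrite_context`, the dictionary inputs keyed by turn number, and the placeholder strings are assumptions. How the current query is attached to this context before it is passed to the rewriter is not specified here, so the sketch leaves it out.

```python
def build_rewrite_context(queries, passages, n, passage_window=3, sep=" ||| "):
    """Build the rewrite context for turn n.

    queries, passages: dicts keyed by 1-based turn number (assumed layout).
    """
    parts = []
    first_passage_turn = max(1, n - passage_window)
    for turn in range(1, n):
        parts.append(queries[turn])
        # Passages are included only for the last `passage_window` turns.
        if turn >= first_passage_turn and turn in passages:
            parts.append(passages[turn])
    return sep.join(parts)


# Placeholder strings standing in for the real queries and passages of topic 107.
queries = {t: f"query 107-{t}" for t in range(1, 9)}
passages = {t: f"passage 107-{t}" for t in range(1, 9)}
context = build_rewrite_context(queries, passages, n=8)
```

For turn 8 this yields queries 1 to 7, with passages interleaved after queries 5, 6, and 7, matching the example above.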