State-of-the-art query performance prediction methods rely on fine-tuning contextual language models to estimate retrieval effectiveness on a per-query basis. Our work builds on this strong foundation and learns rich query representations by modeling the interactions between the query and two important sources of contextual information: the set of documents retrieved by that query, and the set of similar historical queries with known retrieval effectiveness. We propose that such contextualized query representations can be more accurate estimators of query performance, as they embed the performance of past similar queries and the semantics of the documents retrieved by the query. We perform extensive experiments on the MS MARCO collection and its accompanying query sets, including the MS MARCO Dev set, the TREC Deep Learning tracks of 2019, 2020, and 2021, and DL-Hard. Our experiments reveal that our proposed method shows robust and effective performance compared to state-of-the-art baselines.
First, you need to clone the repository:

```bash
git clone https://github.com/sadjadeb/Nearest-Neighbor-QPP.git
```
Then, you need to create a virtual environment and install the requirements:

```bash
cd Nearest-Neighbor-QPP/
sudo apt-get install virtualenv
virtualenv venv
source venv/bin/activate
pip install -r requirements.txt
```
You need the MS MARCO dataset to run the code. You can download the dataset from here and here.
After downloading the dataset, extract the files and put them in the `data` directory. Here is the list of the files you need to put in the `data` directory:
- `collection.tsv`
- `queries.train.tsv`
- `qrels.train.tsv`
- `top1000.train.tar.gz`
- `queries.dev.small.tsv`
- `qrels.dev.small.tsv`
- `top1000.dev.tar.gz`
- `msmarco-test2019-queries.tsv`
- `2019qrels-pass.txt`
- `msmarco-passagetest2019-top1000.tsv`
- `msmarco-test2020-queries.tsv`
- `2020qrels-pass.txt`
- `msmarco-passagetest2020-top1000.tsv`
- `dl_hard-passage.qrels`
- `topics.tsv` (from DL-Hard)
- `bm25.run` (from DL-Hard)
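Before moving on, you can optionally verify that everything is in place. The following is a minimal sanity-check sketch (not part of the repository) that looks for the files listed above in the `data` directory:

```python
# Optional sanity check (not part of the repository): verify that all files
# listed above are present in the data/ directory before preprocessing.
import os

REQUIRED_FILES = [
    "collection.tsv", "queries.train.tsv", "qrels.train.tsv",
    "top1000.train.tar.gz", "queries.dev.small.tsv", "qrels.dev.small.tsv",
    "top1000.dev.tar.gz", "msmarco-test2019-queries.tsv", "2019qrels-pass.txt",
    "msmarco-passagetest2019-top1000.tsv", "msmarco-test2020-queries.tsv",
    "2020qrels-pass.txt", "msmarco-passagetest2020-top1000.tsv",
    "dl_hard-passage.qrels", "topics.tsv", "bm25.run",
]

missing = [name for name in REQUIRED_FILES
           if not os.path.exists(os.path.join("data", name))]
print("Missing files:", missing if missing else "none")
```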
To create a dictionary which maps each query to its actual retrieval performance under BM25 (i.e., MRR@10), you need to run the following command:
```bash
python extract_metrics_per_query.py --run /path/to/run/file --qrels /path/to/qrels/file
```
It will create a file named `run-file-name_evaluation-per-query.json` in the `data/eval_per_query` directory.
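For context, here is a minimal sketch of what this step computes, assuming an MS MARCO-style run file (`qid<TAB>pid<TAB>rank`) and qrels file (`qid 0 pid rel`); the repository's script is the authoritative version:

```python
# Minimal sketch (not the repository's script): compute MRR@10 per query
# from a run file and qrels, then dump the scores to JSON.
# Assumed formats: run lines "qid\tpid\trank", qrels lines "qid 0 pid rel".
import json
from collections import defaultdict

def load_qrels(path):
    relevant = defaultdict(set)
    with open(path) as f:
        for line in f:
            qid, _, pid, rel = line.split()
            if int(rel) > 0:
                relevant[qid].add(pid)
    return relevant

def mrr_at_10_per_query(run_path, relevant):
    ranked = defaultdict(list)
    with open(run_path) as f:
        for line in f:
            qid, pid, rank = line.rstrip("\n").split("\t")[:3]
            ranked[qid].append((int(rank), pid))
    scores = {}
    for qid, docs in ranked.items():
        scores[qid] = 0.0
        # Scan the top-10 ranked passages for the first relevant one.
        for rank, pid in sorted(docs)[:10]:
            if pid in relevant.get(qid, set()):
                scores[qid] = 1.0 / rank
                break
    return scores

scores = mrr_at_10_per_query("data/bm25.run", load_qrels("data/qrels.train.tsv"))
with open("data/eval_per_query/bm25.run_evaluation-per-query.json", "w") as f:
    json.dump(scores, f)
```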
Then you need to create a file which contains, for each query, the most similar query from the train set (i.e., historical queries with known retrieval effectiveness). To do so, you need to run the following command:
```bash
python find_most_similar_query.py --base_queries /path/to/train-set/queries --target_queries /path/to/desired/queries --model_name /name/of/the/language/model --hits /number/of/hits
```
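Conceptually, this step embeds all queries with a language model and runs a nearest-neighbor search over the train-set queries. Here is a minimal sketch using sentence-transformers; the model name and toy queries are placeholders, not necessarily what the repository uses:

```python
# Minimal sketch (not the repository's script): find, for each target query,
# its most similar query in the train set using a bi-encoder.
# The model name and the toy queries below are placeholders.
from sentence_transformers import SentenceTransformer, util

model = SentenceTransformer("sentence-transformers/all-MiniLM-L6-v2")

base_queries = {"100": "what is a query performance predictor",
                "101": "how tall is mount everest"}           # train-set queries
target_queries = {"9": "height of mount everest in meters"}   # dev/test queries

base_ids = list(base_queries)
base_emb = model.encode([base_queries[i] for i in base_ids], convert_to_tensor=True)
target_emb = model.encode(list(target_queries.values()), convert_to_tensor=True)

# top_k plays the role of the --hits argument.
for tid, hits in zip(target_queries, util.semantic_search(target_emb, base_emb, top_k=1)):
    best = hits[0]
    print(tid, "->", base_ids[best["corpus_id"]], f"(score={best['score']:.3f})")
```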
Finally, to gather all the data into a single file to make loading easier, you need to run the following commands:

```bash
python create_train_pkl_file.py
python create_test_pkl_file.py
```
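If you want to inspect the packaged data, a quick check like the following works, assuming the scripts write standard pickle files; the file name below is a guess, and the internal structure is defined by the repository's scripts:

```python
# Quick inspection sketch: load a pickle produced by the scripts above.
# The path "data/train.pkl" is hypothetical; use whatever path the
# create_*_pkl_file.py scripts actually write to.
import pickle

with open("data/train.pkl", "rb") as f:
    data = pickle.load(f)

# Print the container type and size to get a feel for the structure.
print(type(data), len(data))
```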
To train the model, you need to run the following command:
```bash
python train.py
```
You can change the hyperparameters of the model by editing lines 9-12 of the `train.py` file.
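Those lines hold plain Python constants; the names and values below are purely illustrative (not copied from the repository), just to show the kind of settings you would edit:

```python
# Illustrative only: hypothetical hyperparameters of the kind defined near
# the top of train.py; check lines 9-12 of the actual file for real names.
MODEL_NAME = "bert-base-uncased"
TRAIN_BATCH_SIZE = 16
NUM_EPOCHS = 2
LEARNING_RATE = 2e-5
```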
To test the model, you need to run the following command:
```bash
python test.py
```
To evaluate the model, you need to run the following command:
```bash
python evaluation.py --actual /path/to/actual/performance/file --predicted /path/to/predicted/performance/file --target_metric /target/metric
```
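Under the hood, this amounts to correlating the two per-query score files. A minimal sketch, assuming both files are JSON dictionaries mapping query IDs to scores (the paths below are placeholders):

```python
# Minimal sketch of the evaluation: correlate actual vs. predicted
# per-query scores. File paths are placeholders, and JSON dicts mapping
# qid -> score are an assumed format based on the earlier labeling step.
import json
from scipy.stats import pearsonr, kendalltau, spearmanr

with open("data/eval_per_query/actual-per-query.json") as f:
    actual = json.load(f)
with open("output/predicted-per-query.json") as f:
    predicted = json.load(f)

# Align the two files on their shared query IDs.
qids = sorted(set(actual) & set(predicted))
a = [actual[q] for q in qids]
p = [predicted[q] for q in qids]

print(f"Pearson:  {pearsonr(a, p)[0]:.3f}")
print(f"Kendall:  {kendalltau(a, p)[0]:.3f}")
print(f"Spearman: {spearmanr(a, p)[0]:.3f}")
```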
The table below shows the results of our proposed method (NN-QPP) compared to the baselines over five query sets: MS MARCO Dev small (6,980 queries), DL-Hard (50 queries), TREC DL 2019 (43 queries), TREC DL 2020 (54 queries), and TREC DL 2021 (53 queries). For each query set we report the Pearson (P), Kendall (K), and Spearman (S) correlations between predicted and actual query performance.
| QPP Method | Dev P | Dev K | Dev S | DL-Hard P | DL-Hard K | DL-Hard S | DL'19 P | DL'19 K | DL'19 S | DL'20 P | DL'20 K | DL'20 S | DL'21 P | DL'21 K | DL'21 S |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Clarity | 0.149 | 0.258 | 0.345 | 0.149 | 0.099 | 0.126 | 0.271 | 0.229 | 0.332 | 0.360 | 0.215 | 0.296 | 0.111 | 0.070 | 0.094 |
| WIG | 0.154 | 0.170 | 0.227 | 0.331 | 0.260 | 0.348 | 0.310 | 0.158 | 0.226 | 0.204 | 0.117 | 0.166 | 0.197 | 0.195 | 0.270 |
| QF | 0.170 | 0.210 | 0.264 | 0.210 | 0.164 | 0.217 | 0.295 | 0.240 | 0.340 | 0.358 | 0.266 | 0.366 | 0.132 | 0.101 | 0.142 |
| NeuralQPP | 0.193 | 0.171 | 0.227 | 0.173 | 0.111 | 0.134 | 0.289 | 0.159 | 0.224 | 0.248 | 0.129 | 0.179 | 0.134 | 0.221 | 0.188 |
| n(σ%) | 0.221 | 0.217 | 0.284 | 0.195 | 0.120 | 0.147 | 0.371 | 0.256 | 0.377 | 0.480 | 0.329 | 0.478 | 0.269 | 0.169 | 0.256 |
| RSD | 0.310 | 0.337 | 0.447 | 0.362 | 0.322 | 0.469 | 0.460 | 0.262 | 0.394 | 0.426 | 0.364 | 0.508 | 0.256 | 0.224 | 0.340 |
| SMV | 0.311 | 0.271 | 0.357 | 0.375 | 0.269 | 0.408 | 0.495 | 0.289 | 0.440 | 0.450 | 0.391 | 0.539 | 0.252 | 0.192 | 0.278 |
| NQC | 0.315 | 0.272 | 0.358 | 0.384 | 0.288 | 0.417 | 0.466 | 0.267 | 0.399 | 0.464 | 0.294 | 0.423 | 0.271 | 0.201 | 0.292 |
| UEF(NQC) | 0.316 | 0.303 | 0.398 | 0.359 | 0.319 | 0.463 | 0.507 | 0.293 | 0.432 | 0.511 | 0.347 | 0.476 | 0.272 | 0.223 | 0.327 |
| NQA-QPP | 0.451 | 0.364 | 0.475 | 0.386 | 0.297 | 0.418 | 0.348 | 0.164 | 0.255 | 0.507 | 0.347 | 0.496 | 0.258 | 0.185 | 0.265 |
| BERT-QPP | 0.517 | 0.400 | 0.520 | 0.404 | 0.345 | 0.472 | 0.491 | 0.289 | 0.412 | 0.467 | 0.364 | 0.448 | 0.262 | 0.237 | 0.340 |
| qpp-BERT-PL | 0.520 | 0.413 | 0.522 | 0.330 | 0.266 | 0.390 | 0.432 | 0.258 | 0.361 | 0.427 | 0.280 | 0.392 | 0.247 | 0.172 | 0.292 |
| qpp-PRP | 0.302 | 0.311 | 0.412 | 0.090 | 0.061 | 0.063 | 0.321 | 0.181 | 0.229 | 0.189 | 0.157 | 0.229 | 0.027 | 0.004 | 0.015 |
| Ours (NN-QPP) | 0.555 | 0.421 | 0.544 | 0.434 | 0.412 | 0.508 | 0.519 | 0.318 | 0.459 | 0.462 | 0.318 | 0.448 | 0.322 | 0.266 | 0.359 |
We also conducted an ablation study to investigate the impact of the size of the Query Store on the performance of the model. The figure below shows the results of this study.
If you use this code, please cite our paper:
```bibtex
@inproceedings{ebrahimi2024estimating,
  title        = {Estimating Query Performance Through Rich Contextualized Query Representations},
  author       = {Ebrahimi, Sajad and Khodabakhsh, Maryam and Arabzadeh, Negar and Bagheri, Ebrahim},
  year         = {2024},
  month        = {03},
  pages        = {49--58},
  booktitle    = {European Conference on Information Retrieval},
  organization = {Springer},
  isbn         = {978-3-031-56065-1},
  doi          = {10.1007/978-3-031-56066-8_6}
}
```