State-of-the-art query performance prediction methods rely on fine-tuning contextual language models to estimate retrieval effectiveness on a per-query basis. Our work builds on this strong foundation and learns rich query representations by modeling the interactions between the query and two important sources of contextual information: the set of documents retrieved by that query, and the set of similar historical queries with known retrieval effectiveness. We propose that such contextualized query representations can be more accurate estimators of query performance, as they embed the performance of past similar queries and the semantics of the documents retrieved by the query. We perform extensive experiments on the MS MARCO collection and its accompanying query sets, including the MS MARCO Dev set, the TREC Deep Learning tracks of 2019, 2020, and 2021, and DL-Hard. Our experiments reveal that our proposed method shows robust and effective performance compared to state-of-the-art baselines.
First, you need to clone the repository:

```bash
git clone https://github.com/sadjadeb/Nearest-Neighbor-QPP.git
```
Then, you need to create a virtual environment and install the requirements:

```bash
cd Nearest-Neighbor-QPP/
sudo apt-get install virtualenv
virtualenv venv
source venv/bin/activate
pip install -r requirements.txt
```
You need the MS MARCO dataset to run the code. You can download the dataset from here and here.
After downloading the dataset, extract the files and put them in the `data` directory. Here is the list of the files you need to put in the `data` directory:
- `collection.tsv`
- `queries.train.tsv`
- `qrels.train.tsv`
- `top1000.train.tar.gz`
- `queries.dev.small.tsv`
- `qrels.dev.small.tsv`
- `top1000.dev.tar.gz`
- `msmarco-test2019-queries.tsv`
- `2019qrels-pass.txt`
- `msmarco-passagetest2019-top1000.tsv`
- `msmarco-test2020-queries.tsv`
- `2020qrels-pass.txt`
- `msmarco-passagetest2020-top1000.tsv`
- `dl_hard-passage.qrels`
- `topics.tsv` (from DL-Hard)
- `bm25.run` (from DL-Hard)
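Before moving on, you can optionally verify that everything is in place. The following is a minimal sanity-check sketch (not part of the repository) that looks for the files listed above in the `data` directory:

```python
# Optional sanity check (not part of the repository): verify that all files
# listed above are present in the data/ directory before preprocessing.
import os

REQUIRED_FILES = [
    "collection.tsv", "queries.train.tsv", "qrels.train.tsv",
    "top1000.train.tar.gz", "queries.dev.small.tsv", "qrels.dev.small.tsv",
    "top1000.dev.tar.gz", "msmarco-test2019-queries.tsv", "2019qrels-pass.txt",
    "msmarco-passagetest2019-top1000.tsv", "msmarco-test2020-queries.tsv",
    "2020qrels-pass.txt", "msmarco-passagetest2020-top1000.tsv",
    "dl_hard-passage.qrels", "topics.tsv", "bm25.run",
]

missing = [name for name in REQUIRED_FILES
           if not os.path.exists(os.path.join("data", name))]
print("Missing files:", missing if missing else "none")
```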
To create a dictionary which maps each query to its actual retrieval performance under BM25 (i.e., MRR@10), you need to run the following command:
```bash
python extract_metrics_per_query.py --run /path/to/run/file --qrels /path/to/qrels/file
```
It will create a file named `run-file-name_evaluation-per-query.json` in the `data/eval_per_query` directory.
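For context, here is a minimal sketch of what this step computes, assuming an MS MARCO-style run file (`qid<TAB>pid<TAB>rank`) and qrels file (`qid 0 pid rel`); the repository's script is the authoritative version:

```python
# Minimal sketch (not the repository's script): compute MRR@10 per query
# from a run file and qrels, then dump the scores to JSON.
# Assumed formats: run lines "qid\tpid\trank", qrels lines "qid 0 pid rel".
import json
from collections import defaultdict

def load_qrels(path):
    relevant = defaultdict(set)
    with open(path) as f:
        for line in f:
            qid, _, pid, rel = line.split()
            if int(rel) > 0:
                relevant[qid].add(pid)
    return relevant

def mrr_at_10_per_query(run_path, relevant):
    ranked = defaultdict(list)
    with open(run_path) as f:
        for line in f:
            qid, pid, rank = line.rstrip("\n").split("\t")[:3]
            ranked[qid].append((int(rank), pid))
    scores = {}
    for qid, docs in ranked.items():
        scores[qid] = 0.0
        # Scan the top-10 ranked passages for the first relevant one.
        for rank, pid in sorted(docs)[:10]:
            if pid in relevant.get(qid, set()):
                scores[qid] = 1.0 / rank
                break
    return scores

scores = mrr_at_10_per_query("data/bm25.run", load_qrels("data/qrels.train.tsv"))
with open("data/eval_per_query/bm25.run_evaluation-per-query.json", "w") as f:
    json.dump(scores, f)
```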
Then you need to create a file which contains, for each query, the most similar query from the train set (i.e., historical queries with known retrieval effectiveness). To do so, you need to run the following command:
```bash
python find_most_similar_query.py --base_queries /path/to/train-set/queries --target_queries /path/to/desired/queries --model_name /name/of/the/language/model --hits /number/of/hits
```
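Conceptually, this step embeds all queries with a language model and runs a nearest-neighbor search over the train-set queries. Here is a minimal sketch using sentence-transformers; the model name and toy queries are placeholders, not necessarily what the repository uses:

```python
# Minimal sketch (not the repository's script): find, for each target query,
# its most similar query in the train set using a bi-encoder.
# The model name and the toy queries below are placeholders.
from sentence_transformers import SentenceTransformer, util

model = SentenceTransformer("sentence-transformers/all-MiniLM-L6-v2")

base_queries = {"100": "what is a query performance predictor",
                "101": "how tall is mount everest"}           # train-set queries
target_queries = {"9": "height of mount everest in meters"}   # dev/test queries

base_ids = list(base_queries)
base_emb = model.encode([base_queries[i] for i in base_ids], convert_to_tensor=True)
target_emb = model.encode(list(target_queries.values()), convert_to_tensor=True)

# top_k plays the role of the --hits argument.
for tid, hits in zip(target_queries, util.semantic_search(target_emb, base_emb, top_k=1)):
    best = hits[0]
    print(tid, "->", base_ids[best["corpus_id"]], f"(score={best['score']:.3f})")
```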
Finally, to gather all the data into a single file to make loading easier, you need to run the following commands:

```bash
python create_train_pkl_file.py
python create_test_pkl_file.py
```
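If you want to inspect the packaged data, a quick check like the following works, assuming the scripts write standard pickle files; the file name below is a guess, and the internal structure is defined by the repository's scripts:

```python
# Quick inspection sketch: load a pickle produced by the scripts above.
# The path "data/train.pkl" is hypothetical; use whatever path the
# create_*_pkl_file.py scripts actually write to.
import pickle

with open("data/train.pkl", "rb") as f:
    data = pickle.load(f)

# Print the container type and size to get a feel for the structure.
print(type(data), len(data))
```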
To train the model, you need to run the following command:
```bash
python train.py
```
You can change the hyperparameters of the model by editing lines 9-12 of the `train.py` file.
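Those lines hold plain Python constants; the names and values below are purely illustrative (not copied from the repository), just to show the kind of settings you would edit:

```python
# Illustrative only: hypothetical hyperparameters of the kind defined near
# the top of train.py; check lines 9-12 of the actual file for real names.
MODEL_NAME = "bert-base-uncased"
TRAIN_BATCH_SIZE = 16
NUM_EPOCHS = 2
LEARNING_RATE = 2e-5
```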
To test the model, you need to run the following command:
```bash
python test.py
```
To evaluate the model, you need to run the following command:
```bash
python evaluation.py --actual /path/to/actual/performance/file --predicted /path/to/predicted/performance/file --target_metric /target/metric
```
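Under the hood, this amounts to correlating the two per-query score files. A minimal sketch, assuming both files are JSON dictionaries mapping query IDs to scores (the paths below are placeholders):

```python
# Minimal sketch of the evaluation: correlate actual vs. predicted
# per-query scores. File paths are placeholders, and JSON dicts mapping
# qid -> score are an assumed format based on the earlier labeling step.
import json
from scipy.stats import pearsonr, kendalltau, spearmanr

with open("data/eval_per_query/actual-per-query.json") as f:
    actual = json.load(f)
with open("output/predicted-per-query.json") as f:
    predicted = json.load(f)

# Align the two files on their shared query IDs.
qids = sorted(set(actual) & set(predicted))
a = [actual[q] for q in qids]
p = [predicted[q] for q in qids]

print(f"Pearson:  {pearsonr(a, p)[0]:.3f}")
print(f"Kendall:  {kendalltau(a, p)[0]:.3f}")
print(f"Spearman: {spearmanr(a, p)[0]:.3f}")
```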
The table below shows the results of our proposed method (NN-QPP) compared to the baselines over five query sets: MS MARCO Dev small (6,980 queries), DL-Hard (50 queries), TREC DL 2019 (43 queries), TREC DL 2020 (54 queries), and TREC DL 2021 (53 queries). For each query set we report the Pearson (P), Kendall (K), and Spearman (S) correlations between predicted and actual query performance.
| QPP Method | Dev P | Dev K | Dev S | DL-Hard P | DL-Hard K | DL-Hard S | DL'19 P | DL'19 K | DL'19 S | DL'20 P | DL'20 K | DL'20 S | DL'21 P | DL'21 K | DL'21 S |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Clarity | 0.149 | 0.258 | 0.345 | 0.149 | 0.099 | 0.126 | 0.271 | 0.229 | 0.332 | 0.360 | 0.215 | 0.296 | 0.111 | 0.070 | 0.094 |
| WIG | 0.154 | 0.170 | 0.227 | 0.331 | 0.260 | 0.348 | 0.310 | 0.158 | 0.226 | 0.204 | 0.117 | 0.166 | 0.197 | 0.195 | 0.270 |
| QF | 0.170 | 0.210 | 0.264 | 0.210 | 0.164 | 0.217 | 0.295 | 0.240 | 0.340 | 0.358 | 0.266 | 0.366 | 0.132 | 0.101 | 0.142 |
| NeuralQPP | 0.193 | 0.171 | 0.227 | 0.173 | 0.111 | 0.134 | 0.289 | 0.159 | 0.224 | 0.248 | 0.129 | 0.179 | 0.134 | 0.221 | 0.188 |
| n(σ%) | 0.221 | 0.217 | 0.284 | 0.195 | 0.120 | 0.147 | 0.371 | 0.256 | 0.377 | 0.480 | 0.329 | 0.478 | 0.269 | 0.169 | 0.256 |
| RSD | 0.310 | 0.337 | 0.447 | 0.362 | 0.322 | 0.469 | 0.460 | 0.262 | 0.394 | 0.426 | 0.364 | 0.508 | 0.256 | 0.224 | 0.340 |
| SMV | 0.311 | 0.271 | 0.357 | 0.375 | 0.269 | 0.408 | 0.495 | 0.289 | 0.440 | 0.450 | 0.391 | 0.539 | 0.252 | 0.192 | 0.278 |
| NQC | 0.315 | 0.272 | 0.358 | 0.384 | 0.288 | 0.417 | 0.466 | 0.267 | 0.399 | 0.464 | 0.294 | 0.423 | 0.271 | 0.201 | 0.292 |
| UEF(NQC) | 0.316 | 0.303 | 0.398 | 0.359 | 0.319 | 0.463 | 0.507 | 0.293 | 0.432 | 0.511 | 0.347 | 0.476 | 0.272 | 0.223 | 0.327 |
| NQA-QPP | 0.451 | 0.364 | 0.475 | 0.386 | 0.297 | 0.418 | 0.348 | 0.164 | 0.255 | 0.507 | 0.347 | 0.496 | 0.258 | 0.185 | 0.265 |
| BERT-QPP | 0.517 | 0.400 | 0.520 | 0.404 | 0.345 | 0.472 | 0.491 | 0.289 | 0.412 | 0.467 | 0.364 | 0.448 | 0.262 | 0.237 | 0.340 |
| qpp-BERT-PL | 0.520 | 0.413 | 0.522 | 0.330 | 0.266 | 0.390 | 0.432 | 0.258 | 0.361 | 0.427 | 0.280 | 0.392 | 0.247 | 0.172 | 0.292 |
| qpp-PRP | 0.302 | 0.311 | 0.412 | 0.090 | 0.061 | 0.063 | 0.321 | 0.181 | 0.229 | 0.189 | 0.157 | 0.229 | 0.027 | 0.004 | 0.015 |
| Ours (NN-QPP) | 0.555 | 0.421 | 0.544 | 0.434 | 0.412 | 0.508 | 0.519 | 0.318 | 0.459 | 0.462 | 0.318 | 0.448 | 0.322 | 0.266 | 0.359 |
We also conducted an ablation study to investigate the impact of the size of the Query Store on the performance of the model. The figure below shows the results of this study.
If you use this code, please cite our paper:
```bibtex
@inproceedings{ebrahimi2024estimating,
  title        = {Estimating Query Performance Through Rich Contextualized Query Representations},
  author       = {Ebrahimi, Sajad and Khodabakhsh, Maryam and Arabzadeh, Negar and Bagheri, Ebrahim},
  year         = {2024},
  month        = {03},
  pages        = {49--58},
  booktitle    = {European Conference on Information Retrieval},
  organization = {Springer},
  isbn         = {978-3-031-56065-1},
  doi          = {10.1007/978-3-031-56066-8_6}
}
```