Document ranking via sentence modeling using BERT
# Set up environment
pip install virtualenv
virtualenv -p python3.5 birch_env
source birch_env/bin/activate
# Install dependencies
pip install Cython # jnius dependency
pip install -r requirements.txt
git clone https://github.com/NVIDIA/apex
cd apex && pip install -v --no-cache-dir . && cd ..
# Set up Anserini
git clone https://github.com/castorini/anserini.git
cd anserini && mvn clean package appassembler:assemble
cd eval && tar xvfz trec_eval.9.0.4.tar.gz && cd trec_eval.9.0.4 && make && cd ../../..
# Download data and models
wget https://zenodo.org/record/3269890/files/birch_data.tar.gz
tar -xzvf birch_data.tar.gz
python src/robust04_cv.py --anserini_path <path/to/anserini> --index_path <path/to/index> --cv_fold <2, 5>
This step retrieves documents to depth 1000 for each query and splits them into sentences to generate the fold data. You may skip to the next step and use the downloaded data under data/datasets.
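As a rough illustration of what this preparation step produces, the sketch below splits retrieved documents into sentence records and assigns queries to folds. The names `split_sentences`, `build_sentence_records`, and `assign_folds` are hypothetical, the regex splitter is a stand-in for whatever sentence tokenizer `src/robust04_cv.py` actually uses, and the round-robin fold assignment is only a placeholder for the fold definitions used in the papers.

```python
import re


def split_sentences(text):
    """Naive splitter (assumption): break after '.', '!' or '?' followed by whitespace."""
    parts = re.split(r"(?<=[.!?])\s+", text.strip())
    return [p for p in parts if p]


def build_sentence_records(query_id, ranked_docs, depth=1000):
    """Turn one query's ranked list into per-sentence records.

    ranked_docs: list of (doc_id, retrieval_score, text) tuples, best first.
    Returns tuples of (query_id, doc_id, sentence_index, retrieval_score, sentence).
    """
    records = []
    for doc_id, score, text in ranked_docs[:depth]:
        for sent_idx, sent in enumerate(split_sentences(text)):
            records.append((query_id, doc_id, sent_idx, score, sent))
    return records


def assign_folds(query_ids, num_folds):
    """Round-robin assignment of sorted query ids to folds (illustrative only)."""
    folds = [[] for _ in range(num_folds)]
    for i, qid in enumerate(sorted(query_ids)):
        folds[i % num_folds].append(qid)
    return folds
```

Each record keeps the document-level retrieval score next to the sentence, so a later step can combine the two without consulting the index again.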
python src/main.py --mode training --collection mb --qrels_file qrels.microblog.txt --batch_size <batch_size> --eval_steps <eval_steps> --learning_rate <learning_rate> --num_train_epochs <num_train_epochs> --device cuda
python src/main.py --mode inference --experiment <qa_2cv, mb_2cv, qa_5cv, mb_5cv> --collection <robust04_2cv, robust04_5cv> --model_path <models/saved.mb_3, models/saved.qa_2> --load_trained --batch_size <batch_size> --device cuda
Note that this step takes a long time.
If you don't want to evaluate the pretrained models, you may skip to the next step and evaluate with our predictions under data/predictions.
./eval_scripts/baseline.sh <path/to/anserini> <path/to/index> <2, 5>
- Compute document score
Set the last argument to True if you want to tune the hyperparameters first. To use the default hyperparameters, set it to False.
./eval_scripts/test.sh <qa_2cv, mb_2cv, qa_5cv, mb_5cv> <2, 5> <path/to/anserini> <True, False>
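For intuition about what the document score combines, here is a minimal sketch. It assumes the interpolation described in the cited paper, where the final score mixes the original retrieval score with a weighted sum of the top-n BERT sentence scores (n = 1, 2, 3 corresponding to the 1S, 2S, 3S rows in the tables). The function name `combine_scores` and the example values are hypothetical, and the repository's scripts may normalize or weight scores differently.

```python
def combine_scores(doc_score, sentence_scores, a, weights):
    """Interpolate a document score with its top sentence scores.

    doc_score: original retrieval score of the document (e.g. BM25+RM3).
    sentence_scores: BERT scores for every sentence in the document.
    a: interpolation weight on the document score (a tunable hyperparameter).
    weights: one weight per top sentence; len(weights) is n in "nS".
    """
    top = sorted(sentence_scores, reverse=True)[:len(weights)]
    sentence_part = sum(w * s for w, s in zip(weights, top))
    return a * doc_score + (1 - a) * sentence_part


def rerank(candidates, a, weights):
    """candidates: list of (doc_id, doc_score, sentence_scores). Returns doc ids, best first."""
    scored = [(combine_scores(ds, ss, a, weights), doc_id) for doc_id, ds, ss in candidates]
    return [doc_id for _, doc_id in sorted(scored, key=lambda x: x[0], reverse=True)]
```

Under this reading, tuning the hyperparameters would amount to searching over `a` and `weights` on the training folds and applying the best setting to the held-out fold.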
- Evaluate with trec_eval
./eval_scripts/eval.sh <bm25+rm3_2cv, qa_2cv, mb_2cv, bm25+rm3_5cv, qa_5cv, mb_5cv> <path/to/anserini> qrels.robust2004.txt
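The two metrics reported in the tables, AP and P@20, can be sketched for a single query as below. This is an illustration of the standard definitions, not a replacement for trec_eval; the function names are hypothetical, and the reported numbers are averages over all queries.

```python
def precision_at_k(ranked_doc_ids, relevant, k=20):
    """Fraction of the top-k ranked documents that are relevant (denominator is always k)."""
    top = ranked_doc_ids[:k]
    return sum(1 for d in top if d in relevant) / k


def average_precision(ranked_doc_ids, relevant):
    """Sum of precision at each relevant hit, divided by the total number of relevant documents."""
    if not relevant:
        return 0.0
    hits = 0
    total = 0.0
    for rank, doc_id in enumerate(ranked_doc_ids, start=1):
        if doc_id in relevant:
            hits += 1
            total += hits / rank
    return total / len(relevant)
```

Relevant documents that never appear in the ranked list still count in the AP denominator, which is why retrieval depth affects the score.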
- "Paper 1" based on two-fold CV:
Model | AP | P@20 |
---|---|---|
Paper 1 (two fold) | 0.2971 | 0.3948 |
BM25+RM3 (Anserini) | 0.2987 | 0.3871 |
1S: BERT(QA) | 0.3014 | 0.3928 |
2S: BERT(QA) | 0.3003 | 0.3948 |
3S: BERT(QA) | 0.3003 | 0.3948 |
1S: BERT(MB) | 0.3241 | 0.4217 |
2S: BERT(MB) | 0.3240 | 0.4209 |
3S: BERT(MB) | 0.3244 | 0.4219 |
- "Paper 2" based on five-fold CV:
Model | AP | P@20 |
---|---|---|
Paper 2 (five fold) | 0.272 | 0.386 |
BM25+RM3 (Anserini) | 0.3033 | 0.3974 |
1S: BERT(QA) | 0.3102 | 0.4068 |
2S: BERT(QA) | 0.3090 | 0.4064 |
3S: BERT(QA) | 0.3090 | 0.4064 |
1S: BERT(MB) | 0.3266 | 0.4245 |
2S: BERT(MB) | 0.3278 | 0.4267 |
3S: BERT(MB) | 0.3278 | 0.4287 |
See this paper for the exact fold settings.
How do I cite this work?
@article{yang2019simple,
title={Simple Applications of BERT for Ad Hoc Document Retrieval},
author={Yang, Wei and Zhang, Haotian and Lin, Jimmy},
journal={arXiv preprint arXiv:1903.10972},
year={2019}
}