Merge EMNLP branch into master (castorini#25)
* Update codebase to train on all MB data

* Fix data format for robust04

* Fix minor errors in new inference code

* Update scripts and path to support core*

* Fix core* bug in data.py

* Add utility scripts

* Minor fixes in MB branch before merge

* Clean up to reproduce EMNLP results

* Add README for arXiv

* Add Anserini commit id

* Fix typo in Zenodo link
zeynepakkalyoncu authored Aug 21, 2019
1 parent 3543e65 commit 7cec228
Showing 18 changed files with 429 additions and 358 deletions.
141 changes: 63 additions & 78 deletions README.md
Original file line number Diff line number Diff line change
@@ -1,9 +1,13 @@
# Birch

[ ![Docker Build Status](https://img.shields.io/docker/cloud/build/osirrc2019/birch.svg)](https://hub.docker.com/r/osirrc2019/birch)
[ ![DOI](https://zenodo.org/badge/DOI/10.5281/zenodo.3269890.svg)](https://doi.org/10.5281/zenodo.3269890)
[ ![DOI](https://zenodo.org/badge/DOI/10.5281/zenodo.3372764.svg)](http://doi.org/10.5281/zenodo.3372764)

Document ranking via sentence modeling using BERT
Document ranking via sentence modeling using BERT

Note:
The results in the arXiv paper [Simple Applications of BERT for Ad Hoc Document Retrieval](https://arxiv.org/abs/1903.10972) have been superseded by the results in the EMNLP'19 paper [Cross-Domain Modeling of Sentence-Level Evidence for Document Retrieval].
To reproduce the results in the arXiv paper, please follow the instructions [here](https://github.com/castorini/birch/blob/master/reproduce_arxiv.md) instead.

## Environment & Data

@@ -18,112 +22,93 @@ pip install Cython # jnius dependency
pip install -r requirements.txt
git clone https://github.com/NVIDIA/apex
cd apex && pip install -v --no-cache-dir . && cd ..
cd apex && pip install -v --no-cache-dir --global-option="--cpp_ext" --global-option="--cuda_ext" ./
# Set up Anserini
# Set up Anserini (last reproduced with commit id: f690b5b769d7b0a623e034b31438df126d81b791)
git clone https://github.com/castorini/anserini.git
cd anserini && mvn clean package appassembler:assemble
cd eval && tar xvfz trec_eval.9.0.4.tar.gz && cd trec_eval.9.0.4 && make && cd ../../..
# Download data and models
wget https://zenodo.org/record/3269890/files/birch_data.tar.gz
tar -xzvf birch_data.tar.gz
cd data
wget https://zenodo.org/record/3372764/files/emnlp_bert4ir.tar.gz
tar -xzvf emnlp_bert4ir.tar.gz
cd ..
```

## Dataset

```
python src/robust04_cv.py --anserini_path <path/to/anserini> --index_path <path/to/index> --cv_fold <2, 5>
```

This step retrieves documents to depth 1000 for each query and splits them into sentences to generate the fold data. You may skip to the next step and use the downloaded data under `data/datasets`.
Experiment Names:
- large_mb_robust04, large_mb_core17, large_mb_core18
- large_car_mb_robust04, large_car_mb_core17, large_car_mb_core18
- large_msmarco_mb_robust04, large_msmarco_mb_core17, large_msmarco_mb_core18
- large_car_robust04, large_car_core17, large_car_core18
- large_msmarco_robust04, large_msmarco_core17, large_msmarco_core18

## Training

```
python src/main.py --mode training --collection mb --qrels_file qrels.microblog.txt --batch_size <batch_size> --eval_steps <eval_steps> --learning_rate <learning_rate> --num_train_epochs <num_train_epochs> --device cuda
```

## Inference
For BERT(MB):

```
python src/main.py --mode inference --experiment <qa_2cv, mb_2cv, qa_5cv, mb_5cv> --collection <robust04_2cv, robust04_5cv> --model_path <models/saved.mb_3, models/saved.qa_2> --load_trained --batch_size <batch_size> --device cuda
export CUDA_VISIBLE_DEVICES=0; experiment=<experiment_name>; \
nohup python -u src/main.py --mode training --experiment ${experiment} --collection mb \
--local_model <models/bert-large-uncased.tar.gz> \
--local_tokenizer models/bert-large-uncased-vocab.txt --batch_size 16 \
--data_path data --predict_path data/predictions/predict.${experiment} \
--model_path models/saved.${experiment} --eval_steps 1000 --qrels_file qrels.microblog.txt \
--device cuda --output_path logs/out.${experiment} > logs/${experiment}.log 2>&1 &
```

Note that this step takes a long time.
If you don't want to evaluate the pretrained models, you may skip to the next step and evaluate with our predictions under `data/predictions`.

## Evaluation

### BM25+RM3:
For BERT(CAR -> MB) and BERT(MS MARCO -> MB):

```
./eval_scripts/baseline.sh <path/to/anserini> <path/to/index> <2, 5>
export CUDA_VISIBLE_DEVICES=0; experiment=<experiment_name>; \
nohup python -u src/main.py --mode training --experiment ${experiment} --collection mb \
--local_model <models/pytorch_msmarco.tar.gz, models/pytorch_car.tar.gz> \
--local_tokenizer models/bert-large-uncased-vocab.txt --batch_size 16 \
--data_path data --predict_path data/predictions/predict.${experiment} \
--model_path models/saved.${experiment} --eval_steps 1000 --qrels_file qrels.microblog.txt \
--device cuda --output_path logs/out.${experiment} > logs/${experiment}.log 2>&1 &
```

### Sentence Evidence:

- Compute document score
## Inference

Set the last argument to True if you want to tune the hyperparameters first.
To use the default hyperparameters, set to False.
For BERT(MB), BERT(CAR -> MB) and BERT(MS MARCO -> MB):

```
./eval_scripts/test.sh <qa_2cv, mb_2cv, qa_5cv, mb_5cv> <2, 5> <path/to/anserini> <True, False>
export CUDA_VISIBLE_DEVICES=0; experiment=<experiment_name>; \
nohup python -u src/main.py --mode inference --experiment ${experiment} --collection <robust04, core17, core18> \
--load_trained --model_path <models/saved.large_mb_2, models/saved.car_mb_1, models/saved.msmarco_mb_2> \
--batch_size 4 --data_path data --predict_path data/predictions/predict.${experiment} \
--device cuda --output_path logs/out.${experiment} > logs/${experiment}.log 2>&1 &
```

- Evaluate with trec_eval
For BERT(CAR) and BERT(MS MARCO):

```
./eval_scripts/eval.sh <bm25+rm3_2cv, qa_2cv, mb_2cv, bm25+rm3_5cv, qa_5cv, mb_5cv> <path/to/anserini> qrels.robust2004.txt
export CUDA_VISIBLE_DEVICES=0; experiment=<experiment_name>; \
nohup python -u src/main.py --mode inference --experiment ${experiment} --collection <robust04, core17, core18> \
--local_model <models/pytorch_msmarco.tar.gz, models/pytorch_car.tar.gz> \
--local_tokenizer models/bert-large-uncased-vocab.txt --batch_size 4 \
--data_path data --predict_path data/predictions/predict.${experiment} \
--device cuda --output_path logs/out.${experiment} > logs/${experiment}.log 2>&1 &
```

Note that this step takes a long time.
If you don't want to evaluate the pretrained models, you may skip to the next step and evaluate with our predictions under `data/predictions`.

---
## Evaluation

## Result on Robust04

- "Paper 1" based on two-fold CV:

| Model | AP | P@20 |
|:-------------------:|:------:|:------:|
| Paper 1 (two fold) | 0.2971 | 0.3948 |
| BM25+RM3 (Anserini) | 0.2987 | 0.3871 |
| 1S: BERT(QA) | 0.3014 | 0.3928 |
| 2S: BERT(QA) | 0.3003 | 0.3948 |
| 3S: BERT(QA) | 0.3003 | 0.3948 |
| 1S: BERT(MB) | 0.3241 | 0.4217 |
| 2S: BERT(MB) | 0.3240 | 0.4209 |
| 3S: BERT(MB) | **0.3244** | **0.4219** |

- "Paper 2" based on five-fold CV:

| Model | AP | P@20 |
|:-------------------:|:------:|:------:|
| Paper 2 (five fold) | 0.272 | 0.386 |
| BM25+RM3 (Anserini) | 0.3033 | 0.3974 |
| 1S: BERT(QA) | 0.3102 | 0.4068 |
| 2S: BERT(QA) | 0.3090 | 0.4064 |
| 3S: BERT(QA) | 0.3090 | 0.4064 |
| 1S: BERT(MB) | 0.3266 | 0.4245 |
| 2S: BERT(MB) | **0.3278** | 0.4267 |
| 3S: BERT(MB) | **0.3278** | **0.4287** |

See this [paper](https://dl.acm.org/citation.cfm?id=3308781) for the exact fold settings.

### Replication Log
```
experiment=<experiment_name>
collection=<robust04, core17, core18>
anserini_path=<path/to/anserini/root>
data_path=<path/to/data/root>
+ Results replicated by [@emmileaf](https://github.com/emmileaf) on 2019-06-10 (commit [`cc42b60`](https://github.com/castorini/birch/commit/cc42b60093090969c1d9b24cddd1257c1cad66df))

---
# Tune hyperparameters
./eval_scripts/train.sh ${experiment} ${collection} ${anserini_path}
**How do I cite this work?**
# Run experiment
./eval_scripts/test.sh ${experiment} ${collection} ${anserini_path}
```
@article{yang2019simple,
title={Simple Applications of BERT for Ad Hoc Document Retrieval},
author={Yang, Wei and Zhang, Haotian and Lin, Jimmy},
journal={arXiv preprint arXiv:1903.10972},
year={2019}
}
# Evaluate with trec_eval
./eval_scripts/eval.sh ${experiment} ${anserini_path} ${data_path}
```
13 changes: 7 additions & 6 deletions eval_scripts/eval.sh
@@ -1,19 +1,20 @@
experiment=$1
anserini_path=$2
qrels_file=$3
collection=$2
anserini_path=$3
data_path=$4

echo "Experiment: ${experiment}"

if [[ ${experiment} == *"bm25+rm3"* ]] ; then
echo "BM25+RM3:"
${anserini_path}/eval/trec_eval.9.0.4/trec_eval -M1000 -m map -m P.20 "${anserini_path}/src/main/resources/topics-and-qrels/${qrels_file}" "runs/run.${experiment}.txt"
${anserini_path}/eval/trec_eval.9.0.4/trec_eval -M1000 -m map -m P.20 -m ndcg_cut.20 "${data_path}/qrels/qrels.${collection}.txt" "runs/run.${experiment}.txt"
else
echo "1S:"
${anserini_path}/eval/trec_eval.9.0.4/trec_eval -M1000 -m map -m P.20 "${anserini_path}/src/main/resources/topics-and-qrels/${qrels_file}" "runs/run.${experiment}.cv.a"
${anserini_path}/eval/trec_eval.9.0.4/trec_eval -M1000 -m map -m P.20 -m ndcg_cut.20 "${data_path}/qrels/qrels.${collection}.txt" "runs/run.${experiment}.cv.a"

echo "2S:"
${anserini_path}/eval/trec_eval.9.0.4/trec_eval -M1000 -m map -m P.20 "${anserini_path}/src/main/resources/topics-and-qrels/${qrels_file}" "runs/run.${experiment}.cv.ab"
${anserini_path}/eval/trec_eval.9.0.4/trec_eval -M1000 -m map -m P.20 -m ndcg_cut.20 "${data_path}/qrels/qrels.${collection}.txt" "runs/run.${experiment}.cv.ab"

echo "3S:"
${anserini_path}/eval/trec_eval.9.0.4/trec_eval -M1000 -m map -m P.20 "${anserini_path}/src/main/resources/topics-and-qrels/${qrels_file}" "runs/run.${experiment}.cv.abc"
${anserini_path}/eval/trec_eval.9.0.4/trec_eval -M1000 -m map -m P.20 -m ndcg_cut.20 "${data_path}/qrels/qrels.${collection}.txt" "runs/run.${experiment}.cv.abc"
fi
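
With this change, `eval.sh` reads the qrels file from `${data_path}/qrels/qrels.${collection}.txt` instead of Anserini's `topics-and-qrels` resources directory. A minimal sketch of that path resolution follows; the `qrels_path` helper and the example values are illustrative assumptions and are not part of the commit.

```shell
# Hypothetical helper (not in the commit): build the qrels path the same
# way the updated eval.sh does, from the data root and the collection name.
qrels_path() {
    data_path=$1
    collection=$2
    printf '%s/qrels/qrels.%s.txt\n' "${data_path}" "${collection}"
}

# Example values are assumptions for illustration only.
qrels_path data robust04
# prints: data/qrels/qrels.robust04.txt
```

Checking that the resolved file exists before invoking `trec_eval` would surface a wrong `collection` or `data_path` early.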
46 changes: 22 additions & 24 deletions eval_scripts/test.sh
@@ -1,38 +1,36 @@
#!/usr/bin/env bash

experiment=$1
num_folds=$2
collection=$2
anserini_path=$3
tune_params=$4

if [ ${num_folds} == '5' ] ; then
folds_file="robust04-paper2-folds.json"
collection="robust04_5cv"
else
folds_file="robust04-paper1-folds.json"
collection="robust04_2cv"
fi
declare -a sents=("a" "ab" "abc")

if [ ${tune_params} ] ; then
declare -a sents=("a" "ab" "abc")

./eval_scripts/train.sh ${experiment} ${num_folds} ${anserini_path}

for i in "${sents[@]}"
do
for j in $(seq 0 $((num_folds - 1)))
for i in "${sents[@]}"
do
if [[ "${collection}" == "robust04" ]] ; then
for j in $(seq 0 4)
do
while IFS= read -r line
do
alpha=$(echo ${line#?} | cut -d" " -f1)
beta=$(echo ${line#?} | cut -d" " -f2)
gamma=$(echo ${line#?} | cut -d" " -f3)
done < "log/${experiment}/${j}${i}_best.txt"
done < "run_logs/${experiment}/${j}${i}_best.txt"

python src/main.py --mode retrieval --experiment ${experiment} --collection ${collection} --anserini_path ${anserini_path} --folds_file ${folds_file} 3 ${alpha} ${beta} ${gamma} ${j} test
python src/main.py --mode retrieval --experiment ${experiment} --collection ${collection} --anserini_path ${anserini_path} 3 ${alpha} ${beta} ${gamma} $j test
done
cat runs/run.${experiment}.cv.test.* > runs/run.${experiment}.cv.${i}
done
else
./eval_scripts/${experiment}_eval.sh ${experiment} ${collection} ${anserini_path} ${folds_file}
fi
cat runs/run.${experiment}.cv.test.* > runs/run.${experiment}.cv.$i
else
while IFS= read -r line
do
alpha=$(echo ${line#?} | cut -d" " -f1)
beta=$(echo ${line#?} | cut -d" " -f2)
gamma=$(echo ${line#?} | cut -d" " -f3)
done < "run_logs/${experiment}/${i}_best.txt"

python src/main.py --mode retrieval --experiment ${experiment} --collection ${collection} --anserini_path ${anserini_path} 3 ${alpha} ${beta} ${gamma} 0 all
mv runs/run.${experiment}.cv.all runs/run.${experiment}.cv.$i
fi
done
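
Both branches of `test.sh` recover the tuned weights the same way: drop the first character of the line read from a `*_best.txt` file (`${line#?}`), then take the first three space-separated fields as alpha, beta and gamma. The sketch below isolates that step. The `parse_weights` name and the sample line are assumptions; the actual layout of the `_best.txt` files is not shown in this diff.

```shell
# Hypothetical helper (not in the commit): extract alpha, beta and gamma
# from one line by dropping its first character and splitting on whitespace.
parse_weights() {
    rest=${1#?}    # same expansion as in test.sh: remove the first character
    # read splits on whitespace; anything after the third field goes to "extra"
    read -r alpha beta gamma extra <<EOF
${rest}
EOF
    printf '%s %s %s\n' "${alpha}" "${beta}" "${gamma}"
}

# The sample line is an assumed format, used only for illustration.
parse_weights "[0.6 0.2 0.1 map 0.3241"
# prints: 0.6 0.2 0.1
```

Using `read` avoids the three separate `echo | cut` pipelines, and it does not glob-expand the line the way an unquoted `echo ${line#?}` would.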

44 changes: 25 additions & 19 deletions eval_scripts/train.sh
@@ -1,32 +1,38 @@
#!/usr/bin/env bash

experiment=$1
num_folds=$2
collection=$2
anserini_path=$3

if [ ${num_folds} == '5' ] ; then
folds_file="robust04-paper2-folds.json"
collection="robust04_5cv"
else
folds_file="robust04-paper1-folds.json"
collection="robust04_2cv"
if [ ! -d "run_logs/${experiment}" ] ; then
mkdir -p "run_logs/${experiment}"
fi

if [ ! -d "log/${experiment}" ] ; then
mkdir -p "log/${experiment}"
fi
if [[ "${collection}" == "robust04" ]] ; then
for i in $(seq 0 4)
do
python src/main.py --mode retrieval --experiment ${experiment} --collection ${collection} --anserini_path ${anserini_path} 3 1.0 0.1 0.1 $i train > "run_logs/${experiment}/eval${i}a.txt"
cat "run_logs/${experiment}/eval${i}a.txt" | sort -k5r,5 -k3,3 | head -1 > "run_logs/${experiment}/${i}a_best.txt"
rm "runs/run.${experiment}.cv.train"

for i in $(seq 0 $((num_folds - 1)))
do
python src/main.py --mode retrieval --experiment ${experiment} --collection ${collection} --folds_file ${folds_file} --anserini_path ${anserini_path} --data_path data 3 1.0 0.1 0.1 $i train > "log/${experiment}/eval${i}a.txt"
cat "log/${experiment}/eval${i}a.txt" | sort -k5r,5 -k3,3 | head -1 > "log/${experiment}/${i}a_best.txt"
python src/main.py --mode retrieval --experiment ${experiment} --collection ${collection} --anserini_path ${anserini_path} 3 1.0 1.0 0.1 $i train > "run_logs/${experiment}/eval${i}ab.txt"
cat "run_logs/${experiment}/eval${i}ab.txt" | sort -k5r,5 -k3,3 | head -1 > "run_logs/${experiment}/${i}ab_best.txt"
rm "runs/run.${experiment}.cv.train"

python src/main.py --mode retrieval --experiment ${experiment} --collection ${collection} --anserini_path ${anserini_path} 3 1.0 1.0 1.0 $i train > "run_logs/${experiment}/eval${i}abc.txt"
cat "run_logs/${experiment}/eval${i}abc.txt" | sort -k5r,5 -k3,3 | head -1 > "run_logs/${experiment}/${i}abc_best.txt"
rm "runs/run.${experiment}.cv.train"
done
else
python src/main.py --mode retrieval --experiment ${experiment} --collection ${collection} --anserini_path ${anserini_path} 3 1.0 0.1 0.1 0 train > "run_logs/${experiment}/evala.txt"
cat "run_logs/${experiment}/evala.txt" | sort -k5r,5 -k3,3 | head -1 > "run_logs/${experiment}/a_best.txt"
rm "runs/run.${experiment}.cv.train"

python src/main.py --mode retrieval --experiment ${experiment} --collection ${collection} --folds_file ${folds_file} --anserini_path ${anserini_path} --data_path data 3 1.0 1.0 0.1 ${i} train > "log/${experiment}/eval${i}ab.txt"
cat "log/${experiment}/eval${i}ab.txt" | sort -k5r,5 -k3,3 | head -1 > "log/${experiment}/${i}ab_best.txt"
python src/main.py --mode retrieval --experiment ${experiment} --collection ${collection} --anserini_path ${anserini_path} 3 1.0 1.0 0.1 0 train > "run_logs/${experiment}/evalab.txt"
cat "run_logs/${experiment}/evalab.txt" | sort -k5r,5 -k3,3 | head -1 > "run_logs/${experiment}/ab_best.txt"
rm "runs/run.${experiment}.cv.train"

python src/main.py --mode retrieval --experiment ${experiment} --collection ${collection} --folds_file ${folds_file} --anserini_path ${anserini_path} --data_path data 3 1.0 1.0 1.0 $i train > "log/${experiment}/eval${i}abc.txt"
cat "log/${experiment}/eval${i}abc.txt" | sort -k5r,5 -k3,3 | head -1 > "log/${experiment}/${i}abc_best.txt"
python src/main.py --mode retrieval --experiment ${experiment} --collection ${collection} --anserini_path ${anserini_path} 3 1.0 1.0 1.0 0 train > "run_logs/${experiment}/evalabc.txt"
cat "run_logs/${experiment}/evalabc.txt" | sort -k5r,5 -k3,3 | head -1 > "run_logs/${experiment}/abc_best.txt"
rm "runs/run.${experiment}.cv.train"
done
fi
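
`train.sh` picks the best configuration by sorting the evaluation log on field 5 in reverse (`-k5r,5`), breaking ties on field 3, and keeping the first line. Without `-n` the comparison is lexical, which agrees with numeric order only when all scores share one format, such as `0.xxxx`. A sketch with an explicit numeric sort follows, assuming a hypothetical five-field log whose fifth field is the score; `pick_best` and the sample lines are not part of the commit.

```shell
# Hypothetical helper (not in the commit): keep the line with the highest
# numeric value in field 5; ties are broken by field 3 in ascending order.
pick_best() {
    LC_ALL=C sort -k5,5nr -k3,3 | head -n 1
}

# Sample lines use an assumed format, for illustration only.
printf '%s\n' \
    "[1.0 0.1 0.1 map 0.3014" \
    "[1.0 0.2 0.1 map 0.3241" \
    "[1.0 0.3 0.1 map 0.3100" | pick_best
# prints: [1.0 0.2 0.1 map 0.3241
```

Setting `LC_ALL=C` makes the decimal point and the tie-break ordering independent of the user's locale.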