A gaggle of rerankers for CovidQA and CORD-19.
- For pip, run:
  pip install pygaggle
  If you prefer Anaconda, use:
  conda env create -f environment.yml && conda activate pygaggle
- Install PyTorch 1.4+.
- Download the index:
  sh scripts/update-index.sh
- Make sure Java 11+ is installed. Check the version with:
  javac --version
- Install Anserini.
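The Java 11+ check can also be scripted. The following is a minimal sketch, assuming `javac --version` prints a line such as "javac 11.0.8". The helpers parse_javac_major and has_java_11_or_newer are hypothetical and are not part of pygaggle.

```python
import re
import subprocess


def parse_javac_major(version_output: str) -> int:
    """Return the major Java version from javac output such as 'javac 11.0.8'."""
    match = re.search(r"javac\s+(\d+)(?:\.(\d+))?", version_output)
    if match is None:
        raise ValueError(f"unrecognized javac output: {version_output!r}")
    major = int(match.group(1))
    # The legacy 1.x scheme (e.g. 'javac 1.8.0_252') means Java 8.
    if major == 1 and match.group(2) is not None:
        major = int(match.group(2))
    return major


def has_java_11_or_newer() -> bool:
    """Hypothetical helper: True only if javac reports version 11 or later."""
    try:
        result = subprocess.run(
            ["javac", "--version"], capture_output=True, text=True, check=True
        )
    except (OSError, subprocess.CalledProcessError):
        # javac is missing, or the JDK rejects --version (as pre-9 JDKs do).
        return False
    # Some JDKs print the version to stderr instead of stdout.
    output = result.stdout or result.stderr
    try:
        return parse_javac_major(output) >= 11
    except ValueError:
        return False


if __name__ == "__main__":
    print("Java 11+ found" if has_java_11_or_newer() else "Java 11+ not found")
```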
By default, the evaluation script uses data/lucene-index-covid-paragraph as the index path.
To use a different index, set the environment variable CORD19_INDEX_PATH to its path.
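The index-path rule above amounts to "use CORD19_INDEX_PATH if set, otherwise the default directory". A minimal sketch of that lookup, assuming the variable is read from the process environment; resolve_index_path is a hypothetical helper, not pygaggle's actual code.

```python
import os
from pathlib import Path
from typing import Mapping, Optional

# Default index location named in this README.
DEFAULT_INDEX_PATH = "data/lucene-index-covid-paragraph"


def resolve_index_path(environ: Optional[Mapping[str, str]] = None) -> Path:
    """Hypothetical helper: prefer CORD19_INDEX_PATH, else fall back to the default."""
    env = os.environ if environ is None else environ
    # An unset or empty variable falls back to the default path.
    return Path(env.get("CORD19_INDEX_PATH") or DEFAULT_INDEX_PATH)


if __name__ == "__main__":
    print(resolve_index_path())
```

From a shell, the variable can be set for a single run, e.g. CORD19_INDEX_PATH=/path/to/index python -um pygaggle.run.evaluate_kaggle_highlighter --method bm25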
To evaluate each method, run the corresponding command:
- BM25: python -um pygaggle.run.evaluate_kaggle_highlighter --method bm25
- BERT: python -um pygaggle.run.evaluate_kaggle_highlighter --method transformer --model-name bert-base-cased
- SciBERT: python -um pygaggle.run.evaluate_kaggle_highlighter --method transformer --model-name allenai/scibert_scivocab_cased
- BioBERT: python -um pygaggle.run.evaluate_kaggle_highlighter --method transformer --model-name biobert
- T5 (MARCO): python -um pygaggle.run.evaluate_kaggle_highlighter --method t5
Instructions for our other MARCO and SQuAD models coming soon.
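To run every method in sequence, the commands above can be scripted. The following is a minimal sketch: CONFIGS, build_command, and run_all are hypothetical names. The flags after the module name are copied from the commands above, and -um is written as the equivalent -u -m. By default it only prints the commands (dry run).

```python
import shlex
import subprocess
import sys

# Per-method flags, copied from the evaluation commands in this README.
CONFIGS = {
    "BM25": ["--method", "bm25"],
    "BERT": ["--method", "transformer", "--model-name", "bert-base-cased"],
    "SciBERT": ["--method", "transformer", "--model-name", "allenai/scibert_scivocab_cased"],
    "BioBERT": ["--method", "transformer", "--model-name", "biobert"],
    "T5 (MARCO)": ["--method", "t5"],
}


def build_command(name: str, python: str = sys.executable) -> list:
    """Build the argv list for one method (-u: unbuffered output, -m: run module)."""
    return [python, "-u", "-m", "pygaggle.run.evaluate_kaggle_highlighter", *CONFIGS[name]]


def run_all(dry_run: bool = True) -> None:
    """Print each command; execute it only when dry_run is False."""
    for name in CONFIGS:
        cmd = build_command(name)
        print(f"{name}: {shlex.join(cmd)}")
        if not dry_run:
            # check=True stops the loop at the first failing evaluation.
            subprocess.run(cmd, check=True)


if __name__ == "__main__":
    run_all(dry_run=True)
```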