PyGaggle provides a gaggle of deep neural architectures for text ranking and question answering. It was designed for tight integration with Pyserini, but can be easily adapted for other sources as well.
Currently, this repo contains implementations of the rerankers for CovidQA on CORD-19, as described in "Rapidly Bootstrapping a Question Answering Dataset for COVID-19".
- Install via PyPI: `pip install pygaggle`. Requires Python 3.6+.
- Install PyTorch 1.4+.
- Download the index: `sh scripts/update-index.sh`.
- Make sure you have an installation of Java 11+: `javac --version`.
- Install Anserini.
- Clone the repo with `git clone git@github.com:castorini/pygaggle.git`.
- Make sure you have an installation of Python 3.6+. All `python` commands below refer to this.
- For pip, do `pip install -r requirements.txt`.
- If you prefer Anaconda, use `conda env create -f environment.yml && conda activate pygaggle`.
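With everything installed, reranking a handful of passages looks roughly like this. This is a minimal sketch assuming the `Query`/`Text`/`MonoT5` interfaces exposed by recent PyGaggle releases; class names and model defaults may differ in the version you have installed, and the query and passages are made up:

```python
# Minimal reranking sketch (assumes the pygaggle.rerank API of recent releases).
from pygaggle.rerank.base import Query, Text
from pygaggle.rerank.transformer import MonoT5

reranker = MonoT5()  # by default, a T5 reranker fine-tuned on MS MARCO

query = Query('What are the initial symptoms of COVID-19?')
passages = [
    ('d1', 'Common early symptoms include fever, dry cough, and fatigue.'),
    ('d2', 'CORD-19 is a corpus of scholarly articles about COVID-19.'),
]
texts = [Text(body, {'docid': docid}, 0) for docid, body in passages]

# Higher score = more relevant to the query.
for result in sorted(reranker.rerank(query, texts), key=lambda t: t.score, reverse=True):
    print(f"{result.metadata['docid']}\t{result.score:.4f}")
```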
For a full list of mostly self-explanatory environment variables, see this file.
BM25 uses the CPU. If you don't have a GPU for the transformer models, pass `--device cpu` (PyTorch device string format) to the script, e.g., `python -um pygaggle.run.evaluate_kaggle_highlighter --method transformer --model-name bert-base-cased --device cpu`.

Note: Run the following evaluations at the root of this repo.
BM25:

```
python -um pygaggle.run.evaluate_kaggle_highlighter --method bm25
```

BERT:

```
python -um pygaggle.run.evaluate_kaggle_highlighter --method transformer --model-name bert-base-cased
```

SciBERT:

```
python -um pygaggle.run.evaluate_kaggle_highlighter --method transformer --model-name allenai/scibert_scivocab_cased
```

BioBERT:

```
python -um pygaggle.run.evaluate_kaggle_highlighter --method transformer --model-name biobert
```

T5 (fine-tuned on MS MARCO):

```
python -um pygaggle.run.evaluate_kaggle_highlighter --method t5
```
BioBERT (fine-tuned on SQuAD v1.1):

- `mkdir biobert-squad && cd biobert-squad`
- Download the weights, vocab, and config from the BioBERT repository to `biobert-squad`.
- Untar the model and rename some files in `biobert-squad` (the loop strips the training-step suffix from the checkpoint filenames; see the sketch at the end of this section):

```
tar -xvzf BERT-pubmed-1000000-SQuAD.tar.gz
mv bert_config.json config.json
for filename in model.ckpt*; do
    mv $filename $(python -c "import re; print(re.sub(r'ckpt-\\d+', 'ckpt', '$filename'))");
done
```
- Evaluate the model:

```
cd ..  # go to the root of this repo
python -um pygaggle.run.evaluate_kaggle_highlighter --method qa_transformer --model-name <folder path>
```
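The rename loop above (and the identical one in the next section) drops the training-step suffix that TensorFlow appends to checkpoint filenames, since Hugging Face transformers, which PyGaggle builds on, looks for TensorFlow weights named `model.ckpt.*`. A quick illustration of the regex, with made-up filenames:

```python
import re

# The substitution from the loop: drop the step suffix after 'ckpt'.
# Filenames here are illustrative; actual step numbers depend on the checkpoint.
for filename in ['model.ckpt-1000000.index', 'model.ckpt-1000000.meta']:
    print(re.sub(r'ckpt-\d+', 'ckpt', filename))
# Output:
# model.ckpt.index
# model.ckpt.meta
```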
BioBERT (fine-tuned on MS MARCO):

- Download the weights, vocab, and config from our Google Storage bucket. This requires an installation of gsutil.

```
mkdir biobert-marco && cd biobert-marco
gsutil cp "gs://neuralresearcher_data/doc2query/experiments/exp374/model.ckpt-100000*" .
gsutil cp gs://neuralresearcher_data/biobert_models/biobert_v1.1_pubmed/bert_config.json config.json
gsutil cp gs://neuralresearcher_data/biobert_models/biobert_v1.1_pubmed/vocab.txt .
```
- Rename the files:

```
for filename in model.ckpt*; do
    mv $filename $(python -c "import re; print(re.sub(r'ckpt-\\d+', 'ckpt', '$filename'))");
done
```
- Evaluate the model:

```
cd ..  # go to the root of this repo
python -um pygaggle.run.evaluate_kaggle_highlighter --method seq_class_transformer --model-name <folder path>
```
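Optionally, as a sanity check (a sketch, not part of the official instructions): after the rename, the checkpoint directory should load with Hugging Face transformers, which PyGaggle builds on. The `from_tf=True` loading path and the `biobert-marco` folder name are assumptions here, and loading TensorFlow checkpoints requires TensorFlow to be installed:

```python
# Sanity-check sketch: confirm the converted checkpoint directory is loadable.
# Requires TensorFlow installed, since from_tf=True reads the model.ckpt.* files.
from transformers import BertForSequenceClassification, BertTokenizer

model_path = 'biobert-marco'  # the folder populated and renamed above
tokenizer = BertTokenizer.from_pretrained(model_path)  # reads vocab.txt
model = BertForSequenceClassification.from_pretrained(model_path, from_tf=True)  # reads config.json + model.ckpt.*
print(sum(p.numel() for p in model.parameters()), 'parameters loaded')
```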