A gaggle of deep neural architectures for text ranking and question answering, designed for Pyserini

lingwei-gu/pygaggle

PyGaggle

A gaggle of rerankers for CovidQA and CORD-19.

Installation

  1. Install with pip: pip install pygaggle. If you prefer Anaconda, run conda env create -f environment.yml && conda activate pygaggle instead.

  2. Install PyTorch 1.4+.

  3. Download the index: sh scripts/update-index.sh.

  4. Make sure Java 11+ is installed; javac --version should report version 11 or later.

  5. Install Anserini.
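
The version requirements in the steps above (PyTorch 1.4+, Java 11+) can be checked before running anything. The following is a minimal sketch, not part of PyGaggle: the helper names (parse_version, meets_minimum, installed_version, javac_version, preflight) are hypothetical, and it assumes PyTorch is installed under the distribution name torch and that javac is on the PATH.

```python
import re
import shutil
import subprocess
from importlib import metadata


def parse_version(text):
    """Return the leading numeric components of a version string as a tuple."""
    match = re.match(r"(\d+)(?:\.(\d+))?(?:\.(\d+))?", text.strip())
    if match is None:
        raise ValueError(f"unrecognized version string: {text!r}")
    return tuple(int(part) for part in match.groups() if part is not None)


def meets_minimum(version_text, minimum):
    """True if version_text is at least `minimum`, a tuple such as (1, 4)."""
    return parse_version(version_text) >= minimum


def installed_version(package):
    """Installed version of a Python package, or None if it is not installed."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


def javac_version():
    """Version reported by `javac --version`, or None if it cannot be determined."""
    if shutil.which("javac") is None:
        return None
    try:
        result = subprocess.run(
            ["javac", "--version"], capture_output=True, text=True, timeout=30
        )
    except (OSError, subprocess.SubprocessError):
        return None
    # Only accept a token that starts with a digit, e.g. "javac 11.0.2".
    match = re.search(r"javac\s+(\d\S*)", result.stdout + result.stderr)
    return match.group(1) if match else None


def preflight():
    """Check the two version requirements from the installation steps."""
    torch = installed_version("torch")  # assumed distribution name
    javac = javac_version()
    return {
        "torch>=1.4": torch is not None and meets_minimum(torch, (1, 4)),
        "javac>=11": javac is not None and meets_minimum(javac, (11,)),
    }


if __name__ == "__main__":
    for requirement, ok in preflight().items():
        print(f"{requirement}: {'ok' if ok else 'missing or too old'}")
```

The sketch only compares version numbers; it does not verify that the index has been downloaded or that Anserini is installed.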

Running rerankers on CovidQA

By default, the script uses data/lucene-index-covid-paragraph for the index path. To use a different index, set the environment variable CORD19_INDEX_PATH to its path.
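
The lookup described above, an environment variable with a default fallback, can be sketched as follows. This is an illustration of the behavior, not PyGaggle's actual code: resolve_index_path is a hypothetical helper, and treating an empty CORD19_INDEX_PATH as unset is an assumption.

```python
import os
from pathlib import Path

# Default index location named in this README.
DEFAULT_INDEX_PATH = "data/lucene-index-covid-paragraph"


def resolve_index_path(environ=None):
    """Return the index path, preferring CORD19_INDEX_PATH when it is set.

    `environ` defaults to os.environ; passing a plain dict makes the
    function easy to exercise without touching the real environment.
    """
    env = os.environ if environ is None else environ
    # Assumption: an empty value is treated the same as an unset variable.
    return Path(env.get("CORD19_INDEX_PATH") or DEFAULT_INDEX_PATH)


if __name__ == "__main__":
    path = resolve_index_path()
    print(f"index path: {path} (exists: {path.exists()})")
```

Running the sketch before an evaluation is a quick way to confirm which index directory will be used and whether it exists.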

Unsupervised Methods

BM25: python -um pygaggle.run.evaluate_kaggle_highlighter --method bm25

BERT: python -um pygaggle.run.evaluate_kaggle_highlighter --method transformer --model-name bert-base-cased

SciBERT: python -um pygaggle.run.evaluate_kaggle_highlighter --method transformer --model-name allenai/scibert_scivocab_cased

BioBERT: python -um pygaggle.run.evaluate_kaggle_highlighter --method transformer --model-name biobert

Supervised Methods

T5 (MARCO): python -um pygaggle.run.evaluate_kaggle_highlighter --method t5

Instructions for our other MARCO and SQuAD models are coming soon.
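
To run every configuration listed above in one go, the commands can be assembled programmatically. The sketch below is a convenience wrapper, not something shipped with PyGaggle: build_command, run_all, and CONFIGS are hypothetical names, and the method and model values are copied from the commands in this README. It defaults to a dry run that only prints each command.

```python
import subprocess
import sys

MODULE = "pygaggle.run.evaluate_kaggle_highlighter"

# (label, method, model name or None), copied from the commands in this README.
CONFIGS = [
    ("BM25", "bm25", None),
    ("BERT", "transformer", "bert-base-cased"),
    ("SciBERT", "transformer", "allenai/scibert_scivocab_cased"),
    ("BioBERT", "transformer", "biobert"),
    ("T5 (MARCO)", "t5", None),
]


def build_command(method, model_name=None, python=sys.executable):
    """Build the argument list for one evaluation run."""
    cmd = [python, "-um", MODULE, "--method", method]
    if model_name is not None:
        cmd += ["--model-name", model_name]
    return cmd


def run_all(dry_run=True):
    """Print each command and, unless dry_run is set, execute it in turn."""
    for label, method, model_name in CONFIGS:
        cmd = build_command(method, model_name)
        print(f"{label}: {' '.join(cmd)}")
        if not dry_run:
            # check=True stops at the first failing evaluation.
            subprocess.run(cmd, check=True)


if __name__ == "__main__":
    run_all(dry_run=True)
```

Calling run_all(dry_run=False) executes the evaluations sequentially, which assumes the installation steps have been completed and the index is in place.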
