Python >= 3.6
pip3 install -r requirements.txt
python3 -m spacy download en
- Set up the Elasticsearch service (refer to link).
- Set the value of ES_BASE_URL in constants.py to your configured Elasticsearch endpoint.
- Unzip the file and put all files under the data/ folder, then rename test.csv to test_release.csv.
- Execute bash scripts/prepare_data.sh in the project root folder to build the data for the next step.
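
The rename step above can also be done from Python, which avoids depending on a particular shell. This is only a minimal sketch: the helper name rename_test_file is hypothetical and not part of the project, and it assumes the unzipped files are already in data/.

```python
from pathlib import Path


def rename_test_file(data_dir: str = "data") -> Path:
    """Rename <data_dir>/test.csv to <data_dir>/test_release.csv.

    Hypothetical helper, not part of the project scripts.
    Returns the path of the renamed file.
    """
    src = Path(data_dir) / "test.csv"
    dst = Path(data_dir) / "test_release.csv"
    if dst.exists():
        # Already renamed on a previous run; nothing to do.
        return dst
    if not src.exists():
        raise FileNotFoundError(f"expected {src} after unzipping")
    src.rename(dst)
    return dst
```

Calling rename_test_file() from the project root before running scripts/prepare_data.sh should leave data/test_release.csv in place; calling it a second time is a no-op.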
Execute bash scripts/run_retrieval.sh in the project root folder.
The above script includes three main parts:
- run Elasticsearch to retrieve candidate papers
- prepare the rerank data from the Elasticsearch result (baseline result)
- run the rerank with BERT
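
To make the data flow between these parts concrete, here is a rough sketch of the last two: pairing a query with its retrieved candidates, then reordering the candidates by a relevance score. All names are hypothetical and this is not the code in scripts/run_retrieval.sh. In particular, token_overlap is a toy stand-in for the BERT scorer, used so the example runs without any model.

```python
from typing import Callable, List, Tuple


def prepare_rerank_pairs(query: str, candidates: List[str]) -> List[Tuple[str, str]]:
    """Pair the query with every candidate returned by the retrieval step."""
    return [(query, cand) for cand in candidates]


def rerank(
    query: str,
    candidates: List[str],
    score_fn: Callable[[str, str], float],
) -> List[str]:
    """Return the candidates sorted by score_fn, highest score first."""
    pairs = prepare_rerank_pairs(query, candidates)
    scored = [(cand, score_fn(q, cand)) for q, cand in pairs]
    scored.sort(key=lambda item: item[1], reverse=True)
    return [cand for cand, _ in scored]


def token_overlap(query: str, candidate: str) -> float:
    """Toy scorer (assumption): fraction of query tokens found in the candidate."""
    q_tokens = set(query.lower().split())
    c_tokens = set(candidate.lower().split())
    return len(q_tokens & c_tokens) / len(q_tokens) if q_tokens else 0.0


if __name__ == "__main__":
    baseline = [
        "graph neural networks",
        "bert for document ranking",
        "ranking with bm25",
    ]
    print(rerank("bert document ranking", baseline, token_overlap))
```

Keeping the scorer as a plain function argument means the baseline order and the reranked order can be compared on the same candidate list by swapping only score_fn.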
- Recall phase: noun chunk extraction + TextRank keyword extraction + BM25-based search (Elasticsearch)
- Rerank phase: BERT-based rerank (SciBERT from AllenAI)
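
For intuition about the BM25 search used in the recall phase, the sketch below scores tokenized documents against a list of query keywords with the classic Okapi BM25 formula. It is a pure-Python illustration, not what the project runs: the real scoring happens inside Elasticsearch and may differ in details. The function name bm25_scores is hypothetical, and k1=1.2, b=0.75 are the commonly used defaults.

```python
import math
from collections import Counter
from typing import List


def bm25_scores(
    query_terms: List[str],
    docs: List[List[str]],
    k1: float = 1.2,
    b: float = 0.75,
) -> List[float]:
    """Score each tokenized document against the query terms with Okapi BM25."""
    if not docs:
        return []
    n_docs = len(docs)
    # Average document length; fall back to 1.0 if every document is empty.
    avgdl = sum(len(d) for d in docs) / n_docs or 1.0

    # Document frequency: in how many documents each term occurs.
    df = Counter()
    for doc in docs:
        df.update(set(doc))

    scores = []
    for doc in docs:
        tf = Counter(doc)
        score = 0.0
        for term in query_terms:
            if term not in tf:
                continue
            # Smoothed IDF, always positive.
            idf = math.log(1 + (n_docs - df[term] + 0.5) / (df[term] + 0.5))
            # Term frequency saturates with k1 and is normalized by length via b.
            norm = tf[term] + k1 * (1 - b + b * len(doc) / avgdl)
            score += idf * tf[term] * (k1 + 1) / norm
        scores.append(score)
    return scores
```

In this picture, the keywords produced by noun chunk and TextRank extraction would play the role of query_terms, and the top-scoring documents would be the candidates passed on to the rerank phase.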
The only model that needs to be trained in this project is the BERT-based reranking model.