update scripts and doc
supercoderhawk committed Jan 16, 2020
1 parent 9b3be83 commit ad46061
Showing 4 changed files with 16 additions and 22 deletions.
2 changes: 1 addition & 1 deletion ReadMe.md
@@ -18,7 +18,7 @@ python3 -m spacy download en
2. setting value `ES_BASE_URL` in constants.py with your configured elastic search endpoint.

### Prepare Data
-1. unzip file and put all files under `data/` folder
+1. unzip file and put all files under `data/` folder, rename `test.csv` to `test_release.csv`
2. execute `bash scripts/prepare_data.sh` in **project root folder** to build the data for next step

### Execute the retrieval process
22 changes: 14 additions & 8 deletions scripts/run_end2end.sh
@@ -1,12 +1,18 @@
#!/bin/bash

+DATA_DIR=${PWD}/data/
+ES_RESULT_FILE=$DATA_DIR/validation_es_result.jsonl
+FINAL_RESULT_FILENAME=$DATA_DIR/validation_final_result.jsonl
+MODEL_PATH=$MODEL_DIR/rerank_model.model
+TOPK=20

-# process raw data into jsonl
-echo 'starting build raw data...'
+# run elasticsearch (BM25)
+python3 wsdm_digg/benchmark/benchmarker.py -src_filename $DATA_DIR/validation.jsonl \
+-dest_filename $ES_RESULT_FILE

-# build elasticsearch index
-echo 'starting building elasticsearch indexing...'

-# execute recall stage

-# execute reranking stage
+# run rerank by bert
+python3 wsdm_digg/reranking/predict.py -eval_search_filename $ES_RESULT_FILE \
+-golden_filename $VALID_FILE \
+-dest_filename $RESULT_DIR/$FINAL_RESULT_FILENAME \
+-model_path $MODEL_PATH \
+-eval_batch_size 10 -topk $TOPK
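
The script above reads `MODEL_DIR`, `VALID_FILE`, and `RESULT_DIR` without defining them, so they presumably have to be exported before it runs. The following is a minimal Python sketch of the same two-stage invocation, not part of the commit. It assumes those three values come from the environment; the helper name `build_commands` and the fallback values are hypothetical. It also joins `RESULT_DIR` with a bare file name, whereas the script prefixes `RESULT_DIR` to a `FINAL_RESULT_FILENAME` that already contains `DATA_DIR`. Nothing is executed; the sketch only builds and prints the argument lists.

```python
import os
import shlex


def build_commands(env=None):
    """Build the recall and rerank commands as argument lists (nothing is run)."""
    env = os.environ if env is None else env
    data_dir = os.path.join(os.getcwd(), "data")
    es_result = os.path.join(data_dir, "validation_es_result.jsonl")

    # Assumption: the script does not define these three variables.
    # The fallbacks below are illustrative only.
    model_dir = env.get("MODEL_DIR", data_dir)
    valid_file = env.get("VALID_FILE", os.path.join(data_dir, "validation.jsonl"))
    result_dir = env.get("RESULT_DIR", data_dir)

    # Recall stage: Elasticsearch (BM25) candidates.
    recall = [
        "python3", "wsdm_digg/benchmark/benchmarker.py",
        "-src_filename", os.path.join(data_dir, "validation.jsonl"),
        "-dest_filename", es_result,
    ]
    # Rerank stage: BERT reranker over the recall output.
    rerank = [
        "python3", "wsdm_digg/reranking/predict.py",
        "-eval_search_filename", es_result,
        "-golden_filename", valid_file,
        "-dest_filename", os.path.join(result_dir, "validation_final_result.jsonl"),
        "-model_path", os.path.join(model_dir, "rerank_model.model"),
        "-eval_batch_size", "10",
        "-topk", "20",
    ]
    return recall, rerank


if __name__ == "__main__":
    for cmd in build_commands():
        print(shlex.join(cmd))
```

Printing the commands first makes it easy to spot an unset variable before launching the Elasticsearch query or the reranker.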
12 changes: 0 additions & 12 deletions scripts/run_retrieval.sh

This file was deleted.

2 changes: 1 addition & 1 deletion wsdm_digg/benchmark/benchmarker.py
@@ -100,7 +100,7 @@ def main():
parser.add_argument('-dest_filename', type=str, required=True, )
parser.add_argument('-batch_size', type=int, default=100)
parser.add_argument('-parallel_count', type=int, default=20)
-parser.add_argument('-top_n', type=int, default=20)
+parser.add_argument('-top_n', type=int, default=100)
parser.add_argument('-is_submit', action='store_true')
args = parser.parse_args()
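
The effect of the new default can be checked in isolation. This is a small sketch, not the project's code: it reproduces only the arguments visible in this hunk (the real parser defines others above line 100, such as the `-src_filename` flag the shell script passes), and the helper name `build_parser` is hypothetical.

```python
import argparse


def build_parser():
    """Recreate only the arguments visible in the hunk above."""
    parser = argparse.ArgumentParser()
    parser.add_argument('-dest_filename', type=str, required=True)
    parser.add_argument('-batch_size', type=int, default=100)
    parser.add_argument('-parallel_count', type=int, default=20)
    parser.add_argument('-top_n', type=int, default=100)
    parser.add_argument('-is_submit', action='store_true')
    return parser


# Without -top_n, the new default of 100 applies.
default_args = build_parser().parse_args(['-dest_filename', 'out.jsonl'])
print(default_args.top_n)  # 100

# The previous behaviour is still available by passing the flag explicitly.
old_args = build_parser().parse_args(['-dest_filename', 'out.jsonl', '-top_n', '20'])
print(old_args.top_n)  # 20
```

Since `scripts/run_end2end.sh` does not pass `-top_n`, the recall stage would use the default of 100, while the rerank stage is invoked with `-topk 20`.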
