## How to generate the top-k files dev.top100.bm25.tsv and train.top100.bm25.tsv (zipped)
Run the main_retri_example.ipynb notebook.
It is based on the [retriv framework] and uses the BM25 method to generate a top-k file for train.queries.tsv and dev.queries.tsv.
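
To make the shape of the task concrete, here is a minimal sketch of BM25 top-k retrieval written in plain Python. It is not the retriv implementation and not the notebook's code: the function names `bm25_topk` and `write_topk`, the parameter values `k1=1.2` and `b=0.75`, and the output column layout (query id, doc id, rank, score) are all assumptions made for illustration. It also expects already tokenized text, whereas the notebook would tokenize with jieba.

```python
import csv
import math
from collections import Counter


def bm25_topk(corpus, queries, k=100, k1=1.2, b=0.75):
    """Rank documents for each query with a plain BM25 and keep the top k.

    corpus:  {doc_id: list of tokens}
    queries: {query_id: list of tokens}
    returns: {query_id: [(doc_id, score), ...]} sorted by descending score
    """
    n_docs = len(corpus)
    avg_len = sum(len(toks) for toks in corpus.values()) / n_docs
    # Document frequency: number of documents that contain each term.
    df = Counter(term for toks in corpus.values() for term in set(toks))
    # Term frequencies per document.
    tfs = {doc_id: Counter(toks) for doc_id, toks in corpus.items()}

    results = {}
    for qid, q_toks in queries.items():
        scores = {}
        for doc_id, tf in tfs.items():
            doc_len = len(corpus[doc_id])
            score = 0.0
            for term in q_toks:
                if term not in tf:
                    continue
                idf = math.log(1 + (n_docs - df[term] + 0.5) / (df[term] + 0.5))
                norm = tf[term] + k1 * (1 - b + b * doc_len / avg_len)
                score += idf * tf[term] * (k1 + 1) / norm
            if score > 0:
                scores[doc_id] = score
        # Highest score first; ties are broken by doc id for stable output.
        ranked = sorted(scores.items(), key=lambda item: (-item[1], item[0]))
        results[qid] = ranked[:k]
    return results


def write_topk(results, path):
    """Write one row per (query, doc): qid, doc_id, rank, score (assumed layout)."""
    with open(path, "w", newline="", encoding="utf-8") as f:
        writer = csv.writer(f, delimiter="\t")
        for qid, ranked in results.items():
            for rank, (doc_id, score) in enumerate(ranked, start=1):
                writer.writerow([qid, doc_id, rank, f"{score:.4f}"])
```

Calling `write_topk(bm25_topk(corpus, queries, k=100), "dev.top100.bm25.tsv")` would then produce a file with at most 100 rows per query. The loop over every document is only suitable for small examples; an indexed retriever such as retriv is the practical choice for the full corpus.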
- The framework processes the data very quickly.
- The data comes from the Alibaba dataset; see the [repo] and [Paper]. /data/video
- To make the purpose of each file easier to understand, I only changed some of the file names, which should not affect understanding. For example:
  - train.query.txt ==> train.queries.tsv
  - qrels.dev.txt ==> dev.qrels.tsv
- The stop words file (cn_stopwords.txt) comes from there.
- Environment:
  - python==3.8.0
  - retriv==0.2.0
  - csv (Python standard library)
  - jieba
  - tqdm

├─ data
│ ├─ cn_stopwords.txt
│ ├─ ecom
│ │ ├─ bm25.ipynb
│ │ ├─ corpus.tsv
│ │ ├─ dev.qrels.tsv
│ │ ├─ dev.queries.tsv
│ │ ├─ dev.query.txt
│ │ ├─ dev.top100.bm25.tsv
│ │ ├─ logdata.py
│ │ ├─ retri.ipynb
│ │ ├─ train.qrels.tsv
│ │ ├─ train.queries.tsv
│ │ ├─ train.query.txt
│ │ └─ train.top100.bm25.tsv(zip)
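
The file renaming mentioned in the notes above can be scripted. The following is a minimal sketch, assuming the files sit in a single directory such as data/ecom and that only the names change, not the contents. The helper `copy_with_new_names` and the `RENAMES` mapping are hypothetical names introduced here. The sketch copies rather than moves, since the tree lists both dev.query.txt and dev.queries.tsv side by side.

```python
import shutil
from pathlib import Path

# Old name -> new name, taken from the two examples given in the notes.
RENAMES = {
    "train.query.txt": "train.queries.tsv",
    "qrels.dev.txt": "dev.qrels.tsv",
}


def copy_with_new_names(data_dir, renames=RENAMES):
    """Copy each existing source file to its new name; return the names created."""
    data_dir = Path(data_dir)
    created = []
    for old, new in renames.items():
        src, dst = data_dir / old, data_dir / new
        # Skip missing sources and never overwrite an existing target.
        if src.exists() and not dst.exists():
            shutil.copyfile(src, dst)
            created.append(new)
    return created
```

For example, `copy_with_new_names("data/ecom")` would create the renamed copies for whichever of the listed source files are present and leave the originals in place.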