DuTim/retriv_chinese_example_bm25_topk: a BM25 use case on a Chinese corpus using the retriv framework

simple use case

How to use

## How to generate the top-k files dev.top100.bm25.tsv and train.top100.bm25.tsv (zipped)

Run the main_retri_example.ipynb notebook.

It uses the BM25 method from the [retriv framework] to generate a top-k file for train.queries.tsv and for dev.queries.tsv.
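
To make the ranking step concrete, here is a minimal pure-Python sketch of BM25 top-k retrieval. It is not retriv's implementation and not the notebook's code: `TinyBM25`, `char_tokenize`, the sample documents, the `k1`/`b` defaults, and the Lucene-style IDF variant are all assumptions chosen for illustration. For real Chinese text you would likely pass a word segmenter such as `jieba.lcut` as the `tokenize` argument.

```python
import math
from collections import Counter
from typing import Callable, Dict, Iterable, List, Tuple


def char_tokenize(text: str) -> List[str]:
    """Placeholder tokenizer: one token per non-space character.

    Assumption for illustration only; a word segmenter (for example
    jieba.lcut) would normally be plugged in for Chinese text.
    """
    return [ch for ch in text if not ch.isspace()]


class TinyBM25:
    """Small in-memory BM25 index (hypothetical helper, illustration only)."""

    def __init__(
        self,
        docs: Dict[str, str],
        tokenize: Callable[[str], List[str]] = char_tokenize,
        stopwords: Iterable[str] = (),
        k1: float = 1.2,   # assumed term-frequency saturation parameter
        b: float = 0.75,   # assumed length-normalization parameter
    ) -> None:
        self.tokenize = tokenize
        self.stopwords = set(stopwords)
        self.k1, self.b = k1, b
        self.doc_ids = list(docs)
        # Per-document term frequencies and lengths (after stop-word removal).
        self.term_freqs = [Counter(self._terms(docs[d])) for d in self.doc_ids]
        self.doc_lens = [sum(tf.values()) for tf in self.term_freqs]
        self.avg_len = (sum(self.doc_lens) / max(len(self.doc_lens), 1)) or 1.0
        # Document frequency: number of documents containing each term.
        df: Counter = Counter()
        for tf in self.term_freqs:
            df.update(tf.keys())
        n = len(self.doc_ids)
        self.idf = {t: math.log(1 + (n - f + 0.5) / (f + 0.5)) for t, f in df.items()}

    def _terms(self, text: str) -> List[str]:
        return [t for t in self.tokenize(text) if t not in self.stopwords]

    def search(self, query: str, k: int = 100) -> List[Tuple[str, float]]:
        """Return up to k (doc_id, score) pairs, best score first."""
        q_terms = self._terms(query)
        scored = []
        for doc_id, tf, dl in zip(self.doc_ids, self.term_freqs, self.doc_lens):
            norm = self.k1 * (1 - self.b + self.b * dl / self.avg_len)
            score = sum(
                self.idf[t] * tf[t] * (self.k1 + 1) / (tf[t] + norm)
                for t in q_terms
                if t in tf
            )
            if score > 0:
                scored.append((doc_id, score))
        scored.sort(key=lambda pair: -pair[1])  # stable: ties keep corpus order
        return scored[:k]


# Made-up, pre-segmented sample documents (not taken from corpus.tsv).
docs = {"d1": "红色 连衣裙 夏季", "d2": "蓝色 牛仔裤", "d3": "红色 运动鞋"}
index = TinyBM25(docs, tokenize=str.split)
hits = index.search("红色 连衣裙", k=2)
```

Setting `k=100` per query would give a top-100 list like the ones stored in the `*.top100.bm25.tsv` files, although the actual files come from retriv and its own scoring details may differ.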

The framework processes the data very quickly.
The data comes from the Alibaba dataset in the [repo] [Paper], under /data/video.
To make the purpose of each file clearer, I changed some of the file names; this does not affect how the files should be understood.

For example:

train.query.txt ==>   train.queries.tsv

qrels.dev.txt ==> dev.qrels.tsv

The stop-word file (cn_stopwords.txt) comes from there.

environment

python==3.8.0
retriv==0.2.0
csv (Python standard library)
jieba
tqdm

project structure


 ├─ data
 │  ├─ cn_stopwords.txt
 │  ├─ ecom
 │  │  ├─ bm25.ipynb
 │  │  ├─ corpus.tsv
 │  │  ├─ dev.qrels.tsv
 │  │  ├─ dev.queries.tsv
 │  │  ├─ dev.query.txt
 │  │  ├─ dev.top100.bm25.tsv
 │  │  ├─ logdata.py
 │  │  ├─ retri.ipynb
 │  │  ├─ train.qrels.tsv
 │  │  ├─ train.queries.tsv
 │  │  ├─ train.query.txt
 │  │  └─ train.top100.bm25.tsv(zip)
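
As a rough sketch of how the query and top-k files above could be read and written with the `csv` module: the column layouts here (`query_id<TAB>query_text` for the queries file and `query_id<TAB>doc_id<TAB>rank` for the top-k file) are assumptions, not confirmed by the repository, and `read_queries` and `write_topk` are hypothetical helper names.

```python
import csv
from typing import Dict, List


def read_queries(path: str) -> Dict[str, str]:
    """Read a queries file, ASSUMING one `query_id<TAB>query_text` row per line."""
    with open(path, encoding="utf-8", newline="") as f:
        reader = csv.reader(f, delimiter="\t", quoting=csv.QUOTE_NONE)
        # Rows with fewer than two columns are skipped.
        return {row[0]: row[1] for row in reader if len(row) >= 2}


def write_topk(path: str, results: Dict[str, List[str]]) -> int:
    """Write ranked doc ids as `query_id<TAB>doc_id<TAB>rank` rows (ASSUMED layout).

    `results` maps each query id to its doc ids, best first.
    Ranks start at 1. Returns the number of rows written.
    """
    rows = 0
    with open(path, "w", encoding="utf-8", newline="") as f:
        writer = csv.writer(f, delimiter="\t", lineterminator="\n")
        for query_id, doc_ids in results.items():
            for rank, doc_id in enumerate(doc_ids, start=1):
                writer.writerow([query_id, doc_id, rank])
                rows += 1
    return rows
```

If the real files use a different column order or include scores, only the row construction in `write_topk` and the indexing in `read_queries` would need to change.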

If you find this helpful, please give the repository a star.
