Init paddle-nlp #2112

Merged
merged 211 commits into from
Apr 22, 2019
Changes from 5 commits
Commits
a55440a
init paddle-nlp tools for QA test
chenbjin Apr 5, 2019
c4fc5ea
Fix paragraph extraction bug
Apr 8, 2019
76636fc
Update download links
Apr 8, 2019
13ffb7b
first update LAC README.md
Halfish Apr 8, 2019
d8a536c
rename EmoTect as emotion_detection
chenbjin Apr 8, 2019
797b0d2
download data from bos
Halfish Apr 8, 2019
8cfae7f
Update README.md
Halfish Apr 8, 2019
34c3cc7
Rename project
Apr 8, 2019
b5afc34
second add code
zhangyimi Apr 8, 2019
4df27c1
Merge branch 'paddle-nlp' of github.com:PaddlePaddle/models into padd…
zhangyimi Apr 8, 2019
742d7b9
modify downloads.sh for lac
Halfish Apr 8, 2019
c51ed7e
Merge branch 'paddle-nlp' of https://github.com/PaddlePaddle/models i…
Halfish Apr 8, 2019
c6dcf81
rename LAC to lexical_analysis
Halfish Apr 8, 2019
4a3e311
update lac readme
Halfish Apr 8, 2019
b0ec7ab
Update README.md
zhangyimi Apr 8, 2019
5a55634
Update README.md
zhangyimi Apr 8, 2019
db027f9
Update README.md
zhangyimi Apr 8, 2019
068b6c8
add struct.jpg
zhangyimi Apr 8, 2019
8276d8d
Merge branch 'paddle-nlp' of github.com:PaddlePaddle/models into padd…
zhangyimi Apr 8, 2019
97b3b92
Update README.md
zhangyimi Apr 8, 2019
7b687e0
Update README.md
zhangyimi Apr 8, 2019
c6bc41e
update README
chenbjin Apr 8, 2019
bbef065
Update README.md
zhangyimi Apr 8, 2019
c779830
update emotion_detection README
chenbjin Apr 8, 2019
6c131a3
add download_data.sh and download_model.sh
chenbjin Apr 8, 2019
06ef80e
first commit ADE
luluxing3 Apr 9, 2019
0c16b2f
dialogue_model_toolkit_update
0YuanZhang0 Apr 9, 2019
c3db25b
Merge branch 'paddle-nlp' of https://github.com/PaddlePaddle/models i…
0YuanZhang0 Apr 9, 2019
4958bc4
update emotion_detection model bos url
chenbjin Apr 9, 2019
cb059cd
Merge branch 'paddle-nlp' of https://github.com/PaddlePaddle/models i…
chenbjin Apr 9, 2019
979c307
update README
luluxing3 Apr 9, 2019
47e175e
Merge branch 'paddle-nlp' of github.com:PaddlePaddle/models into padd…
luluxing3 Apr 9, 2019
9c21762
update readme
0YuanZhang0 Apr 9, 2019
7100e3d
Merge branch 'paddle-nlp' of https://github.com/PaddlePaddle/models i…
0YuanZhang0 Apr 9, 2019
87a2068
update readme
0YuanZhang0 Apr 9, 2019
7f336d5
update download file
0YuanZhang0 Apr 9, 2019
35e6fff
first commit DAM
luluxing3 Apr 10, 2019
2038575
Merge branch 'paddle-nlp' of github.com:PaddlePaddle/models into padd…
luluxing3 Apr 10, 2019
b085ea7
add readme
luluxing3 Apr 10, 2019
cf1d166
fix readme
luluxing3 Apr 10, 2019
dea9ea5
fix readme
luluxing3 Apr 10, 2019
6895fe0
fix readme
luluxing3 Apr 10, 2019
342dc37
fix readme
luluxing3 Apr 10, 2019
8e48e42
fix readme
luluxing3 Apr 10, 2019
d50c211
rename
luluxing3 Apr 10, 2019
2b36c9e
rename again
luluxing3 Apr 10, 2019
6751e15
1. add gradient_clip for ernie_lac
Apr 10, 2019
5b1a309
fix download.sh
zhangyimi Apr 10, 2019
db451cb
Merge branch 'paddle-nlp' of github.com:PaddlePaddle/models into padd…
zhangyimi Apr 10, 2019
44c1c13
Rename MRC task
Apr 10, 2019
4458d12
Merge branch 'paddle-nlp' of https://github.com/PaddlePaddle/models i…
Apr 10, 2019
99d8362
fix logger
zhangyimi Apr 10, 2019
e37ff4c
Merge branch 'paddle-nlp' of github.com:PaddlePaddle/models into padd…
zhangyimi Apr 10, 2019
8ba34f2
fix to douban
luluxing3 Apr 11, 2019
0519e9a
Merge branch 'paddle-nlp' of github.com:PaddlePaddle/models into padd…
luluxing3 Apr 11, 2019
d9de355
fix final
luluxing3 Apr 11, 2019
51b1b56
update readme
luluxing3 Apr 11, 2019
cc4544c
update readme
luluxing3 Apr 11, 2019
b41076e
update readme
luluxing3 Apr 11, 2019
e38a9c7
fix batch is null
luluxing3 Apr 11, 2019
9ee6b63
fix typo
luluxing3 Apr 11, 2019
11fd719
fix typo
luluxing3 Apr 11, 2019
5f43165
fix typo
luluxing3 Apr 11, 2019
bb0d090
update ernie config
chenbjin Apr 11, 2019
f7a6ba7
update readme
chenbjin Apr 11, 2019
0436dd3
add AI platform url in readme
chenbjin Apr 11, 2019
a5809d0
update readme subtitlestyle
chenbjin Apr 11, 2019
beae8c6
update
ChinaLiuHao Apr 11, 2019
670b513
Update README.md
ChinaLiuHao Apr 11, 2019
6254425
Update README.md
ChinaLiuHao Apr 11, 2019
72a5e5c
update
ChinaLiuHao Apr 11, 2019
5d902f2
Create README.md
ChinaLiuHao Apr 11, 2019
10fafd3
Update README.md
ChinaLiuHao Apr 11, 2019
8adfb4e
Update README.md
ChinaLiuHao Apr 11, 2019
23cce9a
Update README.md
ChinaLiuHao Apr 11, 2019
15acc84
Update README.md
ChinaLiuHao Apr 11, 2019
0552725
Update README.md
ChinaLiuHao Apr 11, 2019
20bcc96
Update README.md
ChinaLiuHao Apr 11, 2019
15ee96f
Update README.md
ChinaLiuHao Apr 11, 2019
5661bed
Update README.md
ChinaLiuHao Apr 11, 2019
71ec1c8
Update README.md
ChinaLiuHao Apr 11, 2019
40c5902
Update README.md
ChinaLiuHao Apr 11, 2019
1c9827e
Update README.md
ChinaLiuHao Apr 11, 2019
7366794
Update README.md
ChinaLiuHao Apr 11, 2019
0aecd5d
Update README.md
ChinaLiuHao Apr 11, 2019
4511504
Update README.md
ChinaLiuHao Apr 11, 2019
5966ba6
Update README.md
ChinaLiuHao Apr 11, 2019
9480e9c
update batch size
luluxing3 Apr 11, 2019
dcb334e
Merge branch 'paddle-nlp' of github.com:PaddlePaddle/models into padd…
luluxing3 Apr 11, 2019
e6c39f2
adapt to samll data size
luluxing3 Apr 11, 2019
e22a6f2
update ERNIE bcebos url
chenbjin Apr 12, 2019
32bbb7b
add language model
Aurelius84 Apr 12, 2019
16e7e6a
Merge pull request #2035 from Aurelius84/paddle-nlp
phlrain Apr 12, 2019
d87cdec
modify readme
0YuanZhang0 Apr 12, 2019
38f8581
update
ChinaLiuHao Apr 12, 2019
5ee4b0f
update
ChinaLiuHao Apr 12, 2019
b695f7c
Update README.md
ChinaLiuHao Apr 12, 2019
2172734
Update README.md
ChinaLiuHao Apr 12, 2019
7af4cd1
fix readme
luluxing3 Apr 12, 2019
29e112a
fix max_step, update run.sh and run_ernie.sh
chenbjin Apr 14, 2019
69084b8
add finetuned model for lac
Halfish Apr 15, 2019
22e627f
fix bug
Halfish Apr 15, 2019
5097081
Update README.md
Halfish Apr 15, 2019
a1400bf
update
ChinaLiuHao Apr 15, 2019
ebad1a5
Update README.md
zhangyimi Apr 15, 2019
30847e5
add ERNIE pretrained model, and update README
chenbjin Apr 15, 2019
ebfd297
update readme
0YuanZhang0 Apr 16, 2019
b2569d8
add CPU
luluxing3 Apr 16, 2019
cecb2a1
Merge branch 'paddle-nlp' of github.com:PaddlePaddle/models into padd…
luluxing3 Apr 16, 2019
38418ef
update infer in run.sh and run_ernie.sh
chenbjin Apr 16, 2019
d483d9f
Update README.md
ChinaLiuHao Apr 16, 2019
b821d2f
Update README.md
ChinaLiuHao Apr 16, 2019
8522379
Delete test.py
phlrain Apr 16, 2019
012ef61
fix bug
Halfish Apr 16, 2019
f66d3ab
Merge branch 'paddle-nlp' of https://github.com/PaddlePaddle/models i…
Halfish Apr 16, 2019
c3c3343
fix run.sh infer bug & add ernie infer code
Halfish Apr 16, 2019
983f16e
fix cpu mode
0YuanZhang0 Apr 17, 2019
fca3892
Update README.md
phlrain Apr 17, 2019
082f5f7
fix bug for python3
Halfish Apr 17, 2019
a9099f7
Merge branch 'paddle-nlp' of https://github.com/PaddlePaddle/models i…
Halfish Apr 17, 2019
da5d70a
fix CPU and GPU diff result bug
Halfish Apr 17, 2019
2517416
Update README.md
phlrain Apr 17, 2019
5ebf80d
update readme
0YuanZhang0 Apr 17, 2019
744c282
Update run_classifier.py
ChinaLiuHao Apr 17, 2019
cf230b1
Update README.md
Halfish Apr 17, 2019
30ca2d0
Update README.md
zhangyimi Apr 18, 2019
da68ae1
Update README.md
Halfish Apr 18, 2019
4aa709f
Update README.md
zhangyimi Apr 18, 2019
6f4dbe2
Update README.md
zhangyimi Apr 18, 2019
852a73f
Update run.sh
ChinaLiuHao Apr 18, 2019
0e4adaf
Update run_ernie.sh
ChinaLiuHao Apr 18, 2019
bca29e4
modify dir
0YuanZhang0 Apr 18, 2019
2148a45
Merge branch 'paddle-nlp' of https://github.com/PaddlePaddle/models i…
0YuanZhang0 Apr 18, 2019
2fa3a0b
Update README.md
zhangyimi Apr 18, 2019
1182d81
modify dir too
luluxing3 Apr 18, 2019
2a249a7
modify path
luluxing3 Apr 18, 2019
88ec130
Update README.md
ChinaLiuHao Apr 19, 2019
b381ddf
Merge branch 'develop' into paddle-nlp
chenbjin Apr 19, 2019
e29dfeb
PaddleNLP modules backup to old/, rm links-LAC,Senta,SimNet
chenbjin Apr 19, 2019
b3b23bc
mv all modules out of paddle-nlp, rm Senta, auto_dialog_eval, deep_match
chenbjin Apr 19, 2019
803ee68
mv models/classify to models/classification, models/seq_lab to models…
chenbjin Apr 19, 2019
ed829f0
update readme for models/classification
chenbjin Apr 19, 2019
e21a38d
update sentiment_classification and rm README
chenbjin Apr 19, 2019
ca50abd
Add Transformer into paddle-nlp
guoshengCS Apr 19, 2019
56fdfa0
change seq_lab to sequence labeling
Halfish Apr 19, 2019
c1e93b5
Rename old as unarchived in PaddleNLP
guoshengCS Apr 19, 2019
39488d3
Merge pull request #2097 from guoshengCS/paddle-nlp-transformer-new
guoshengCS Apr 19, 2019
e2d9df3
add LARK
Apr 19, 2019
83421c0
Update README, add paddlehub
chenbjin Apr 19, 2019
559572d
add paddlehub
chenbjin Apr 19, 2019
fb12c1a
Add tmp readme
Apr 21, 2019
0c318b0
Update README.md
ChinaLiuHao Apr 22, 2019
3580532
Update README.md
zhangyimi Apr 22, 2019
c24850e
Update README.md
zhangyimi Apr 22, 2019
0c963e9
Update README.md
zhangyimi Apr 22, 2019
bdef090
Update run_ernie.sh
ChinaLiuHao Apr 22, 2019
cb27955
Update run_ernie.sh
ChinaLiuHao Apr 22, 2019
7b53029
Update README.md
zhangyimi Apr 22, 2019
d84cc3c
Update run_ernie_classifier.py
ChinaLiuHao Apr 22, 2019
88b3338
Update README.md
ChinaLiuHao Apr 22, 2019
2a7e141
Update README.md
ChinaLiuHao Apr 22, 2019
cc6f24f
Update run.sh
ChinaLiuHao Apr 22, 2019
d019336
Update run_ernie_classifier.py
ChinaLiuHao Apr 22, 2019
0eaa525
update
Apr 22, 2019
749f6e4
fix chunk_evaluator bug
Halfish Apr 22, 2019
54aa0be
change names
Apr 22, 2019
88a8e02
Update README
Apr 22, 2019
787dd2b
add gitmodules
Apr 22, 2019
df4eafe
Merge branch 'paddle-nlp' of https://github.com/PaddlePaddle/models i…
Apr 22, 2019
fbdb7c6
add install code
Apr 22, 2019
54a8b8a
Update README.md
ChinaLiuHao Apr 22, 2019
3085952
Update README.md
zhangyimi Apr 22, 2019
c93c0fb
Update README.md
zhangyimi Apr 22, 2019
ad89c3d
Update README.md
zhangyimi Apr 22, 2019
c5bde10
Update README.md
zhangyimi Apr 22, 2019
396b2e9
Update README.md
zhangyimi Apr 22, 2019
aa6ba97
Update README.md
Apr 22, 2019
4bcb823
Update README.md
ChinaLiuHao Apr 22, 2019
2dc7a42
Update README.md
ChinaLiuHao Apr 22, 2019
149131c
Update READMEs
Apr 22, 2019
728ab29
Update README.md
Halfish Apr 22, 2019
00305da
Update README.md
Halfish Apr 22, 2019
6561d9d
Update README.md
Halfish Apr 22, 2019
6723417
Update README.md
ChinaLiuHao Apr 22, 2019
5a13676
Update README.md
ChinaLiuHao Apr 22, 2019
4ed856d
Update README.md
Halfish Apr 22, 2019
2e00487
Update README.md
Halfish Apr 22, 2019
49d3a17
README
0YuanZhang0 Apr 22, 2019
37f2a13
Update README.md
ChinaLiuHao Apr 22, 2019
6abc646
update emotion_detection README
chenbjin Apr 22, 2019
c027583
Update README.md
ChinaLiuHao Apr 22, 2019
67be0fc
Update README.md
ChinaLiuHao Apr 22, 2019
922a35f
Update README.md
ChinaLiuHao Apr 22, 2019
d1e4e12
Update README.md
zhangyimi Apr 22, 2019
9b69767
Update README.md
ChinaLiuHao Apr 22, 2019
27a65fe
Update README.md
ChinaLiuHao Apr 22, 2019
e471b0e
Update README.md
ChinaLiuHao Apr 22, 2019
8db94fe
Update README.md
ChinaLiuHao Apr 22, 2019
391c555
Update README.md
zhangyimi Apr 22, 2019
cddc96f
update REAME, add finetune doc
chenbjin Apr 22, 2019
9e50d44
update emotion_detection readme
chenbjin Apr 22, 2019
f91724e
change run.sh
zhangyimi Apr 22, 2019
de9cded
Merge branch 'paddle-nlp' of https://github.com/PaddlePaddle/models i…
zhangyimi Apr 22, 2019
bf27ccd
Update README.md
zhangyimi Apr 22, 2019
700c3d4
Update the link in fluid dir
Apr 22, 2019
89afda0
Merge branch 'paddle-nlp' of upstream into paddle-nlp
Apr 22, 2019
4229c33
update readme
0YuanZhang0 Apr 22, 2019
f178186
Merge branch 'paddle-nlp' of https://github.com/PaddlePaddle/models i…
0YuanZhang0 Apr 22, 2019
b07f15f
update README for markdown style
chenbjin Apr 22, 2019
d291bcb
Update README.md
ChinaLiuHao Apr 22, 2019
26ab375
Update README.md
Apr 22, 2019
119 changes: 116 additions & 3 deletions PaddleNLP/paddle-nlp/auto_dialogue_evaluation/README.md
@@ -1,7 +1,5 @@
# Auto Dialogue Evaluation

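## Introduction
### Task description
Auto Dialogue Evaluation assesses the quality of responses produced by open-domain dialogue systems. It helps companies and individuals evaluate response quality quickly and reduces the cost of manual evaluation.
1. Without labeled data, a matching model trained with negative sampling serves as the evaluation tool and ranks the response quality of multiple dialogue systems;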
@@ -11,11 +9,126 @@
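We take four different dialogue systems (seq2seq\_naive/seq2seq\_att/keywords/human) as examples and evaluate them with the auto dialogue evaluation tool.
1. Without labeled data, the pretrained evaluation tool is used directly;
The Spearman correlation coefficients between automatic scores and human scores on the four dialogue systems are as follows: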

/|seq2seq\_naive|seq2seq\_att|keywords|human
--|:--:|--:|:--:|--:
cor|0.361|0.343|0.324|0.288

Ranking of the four systems by average score:

Human evaluation|k(0.591)<n(0.847)<a(1.116)<h(1.240)
--|--:
Automatic evaluation|k(0.625)<n(0.909)<a(1.399)<h(1.683)

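2. After fine-tuning on a small amount of labeled data, the Spearman correlation coefficients between automatic scores and human scores are as follows: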

/|seq2seq\_naive|seq2seq\_att|keywords|human
--|:--:|--:|:--:|--:
cor|0.474|0.477|0.443|0.378
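
The correlation and ranking figures above can be reproduced in spirit with a few lines of Python. The following is a minimal sketch, not the project's `evaluation.py`: the function names are hypothetical, the scores are made up for illustration, and Spearman correlation is computed as the Pearson correlation of average ranks.

```python
from statistics import mean


def average_ranks(values):
    """Return 1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to cover the run of tied values starting at position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks


def spearman(xs, ys):
    """Spearman correlation: Pearson correlation of the two rank vectors."""
    rx, ry = average_ranks(xs), average_ranks(ys)
    mx, my = mean(rx), mean(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    return cov / (var_x * var_y) ** 0.5


def rank_systems(scores_by_system):
    """Sort system names by mean score, lowest first."""
    return sorted(scores_by_system, key=lambda name: mean(scores_by_system[name]))


# Made-up scores for illustration only (not the project's data).
human_scores = [0, 1, 2, 1, 2, 0]
auto_scores = [0.2, 0.9, 1.7, 1.1, 1.5, 0.4]
print(spearman(auto_scores, human_scores))
print(rank_systems({"keywords": [0.5, 0.7], "human": [1.2, 1.3]}))
```

If SciPy is available, `scipy.stats.spearmanr` computes the same coefficient and may be preferable in practice.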

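## Quick start
### Installation
1. Install Paddle
This project depends on PaddlePaddle Fluid 1.3.1; please follow the installation guide to install it.
2. Install the code
3. Environment dependencies
### Run the model for the first time
1. Data preparation
Download the preprocessed data. After the script runs, the data directory contains unlabel_data (train.ids/val.ids/test.ids/word2ids) and label_data (train.ids/val.ids/test.ids for each of the four tasks).
Only the test set is open-sourced in this project; the other data is provided as samples only.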
```
sh download_data.sh
```
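2. Model download
We have open-sourced a model trained on a large amount of unlabeled data, as well as models fine-tuned on a small amount of labeled data, all of which can be used directly.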
```
cd model_files
sh download_model.sh
```
3. Model inference
With the models and data above, run the following command to score dialogue data directly.
```
TASK=human
python -u main.py \
--do_infer True \
--use_cuda \
--test_path data/label_data/$TASK/test.ids \
--init_model model_files/${TASK}_finetuned
```
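4. Model evaluation
With the models and data above, run the following commands to evaluate performance.
Evaluate the pretrained model as the automatic evaluation tool: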
```
for task in seq2seq_naive seq2seq_att keywords human
do
echo $task
python -u main.py \
--do_val True \
--use_cuda \
--test_path data/label_data/$task/test.ids \
--init_model model_files/matching_pretrained \
--loss_type L2
done
```
Evaluate the fine-tuned models:
```
for task in seq2seq_naive seq2seq_att keywords human
do
echo $task
python -u main.py \
--do_val True \
--use_cuda \
--test_path data/label_data/$task/test.ids \
--init_model model_files/${task}_finetuned \
--loss_type L2
done
```
5. Training and validation
Using the sample dataset, run the following command for the first training stage:
```
python -u main.py \
--do_train True \
--use_cuda \
--save_path model_files_tmp/matching_pretrained \
--train_path data/unlabel_data/train.ids \
--val_path data/unlabel_data/val.ids
```
Building on the first stage, a small amount of labeled data can be used for the second training stage:
```
TASK=human
python -u main.py \
--do_train True \
--loss_type L2 \
--use_cuda \
--save_path model_files_tmp/${TASK}_finetuned \
--init_model model_files/matching_pretrained \
--train_path data/label_data/$TASK/train.ids \
--val_path data/label_data/$TASK/val.ids \
--print_step 1 \
--save_step 1 \
--num_scan_data 50
```

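
## Advanced usage
### Task definition and modeling
The input of the auto dialogue evaluation task is a text pair (context, response), and the output is a quality score for the response.
### Model principles
The matching task (predicting whether a context and a response match) is naturally related to the automatic evaluation task, so this project uses the matching task as pretraining for automatic evaluation;
the matching model is then fine-tuned with a small amount of labeled data.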
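
The README does not spell out how negatives are drawn for the matching task, so the following is only a sketch under one common assumption: each true (context, response) pair is a positive, and negatives pair the same context with a response taken at random from a different dialogue. The function name and parameters are hypothetical and do not come from the project code.

```python
import random


def build_matching_examples(pairs, num_negatives=1, seed=0):
    """Turn (context, response) pairs into labeled matching examples.

    Each true pair becomes a positive (label 1); each negative (label 0)
    reuses the context with a response drawn from a different pair.
    """
    rng = random.Random(seed)
    examples = []
    for idx, (context, response) in enumerate(pairs):
        examples.append((context, response, 1))
        other_indices = [i for i in range(len(pairs)) if i != idx]
        count = min(num_negatives, len(other_indices))
        for neg_idx in rng.sample(other_indices, count):
            examples.append((context, pairs[neg_idx][1], 0))
    return examples


# Toy dialogues for illustration only.
dialogues = [
    ("how are you", "fine thanks"),
    ("what time is it", "almost noon"),
    ("where do you live", "in beijing"),
]
for example in build_matching_examples(dialogues):
    print(example)
```

Because the labels come for free from the pairing, this kind of data needs no human annotation, which is what makes matching usable as a pretraining stage before fine-tuning on the small labeled set.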
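### Data format
The data used for training, inference and evaluation has three columns separated by a tab ('\t'): the first column is the context ids separated by spaces, the second column is the response ids separated by spaces, and the third column is the label.
Note: this project also provides a tokenization preprocessing script (in the preprocess directory), which can be used as follows: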
```
python tokenizer.py --test_data_dir ./test.txt.utf8 --batch_size 1 > test.txt.utf8.seg
```
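
As an illustration of the three-column format described above, here is a minimal parsing sketch. It is not the project's `reader.py`; the function name is hypothetical, the ids are assumed to be integers, and the label is kept as a string because the README does not state its type.

```python
def parse_line(line):
    """Split one tab-separated line into (context_ids, response_ids, label)."""
    context, response, label = line.rstrip("\n").split("\t")
    # Ids within a column are separated by spaces.
    context_ids = [int(tok) for tok in context.split()]
    response_ids = [int(tok) for tok in response.split()]
    return context_ids, response_ids, label


# A made-up line in the described format.
sample = "12 7 301\t45 9\t1"
print(parse_line(sample))
```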
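### Code structure
main.py: entry point of the project, wrapping training, inference and evaluation
config.py: defines the model configuration, including the model type and hyperparameters
reader.py: reads the data and loads the vocabulary
evaluation.py: defines the evaluation functions
init.py: defines the model loading function
run.sh: script for running training, inference and evaluation

## Others
How to contribute
If you can fix an issue or add a new feature, feel free to submit a PR. If the PR is accepted, we will score it (0-5, higher is better) based on the quality and difficulty of the contribution. Once you have accumulated 10 points, you can contact us for an interview opportunity or a recommendation letter.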
2 changes: 1 addition & 1 deletion PaddleNLP/paddle-nlp/emotion_detection/download_model.sh
@@ -2,7 +2,7 @@

# download pretrain model file to ./models/
MODEL_URL=https://baidu-nlp.bj.bcebos.com/emotion_detection_textcnn-1.0.0.tar.gz
-wget --no-check-certificate ${DATA_URL}
+wget --no-check-certificate ${MODEL_URL}

tar xvf emotion_detection_textcnn-1.0.0.tar.gz
/bin/rm emotion_detection_textcnn-1.0.0.tar.gz