Init paddle-nlp #2112

Merged
merged 211 commits into from
Apr 22, 2019
Changes from 5 commits
Commits
211 commits
a55440a
init paddle-nlp tools for QA test
chenbjin Apr 5, 2019
c4fc5ea
Fix paragraph extraction bug
Apr 8, 2019
76636fc
Update download links
Apr 8, 2019
13ffb7b
first update LAC README.md
Halfish Apr 8, 2019
d8a536c
rename EmoTect as emotion_detection
chenbjin Apr 8, 2019
797b0d2
download data from bos
Halfish Apr 8, 2019
8cfae7f
Update README.md
Halfish Apr 8, 2019
34c3cc7
Rename project
Apr 8, 2019
b5afc34
second add code
zhangyimi Apr 8, 2019
4df27c1
Merge branch 'paddle-nlp' of github.com:PaddlePaddle/models into padd…
zhangyimi Apr 8, 2019
742d7b9
modify downloads.sh for lac
Halfish Apr 8, 2019
c51ed7e
Merge branch 'paddle-nlp' of https://github.com/PaddlePaddle/models i…
Halfish Apr 8, 2019
c6dcf81
rename LAC to lexical_analysis
Halfish Apr 8, 2019
4a3e311
update lac readme
Halfish Apr 8, 2019
b0ec7ab
Update README.md
zhangyimi Apr 8, 2019
5a55634
Update README.md
zhangyimi Apr 8, 2019
db027f9
Update README.md
zhangyimi Apr 8, 2019
068b6c8
add struct.jpg
zhangyimi Apr 8, 2019
8276d8d
Merge branch 'paddle-nlp' of github.com:PaddlePaddle/models into padd…
zhangyimi Apr 8, 2019
97b3b92
Update README.md
zhangyimi Apr 8, 2019
7b687e0
Update README.md
zhangyimi Apr 8, 2019
c6bc41e
update README
chenbjin Apr 8, 2019
bbef065
Update README.md
zhangyimi Apr 8, 2019
c779830
update emotion_detection README
chenbjin Apr 8, 2019
6c131a3
add download_data.sh and download_model.sh
chenbjin Apr 8, 2019
06ef80e
first commit ADE
luluxing3 Apr 9, 2019
0c16b2f
dialogue_model_toolkit_update
0YuanZhang0 Apr 9, 2019
c3db25b
Merge branch 'paddle-nlp' of https://github.com/PaddlePaddle/models i…
0YuanZhang0 Apr 9, 2019
4958bc4
update emotion_detection model bos url
chenbjin Apr 9, 2019
cb059cd
Merge branch 'paddle-nlp' of https://github.com/PaddlePaddle/models i…
chenbjin Apr 9, 2019
979c307
update README
luluxing3 Apr 9, 2019
47e175e
Merge branch 'paddle-nlp' of github.com:PaddlePaddle/models into padd…
luluxing3 Apr 9, 2019
9c21762
update readme
0YuanZhang0 Apr 9, 2019
7100e3d
Merge branch 'paddle-nlp' of https://github.com/PaddlePaddle/models i…
0YuanZhang0 Apr 9, 2019
87a2068
update readme
0YuanZhang0 Apr 9, 2019
7f336d5
update download file
0YuanZhang0 Apr 9, 2019
35e6fff
first commit DAM
luluxing3 Apr 10, 2019
2038575
Merge branch 'paddle-nlp' of github.com:PaddlePaddle/models into padd…
luluxing3 Apr 10, 2019
b085ea7
add readme
luluxing3 Apr 10, 2019
cf1d166
fix readme
luluxing3 Apr 10, 2019
dea9ea5
fix readme
luluxing3 Apr 10, 2019
6895fe0
fix readme
luluxing3 Apr 10, 2019
342dc37
fix readme
luluxing3 Apr 10, 2019
8e48e42
fix readme
luluxing3 Apr 10, 2019
d50c211
rename
luluxing3 Apr 10, 2019
2b36c9e
rename again
luluxing3 Apr 10, 2019
6751e15
1. add gradient_clip for ernie_lac
Apr 10, 2019
5b1a309
fix download.sh
zhangyimi Apr 10, 2019
db451cb
Merge branch 'paddle-nlp' of github.com:PaddlePaddle/models into padd…
zhangyimi Apr 10, 2019
44c1c13
Rename MRC task
Apr 10, 2019
4458d12
Merge branch 'paddle-nlp' of https://github.com/PaddlePaddle/models i…
Apr 10, 2019
99d8362
fix logger
zhangyimi Apr 10, 2019
e37ff4c
Merge branch 'paddle-nlp' of github.com:PaddlePaddle/models into padd…
zhangyimi Apr 10, 2019
8ba34f2
fix to douban
luluxing3 Apr 11, 2019
0519e9a
Merge branch 'paddle-nlp' of github.com:PaddlePaddle/models into padd…
luluxing3 Apr 11, 2019
d9de355
fix final
luluxing3 Apr 11, 2019
51b1b56
update readme
luluxing3 Apr 11, 2019
cc4544c
update readme
luluxing3 Apr 11, 2019
b41076e
update readme
luluxing3 Apr 11, 2019
e38a9c7
fix batch is null
luluxing3 Apr 11, 2019
9ee6b63
fix typo
luluxing3 Apr 11, 2019
11fd719
fix typo
luluxing3 Apr 11, 2019
5f43165
fix typo
luluxing3 Apr 11, 2019
bb0d090
update ernie config
chenbjin Apr 11, 2019
f7a6ba7
update readme
chenbjin Apr 11, 2019
0436dd3
add AI platform url in readme
chenbjin Apr 11, 2019
a5809d0
update readme subtitlestyle
chenbjin Apr 11, 2019
beae8c6
update
ChinaLiuHao Apr 11, 2019
670b513
Update README.md
ChinaLiuHao Apr 11, 2019
6254425
Update README.md
ChinaLiuHao Apr 11, 2019
72a5e5c
update
ChinaLiuHao Apr 11, 2019
5d902f2
Create README.md
ChinaLiuHao Apr 11, 2019
10fafd3
Update README.md
ChinaLiuHao Apr 11, 2019
8adfb4e
Update README.md
ChinaLiuHao Apr 11, 2019
23cce9a
Update README.md
ChinaLiuHao Apr 11, 2019
15acc84
Update README.md
ChinaLiuHao Apr 11, 2019
0552725
Update README.md
ChinaLiuHao Apr 11, 2019
20bcc96
Update README.md
ChinaLiuHao Apr 11, 2019
15ee96f
Update README.md
ChinaLiuHao Apr 11, 2019
5661bed
Update README.md
ChinaLiuHao Apr 11, 2019
71ec1c8
Update README.md
ChinaLiuHao Apr 11, 2019
40c5902
Update README.md
ChinaLiuHao Apr 11, 2019
1c9827e
Update README.md
ChinaLiuHao Apr 11, 2019
7366794
Update README.md
ChinaLiuHao Apr 11, 2019
0aecd5d
Update README.md
ChinaLiuHao Apr 11, 2019
4511504
Update README.md
ChinaLiuHao Apr 11, 2019
5966ba6
Update README.md
ChinaLiuHao Apr 11, 2019
9480e9c
update batch size
luluxing3 Apr 11, 2019
dcb334e
Merge branch 'paddle-nlp' of github.com:PaddlePaddle/models into padd…
luluxing3 Apr 11, 2019
e6c39f2
adapt to samll data size
luluxing3 Apr 11, 2019
e22a6f2
update ERNIE bcebos url
chenbjin Apr 12, 2019
32bbb7b
add language model
Aurelius84 Apr 12, 2019
16e7e6a
Merge pull request #2035 from Aurelius84/paddle-nlp
phlrain Apr 12, 2019
d87cdec
modify readme
0YuanZhang0 Apr 12, 2019
38f8581
update
ChinaLiuHao Apr 12, 2019
5ee4b0f
update
ChinaLiuHao Apr 12, 2019
b695f7c
Update README.md
ChinaLiuHao Apr 12, 2019
2172734
Update README.md
ChinaLiuHao Apr 12, 2019
7af4cd1
fix readme
luluxing3 Apr 12, 2019
29e112a
fix max_step, update run.sh and run_ernie.sh
chenbjin Apr 14, 2019
69084b8
add finetuned model for lac
Halfish Apr 15, 2019
22e627f
fix bug
Halfish Apr 15, 2019
5097081
Update README.md
Halfish Apr 15, 2019
a1400bf
update
ChinaLiuHao Apr 15, 2019
ebad1a5
Update README.md
zhangyimi Apr 15, 2019
30847e5
add ERNIE pretrained model, and update README
chenbjin Apr 15, 2019
ebfd297
update readme
0YuanZhang0 Apr 16, 2019
b2569d8
add CPU
luluxing3 Apr 16, 2019
cecb2a1
Merge branch 'paddle-nlp' of github.com:PaddlePaddle/models into padd…
luluxing3 Apr 16, 2019
38418ef
update infer in run.sh and run_ernie.sh
chenbjin Apr 16, 2019
d483d9f
Update README.md
ChinaLiuHao Apr 16, 2019
b821d2f
Update README.md
ChinaLiuHao Apr 16, 2019
8522379
Delete test.py
phlrain Apr 16, 2019
012ef61
fix bug
Halfish Apr 16, 2019
f66d3ab
Merge branch 'paddle-nlp' of https://github.com/PaddlePaddle/models i…
Halfish Apr 16, 2019
c3c3343
fix run.sh infer bug & add ernie infer code
Halfish Apr 16, 2019
983f16e
fix cpu mode
0YuanZhang0 Apr 17, 2019
fca3892
Update README.md
phlrain Apr 17, 2019
082f5f7
fix bug for python3
Halfish Apr 17, 2019
a9099f7
Merge branch 'paddle-nlp' of https://github.com/PaddlePaddle/models i…
Halfish Apr 17, 2019
da5d70a
fix CPU and GPU diff result bug
Halfish Apr 17, 2019
2517416
Update README.md
phlrain Apr 17, 2019
5ebf80d
update readme
0YuanZhang0 Apr 17, 2019
744c282
Update run_classifier.py
ChinaLiuHao Apr 17, 2019
cf230b1
Update README.md
Halfish Apr 17, 2019
30ca2d0
Update README.md
zhangyimi Apr 18, 2019
da68ae1
Update README.md
Halfish Apr 18, 2019
4aa709f
Update README.md
zhangyimi Apr 18, 2019
6f4dbe2
Update README.md
zhangyimi Apr 18, 2019
852a73f
Update run.sh
ChinaLiuHao Apr 18, 2019
0e4adaf
Update run_ernie.sh
ChinaLiuHao Apr 18, 2019
bca29e4
modify dir
0YuanZhang0 Apr 18, 2019
2148a45
Merge branch 'paddle-nlp' of https://github.com/PaddlePaddle/models i…
0YuanZhang0 Apr 18, 2019
2fa3a0b
Update README.md
zhangyimi Apr 18, 2019
1182d81
modify dir too
luluxing3 Apr 18, 2019
2a249a7
modify path
luluxing3 Apr 18, 2019
88ec130
Update README.md
ChinaLiuHao Apr 19, 2019
b381ddf
Merge branch 'develop' into paddle-nlp
chenbjin Apr 19, 2019
e29dfeb
PaddleNLP modules backup to old/, rm links-LAC,Senta,SimNet
chenbjin Apr 19, 2019
b3b23bc
mv all modules out of paddle-nlp, rm Senta, auto_dialog_eval, deep_match
chenbjin Apr 19, 2019
803ee68
mv models/classify to models/classification, models/seq_lab to models…
chenbjin Apr 19, 2019
ed829f0
update readme for models/classification
chenbjin Apr 19, 2019
e21a38d
update sentiment_classification and rm README
chenbjin Apr 19, 2019
ca50abd
Add Transformer into paddle-nlp
guoshengCS Apr 19, 2019
56fdfa0
change seq_lab to sequence labeling
Halfish Apr 19, 2019
c1e93b5
Rename old as unarchived in PaddleNLP
guoshengCS Apr 19, 2019
39488d3
Merge pull request #2097 from guoshengCS/paddle-nlp-transformer-new
guoshengCS Apr 19, 2019
e2d9df3
add LARK
Apr 19, 2019
83421c0
Update README, add paddlehub
chenbjin Apr 19, 2019
559572d
add paddlehub
chenbjin Apr 19, 2019
fb12c1a
Add tmp readme
Apr 21, 2019
0c318b0
Update README.md
ChinaLiuHao Apr 22, 2019
3580532
Update README.md
zhangyimi Apr 22, 2019
c24850e
Update README.md
zhangyimi Apr 22, 2019
0c963e9
Update README.md
zhangyimi Apr 22, 2019
bdef090
Update run_ernie.sh
ChinaLiuHao Apr 22, 2019
cb27955
Update run_ernie.sh
ChinaLiuHao Apr 22, 2019
7b53029
Update README.md
zhangyimi Apr 22, 2019
d84cc3c
Update run_ernie_classifier.py
ChinaLiuHao Apr 22, 2019
88b3338
Update README.md
ChinaLiuHao Apr 22, 2019
2a7e141
Update README.md
ChinaLiuHao Apr 22, 2019
cc6f24f
Update run.sh
ChinaLiuHao Apr 22, 2019
d019336
Update run_ernie_classifier.py
ChinaLiuHao Apr 22, 2019
0eaa525
update
Apr 22, 2019
749f6e4
fix chunk_evaluator bug
Halfish Apr 22, 2019
54aa0be
change names
Apr 22, 2019
88a8e02
Update README
Apr 22, 2019
787dd2b
add gitmodules
Apr 22, 2019
df4eafe
Merge branch 'paddle-nlp' of https://github.com/PaddlePaddle/models i…
Apr 22, 2019
fbdb7c6
add install code
Apr 22, 2019
54a8b8a
Update README.md
ChinaLiuHao Apr 22, 2019
3085952
Update README.md
zhangyimi Apr 22, 2019
c93c0fb
Update README.md
zhangyimi Apr 22, 2019
ad89c3d
Update README.md
zhangyimi Apr 22, 2019
c5bde10
Update README.md
zhangyimi Apr 22, 2019
396b2e9
Update README.md
zhangyimi Apr 22, 2019
aa6ba97
Update README.md
Apr 22, 2019
4bcb823
Update README.md
ChinaLiuHao Apr 22, 2019
2dc7a42
Update README.md
ChinaLiuHao Apr 22, 2019
149131c
Update READMEs
Apr 22, 2019
728ab29
Update README.md
Halfish Apr 22, 2019
00305da
Update README.md
Halfish Apr 22, 2019
6561d9d
Update README.md
Halfish Apr 22, 2019
6723417
Update README.md
ChinaLiuHao Apr 22, 2019
5a13676
Update README.md
ChinaLiuHao Apr 22, 2019
4ed856d
Update README.md
Halfish Apr 22, 2019
2e00487
Update README.md
Halfish Apr 22, 2019
49d3a17
README
0YuanZhang0 Apr 22, 2019
37f2a13
Update README.md
ChinaLiuHao Apr 22, 2019
6abc646
update emotion_detection README
chenbjin Apr 22, 2019
c027583
Update README.md
ChinaLiuHao Apr 22, 2019
67be0fc
Update README.md
ChinaLiuHao Apr 22, 2019
922a35f
Update README.md
ChinaLiuHao Apr 22, 2019
d1e4e12
Update README.md
zhangyimi Apr 22, 2019
9b69767
Update README.md
ChinaLiuHao Apr 22, 2019
27a65fe
Update README.md
ChinaLiuHao Apr 22, 2019
e471b0e
Update README.md
ChinaLiuHao Apr 22, 2019
8db94fe
Update README.md
ChinaLiuHao Apr 22, 2019
391c555
Update README.md
zhangyimi Apr 22, 2019
cddc96f
update REAME, add finetune doc
chenbjin Apr 22, 2019
9e50d44
update emotion_detection readme
chenbjin Apr 22, 2019
f91724e
change run.sh
zhangyimi Apr 22, 2019
de9cded
Merge branch 'paddle-nlp' of https://github.com/PaddlePaddle/models i…
zhangyimi Apr 22, 2019
bf27ccd
Update README.md
zhangyimi Apr 22, 2019
700c3d4
Update the link in fluid dir
Apr 22, 2019
89afda0
Merge branch 'paddle-nlp' of upstream into paddle-nlp
Apr 22, 2019
4229c33
update readme
0YuanZhang0 Apr 22, 2019
f178186
Merge branch 'paddle-nlp' of https://github.com/PaddlePaddle/models i…
0YuanZhang0 Apr 22, 2019
b07f15f
update README for markdown style
chenbjin Apr 22, 2019
d291bcb
Update README.md
ChinaLiuHao Apr 22, 2019
26ab375
Update README.md
Apr 22, 2019
120 changes: 87 additions & 33 deletions PaddleNLP/paddle-nlp/dialogue_model_toolkit/README.md
@@ -3,22 +3,28 @@
- [II. Quick Start](#二、快速开始)
- [III. Advanced Usage](#三、进阶使用)
- [IV. Other](#四、其他)

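## I. Introduction

### Task Description

In dialogue applications, a dialogue system often has to solve a wide variety of tasks as the scenario changes. The diversity of these tasks (intent recognition, slot filling, dialogue act (DA) recognition, dialogue state tracking (DST), and so on), together with the scarcity of in-domain training data, makes both research on and deployment of dialogue systems very difficult. Further progress calls for a general-purpose dialogue understanding model. To this end, we provide a BERT-based dialogue model toolkit (DMTK: DialogueModelToolKit). Our experiments show that a base model (BERT) combined with common learning paradigms achieves results comparable to, and on several tasks better than, the best published models on almost all of the dialogue understanding tasks evaluated below, which demonstrates the strong potential of a single general dialogue understanding model.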

### Results

a. We evaluated DMTK on public dialogue datasets widely used in the field. The results are shown in the table below:

| task_name | udc | udc | udc | atis_slot | dstc2 | atis_intent | swda | mrda |
| :------ | :------ | :------ | :------ | :------ | :------ | :------ | :------ | :------ |
| Dialogue task | Matching | Matching | Matching | Slot filling | DST | Intent recognition | DA | DA |
| Task type | Classification | Classification | Classification | Sequence labeling | Multi-label classification | Classification | Classification | Classification |
| Task name | udc | udc | udc | atis_slot | dstc2 | atis_intent | swda | mrda |
| Metric | R1@10 | R2@10 | R5@10 | F1 | JOINT ACC | ACC | ACC | ACC |
| SOTA | 76.70% | 87.40% | 96.90% | 96.89% | 74.50% | 98.32% | 81.30% | 91.70% |
| DMTK | 82.02% | 90.43% | 97.75% | 97.10% | 89.57% | 97.65% | 80.19% | 91.43% |

b. Datasets:

```
UDC: Ubuntu Corpus V1;
ATIS: Airline Travel Information System, a public dataset provided by Microsoft;
@@ -27,44 +33,64 @@ MRDA: Meeting Recorder Dialogue Act;
SWDA: Switchboard Dialogue Act Corpus;
```
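
For reference, UDC response selection is reported as R1@10, R2@10 and R5@10 (whether the correct response ranks in the top 1, 2 or 5 of 10 candidates), and DSTC2 as joint accuracy. The sketch below computes both under the usual conventions, assumed here: one positive response among the UDC candidates, and a DST turn counting as correct only if its whole predicted set of (slot, value) pairs equals the gold set. The function names are illustrative and are not the toolkit's own evaluation code.

```python
from typing import List, Sequence, Set, Tuple


def recall_n_at_k(scores: Sequence[float], n: int, positive_index: int = 0) -> int:
    """1 if the positive candidate is among the n highest-scoring candidates, else 0."""
    # Sort candidate indices by descending score; sorted() is stable, so ties keep input order.
    ranked = sorted(range(len(scores)), key=lambda i: -scores[i])
    return int(positive_index in ranked[:n])


def mean_recall_n_at_k(all_scores: List[Sequence[float]], n: int) -> float:
    """Average R_n@k over examples whose positive candidate is at index 0."""
    return sum(recall_n_at_k(s, n) for s in all_scores) / len(all_scores)


def joint_goal_accuracy(predicted: List[Set[Tuple[str, str]]],
                        gold: List[Set[Tuple[str, str]]]) -> float:
    """Fraction of turns whose predicted (slot, value) set matches the gold set exactly."""
    return sum(p == g for p, g in zip(predicted, gold)) / len(gold)


# 10 candidates (k = 10); the positive response is candidate 0.
scores = [0.2, 0.9, 0.1, 0.3, 0.05, 0.0, 0.4, 0.1, 0.15, 0.05]
print(recall_n_at_k(scores, 1), recall_n_at_k(scores, 2), recall_n_at_k(scores, 5))  # 0 0 1
print(joint_goal_accuracy([{("food", "thai")}, {("area", "north")}],
                          [{("food", "thai")}, {("area", "south")}]))  # 0.5
```

Joint accuracy is deliberately strict: one wrong slot value makes the whole turn count as an error.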

## II. Quick Start

### 1. Installation

#### a. Installing PaddlePaddle

This project depends on Paddle Fluid 1.3. Please refer to the installation guide to install it.

#### b. Installing the code

#### c. Dependencies

### 2. Running a Model for the First Time

#### a. Data preparation (downloading data and models, preprocessing)

i. Download the data:

```
sh download_data.sh
```

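ii. (Optional) The downloaded data already includes the training, test, and validation sets. If you need to regenerate the training data for a dataset, run: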

```
cd dialogue_model_toolkit/scripts && sh run_build_data.sh task_name
parameters:
task_name: udc, swda, mrda, atis, dstc2
```
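
#### b. Model download

The dialogue models in this project are built on BERT: training a dialogue model fine-tunes a pretrained BERT model. We also provide several dialogue models already trained on the public datasets.

i. Download the pretrained BERT model: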

```
sh download_pretrain_model.sh
```

ii. Download the trained dialogue models for the dialogue_model_toolkit module:

```
sh download_models.sh
```

#### c. Training

Option 1 (recommended):

```
sh run_train.sh task_name
parameters:
task_name: udc, swda, mrda, atis_intent, atis_slot, dstc2
```

Option 2:

```
python -u train.py --task_name mrda \ # name model to use. [udc|swda|mrda|atis_intent|atis_slot|dstc2]

@@ -88,15 +114,19 @@ python -u train.py --task_name mrda \ # name model to use. [udc|swda|mrda|atis_i
--num_iteration_per_drop_scope 10 \ # The iteration intervals to clean up temporary variables.
--use_fp16 false # If set, use fp16 for training.
```
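
Note that in the annotated command above each line ends with `\ # ...`. In bash a backslash continues a line only when it is the last character, so here `\ ` escapes the space, `#` starts a comment, and the command ends after its first line; it cannot be pasted as-is. One way to keep per-flag notes is to generate a single-line command in Python, sketched below. `build_command` is a hypothetical helper, not part of the toolkit, and only the `train.py` flags visible in this diff are filled in.

```python
import shlex
from typing import List, Tuple


def build_command(script: str, flags: List[Tuple[str, object]]) -> str:
    """Return a one-line, shell-safe `python -u <script> --flag value ...` command."""
    argv = ["python", "-u", script]
    for name, value in flags:
        argv += [f"--{name}", str(value)]
    return shlex.join(argv)  # quotes any value containing spaces or shell metacharacters


train_flags = [
    ("task_name", "mrda"),                 # model to use: udc|swda|mrda|atis_intent|atis_slot|dstc2
    # ... the remaining train.py flags from the full command go here ...
    ("num_iteration_per_drop_scope", 10),  # iteration interval for cleaning up temporary variables
    ("use_fp16", "false"),                 # whether to train in fp16
]
print(build_command("train.py", train_flags))
# python -u train.py --task_name mrda --num_iteration_per_drop_scope 10 --use_fp16 false
```

`shlex.join` requires Python 3.8 or later; the same approach works for `predict.py`.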

#### d. Prediction (method e below is recommended for prediction plus evaluation)

Option 1 (recommended):

```
sh run_predict.sh task_name
parameters:
task_name: udc, swda, mrda, atis_intent, atis_slot, dstc2
```

Option 2:

```
python -u predict.py --task_name mrda \ # name model to use. [udc|swda|mrda|atis_intent|atis_slot|dstc2]
--use_cuda true \ # If set, use GPU for prediction.
@@ -107,30 +137,43 @@ python -u predict.py --task_name mrda \ # name model to use. [udc|swda|mrda
--max_seq_len 128 \ # Number of words of the longest sequence.
--bert_config_path ./uncased_L-12_H-768_A-12/bert_config.json # Path to the json file for bert model config.
```
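
#### e. Prediction + evaluation (recommended)

The dialogue_model_toolkit module provides trained dialogue models, which can be downloaded with sh download_models.sh. If you are not training your own models, you can use the provided ones for prediction and evaluation: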

```
sh run_eval_metrics.sh task_name
parameters:
task_name: udc, swda, mrda, atis_intent, atis_slot, dstc2
```
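
## III. Advanced Usage

### 1. Task Definition and Modeling

The dialogue_model_toolkit module implements a training pipeline for each dataset and supports classification, multi-label classification, sequence labeling, and similar tasks. You can customize the models for your own datasets.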
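
### 2. How the Model Works

For dialogue understanding problems, this project uses BERT as the underlying model and defines a task paradigm on top of it (classification, multi-label classification, or sequence labeling). It open-sources a series of models for public datasets that users can configure and reuse.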
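
The split between the BERT layer and the paradigm on top can be illustrated with a dependency-free Python sketch. It assumes the encoder has already produced one hidden vector per position, with [CLS] at position 0, and shows what each paradigm computes from those vectors. This is not Paddle Fluid code and not the toolkit's implementation: the weights are toy values, and real BERT classifiers usually add a pooler layer over [CLS] before the output layer.

```python
import math
from typing import List

Vector = List[float]
Matrix = List[List[float]]  # shape [hidden_size][num_labels]


def linear(x: Vector, w: Matrix, b: Vector) -> Vector:
    """Label scores x @ w + b for one hidden vector x."""
    return [sum(x[i] * w[i][j] for i in range(len(x))) + b[j] for j in range(len(b))]


def softmax(z: Vector) -> Vector:
    m = max(z)  # subtract the max for numerical stability
    e = [math.exp(v - m) for v in z]
    total = sum(e)
    return [v / total for v in e]


def sigmoid(v: float) -> float:
    # Split on the sign so math.exp never overflows.
    if v >= 0:
        return 1.0 / (1.0 + math.exp(-v))
    e = math.exp(v)
    return e / (1.0 + e)


def classification_head(seq_out: List[Vector], w: Matrix, b: Vector) -> Vector:
    """Single-label classification: softmax over labels at the [CLS] position."""
    return softmax(linear(seq_out[0], w, b))


def multi_label_head(seq_out: List[Vector], w: Matrix, b: Vector) -> Vector:
    """Multi-label classification: an independent sigmoid per label at [CLS]."""
    return [sigmoid(s) for s in linear(seq_out[0], w, b)]


def sequence_labeling_head(seq_out: List[Vector], w: Matrix, b: Vector) -> List[Vector]:
    """Sequence labeling: a softmax over labels at every token after [CLS]."""
    return [softmax(linear(tok, w, b)) for tok in seq_out[1:]]


# One example: 3 positions ([CLS] first), hidden size 2, 2 labels.
seq_out = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
w = [[2.0, -1.0], [-1.0, 2.0]]
b = [0.0, 0.0]
print([round(p, 3) for p in classification_head(seq_out, w, b)])  # [0.953, 0.047]
print([round(p, 3) for p in multi_label_head(seq_out, w, b)])     # [0.881, 0.269]
print(len(sequence_labeling_head(seq_out, w, b)))                 # 2
```

The single-label head makes labels compete through one softmax, while the multi-label head scores each label independently, which is why DSTC2 (several slot values per turn) uses the latter.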
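
### 3. Data Format

You can organize the data for training, prediction, and evaluation to suit your actual dialogue application. All data fed into the network uses one unified format, for example: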

```
[CLS] token11 token12 token13 [INNER_SEP] token11 token12 token13 [SEP] token21 token22 token23 [SEP] token31 token32 token33 [SEP]
```

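Each input starts with [CLS]. [SEP] divides the content into three dialogue-related parts, such as the preceding context, the current utterance, and the following context. If a part delimited by [SEP] contains multiple turns, the turns are separated by [INNER_SEP]. The second and third parts are both optional.

Data preparation is already built into the dialogue_model_toolkit code; you can assemble your own data following the input format above.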
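
Assembling one example in this format can be sketched in Python as follows. The sketch assumes each part is a list of turns given as space-separated tokens, and that every part, including the last, ends with [SEP], as in the example above. `build_input` is a hypothetical helper; the toolkit's own reader is not shown here and presumably also handles tokenization and conversion to ids.

```python
from typing import List, Optional

CLS, SEP, INNER_SEP = "[CLS]", "[SEP]", "[INNER_SEP]"


def build_input(part1: List[str],
                part2: Optional[List[str]] = None,
                part3: Optional[List[str]] = None) -> str:
    """Assemble one example as [CLS] part1 [SEP] part2 [SEP] part3 [SEP].

    Each part is a list of turns; turns within a part are joined by [INNER_SEP].
    Parts 2 and 3 are optional, but part 3 is only allowed together with part 2.
    """
    if part2 is None and part3 is not None:
        raise ValueError("part3 requires part2")
    pieces = [CLS]
    for part in (part1, part2, part3):
        if part is None:
            continue  # skip an omitted optional part
        pieces.append(f" {INNER_SEP} ".join(part))
        pieces.append(SEP)
    return " ".join(pieces)


print(build_input(["token11 token12 token13", "token11 token12 token13"],
                  ["token21 token22 token23"],
                  ["token31 token32 token33"]))
# prints the example line shown in the data format section
```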

### 4. Code Structure

```
.
├── run_train.sh # training script
├── run_predict.sh # prediction script
├── run_eval_metrics.sh # evaluation script
├── download_data.sh # data download script
@@ -161,26 +204,37 @@ task_name: udc, swda, mrda, atis_intent, atis_slot, dstc2
├── define_paradigm.py # upper-layer network paradigms
└── create_model.py # builds the network: underlying BERT model + upper-layer paradigm
```
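
### 5. Building Your Own Model

You can build a custom model to fit your needs as follows:

i. Custom data

If you have a dataset named **task_name**, create a **task_name** folder under **data** and put the dataset there. In **reader/data_reader.py**, add a class that processes your data (for example, the **udc** dataset uses **UDCProcessor**). In **train.py**, register the mapping from **task_name** to its **processor** (e.g. **processors = {'udc': reader.UDCProcessor}**), and set whether the batch size for this dataset is counted in tokens via **in_tokens** (e.g. **in_tokens = {'udc': True}**).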
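
The registration step can be sketched as follows. `processors` and `in_tokens` are the dictionaries named above; `BaseProcessor`, `MyTaskProcessor`, the stub standing in for `reader.UDCProcessor`, and `make_batches` are hypothetical. The token-based rule shown (a batch's padded size, i.e. number of examples times its longest example, must stay within `batch_size`) is an assumption about what `in_tokens` batching means here, not the toolkit's code.

```python
from typing import Dict, Iterable, Iterator, List, Type


class BaseProcessor:
    """Hypothetical base class; the real processors live in reader/data_reader.py."""

    def __init__(self, data_dir: str):
        self.data_dir = data_dir  # e.g. data/<task_name>


class UDCProcessor(BaseProcessor):
    """Stand-in for reader.UDCProcessor (its real methods are not shown here)."""


class MyTaskProcessor(BaseProcessor):
    """Hypothetical processor for a new dataset stored under data/my_task."""


# task_name -> processor class, and task_name -> whether batch_size counts tokens.
processors: Dict[str, Type[BaseProcessor]] = {
    "udc": UDCProcessor,
    "my_task": MyTaskProcessor,
}
in_tokens: Dict[str, bool] = {"udc": True, "my_task": False}


def make_batches(examples: Iterable[List[str]], batch_size: int,
                 count_tokens: bool) -> Iterator[List[List[str]]]:
    """Group tokenized examples into batches.

    count_tokens=False: batch_size is the number of examples per batch.
    count_tokens=True: batch_size is a token budget; a batch is closed when its
    padded size (number of examples * longest example) would exceed it.
    """
    batch: List[List[str]] = []
    max_len = 0
    for ex in examples:
        new_max = max(max_len, len(ex))
        size = new_max * (len(batch) + 1) if count_tokens else len(batch) + 1
        if batch and size > batch_size:
            yield batch
            batch, new_max = [], len(ex)
        batch.append(ex)
        max_len = new_max
    if batch:
        yield batch


task_name = "my_task"
processor = processors[task_name]("data/" + task_name)
examples = [["a"] * 3, ["b"] * 3, ["c"] * 4, ["d"] * 2]
print([len(b) for b in make_batches(examples, 10, count_tokens=True)])  # [2, 2]
print([len(b) for b in make_batches(examples, 3, count_tokens=False)])  # [3, 1]
```

Counting in tokens keeps padded batches of long dialogues (such as multi-turn UDC contexts) within a fixed memory budget, at the cost of a variable number of examples per batch.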

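ii. Custom upper-layer network paradigm

If your custom model is one of the three supported types (classification, multi-label classification, or sequence labeling), you only need to map **task_name** to the corresponding paradigm function in **paddle-nlp/models/dialogue_model_toolkit/define_paradigm.py**. If it is some other type of model, write your own paradigm function and map it to **task_name** in the same way.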
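
A sketch of the `task_name` to paradigm mapping follows, assuming each paradigm function turns per-token label scores into a prediction. The task assignments follow the task types in the results table; `matching_score` and `my_task` are hypothetical examples of a custom paradigm registered for a new task. The real functions in define_paradigm.py presumably operate on network layers rather than post-processing plain scores.

```python
from typing import Callable, Dict, List

# token_logits[t][j] is the score of label j at token t; t = 0 is [CLS].
TokenLogits = List[List[float]]


def argmax(scores: List[float]) -> int:
    return max(range(len(scores)), key=scores.__getitem__)


def classification(token_logits: TokenLogits) -> int:
    """One label per example, read from the [CLS] position."""
    return argmax(token_logits[0])


def multi_label_classification(token_logits: TokenLogits) -> List[int]:
    """Every label whose [CLS] score is >= 0, i.e. sigmoid(score) >= 0.5."""
    return [j for j, s in enumerate(token_logits[0]) if s >= 0.0]


def sequence_labeling(token_logits: TokenLogits) -> List[int]:
    """One label per token, skipping the [CLS] position."""
    return [argmax(scores) for scores in token_logits[1:]]


def matching_score(token_logits: TokenLogits) -> float:
    """Hypothetical custom paradigm: the [CLS] score of label 1 as a ranking score."""
    return token_logits[0][1]


# task_name -> paradigm function.
paradigm_for_task: Dict[str, Callable[[TokenLogits], object]] = {
    "udc": classification,
    "swda": classification,
    "mrda": classification,
    "atis_intent": classification,
    "dstc2": multi_label_classification,
    "atis_slot": sequence_labeling,
    "my_task": matching_score,  # hypothetical user-defined task
}

logits = [[0.5, -1.0], [2.0, 0.1], [-0.3, 0.7]]
print(paradigm_for_task["swda"](logits))       # 0
print(paradigm_for_task["dstc2"](logits))      # [0]
print(paradigm_for_task["atis_slot"](logits))  # [0, 1]
print(paradigm_for_task["my_task"](logits))    # -1.0
```

Keeping the mapping in one dictionary means a new task only needs a new entry; the rest of the pipeline looks the paradigm up by `task_name`.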

iii. Custom prediction packaging interface

In define_predict_pack.py, map task_name to your custom prediction-packaging function.

### 6. Training

i. Organize your own training, evaluation, and prediction data in the format described above.

ii. Run the training script:

```
sh run_train.sh task_name
parameters:
task_name: your custom task name
```
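
## IV. Other

### How to Contribute

If you can fix an issue or add a new feature, you are welcome to submit a PR. If the PR is accepted, we will score it from 0 to 5 (higher is better) based on the quality and difficulty of the contribution. Once you have accumulated 10 points, you can contact us for an interview opportunity or a recommendation letter.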
6 changes: 3 additions & 3 deletions PaddleNLP/paddle-nlp/dialogue_model_toolkit/download_data.sh
@@ -1,3 +1,3 @@
-wget --no-check-certificate https://baidu-nlp.bj.bcebos.com/dgu_1.0.0.tar.gz
-tar -xvf dgu_1.0.0.tar.gz
-rm dgu_1.0.0.tar.gz
+wget --no-check-certificate https://baidu-nlp.bj.bcebos.com/dmtk_data_1.0.0.tar.gz
+tar -xvf dmtk_data_1.0.0.tar.gz
+rm dmtk_data_1.0.0.tar.gz
@@ -1,3 +1,3 @@
-wget --no-check-certificate https://baidu-nlp.bj.bcebos.com/dgu_models_1.0.0.tar.gz
-tar -xvf dgu_models_1.0.0.tar.gz
-rm dgu_models_1.0.0.tar.gz
+wget --no-check-certificate https://baidu-nlp.bj.bcebos.com/dmtk_models_1.0.0.tar.gz
+tar -xvf dmtk_models_1.0.0.tar.gz
+rm dmtk_models_1.0.0.tar.gz