This is a new, challenging, and comprehensive Chinese benchmark for multi-domain goal-oriented dialog evaluation. It covers three datasets with different knowledge sources: Slot-based Dialog, Flow-based Dialog, and Retrieval-based Dialog.
The datasets are hosted on Google Drive. Please download them and merge them into this repo, matching each dataset to the code directory with the same path name.
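A minimal sketch of the merge step, assuming the downloaded archive unpacks into folders named after the repo's code directories (the archive name `CGoDial_data.zip` is hypothetical; use whatever the Google Drive link provides):

```bash
# Hypothetical archive name; substitute the actual Google Drive download.
unzip CGoDial_data.zip -d cgodial_data

# Copy each dataset into the code directory with the matching path name.
cp -r cgodial_data/slot_based_dialog/data slot_based_dialog/
cp -r cgodial_data/flow_based_dialog/data flow_based_dialog/
cp -r cgodial_data/retrieval_based_dialog/*.json retrieval_based_dialog/
```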
`cd slot_based_dialog`

The datasets are in `./data`. There are two baselines (see the sketch after this list):
- Chinese GPT: download the model and put it in the `cdial_gpt` directory, go to that path, run `run.sh` to train and test, and use `eval.py` to get the evaluation results.
- Chinese T5: download the model and put it in the `chinese_t5` directory, go to that path, run `run.sh` to train and test, and use `eval.py` to get the evaluation results.
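A minimal sketch of the Chinese GPT workflow; whether `run.sh` and `eval.py` need extra arguments is an assumption, so check each script before running:

```bash
cd slot_based_dialog/cdial_gpt

# Train and test the baseline (argument-free invocation is an assumption).
bash run.sh

# Score the generated outputs.
python eval.py
```

The Chinese T5 baseline follows the same steps from `slot_based_dialog/chinese_t5`.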
`cd flow_based_dialog`

The datasets are in `./data`. There are two baselines:
- RoBERTa-wwm: download the model.
- StructBERT: download the model.

Use `run.sh` for training (set `is_train`) or testing (set `is_eval`) to produce the JSON output file, then run `eval.py` for the results (a sketch follows).
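A minimal sketch of one train-then-evaluate cycle; whether `is_train`/`is_eval` are variables edited inside `run.sh` or command-line flags is an assumption, so check the script:

```bash
cd flow_based_dialog

# Set is_train in run.sh, then launch training.
bash run.sh

# Set is_eval in run.sh, then rerun to produce the JSON output file.
bash run.sh

# Score the JSON output.
python eval.py
```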
`cd retrieval_based_dialog`

The datasets are `train.json`, `dev.json`, and `test.json`. Use the same two baseline models and code as Flow-based Dialog.
Use `run.sh` for training (set `is_train`) or testing (set `is_eval`) to produce the JSON output file, then run `ECDMetric.py` for the results.
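The same pattern as the flow-based setup applies, with `ECDMetric.py` as the scorer (the `is_train`/`is_eval` handling is again an assumption):

```bash
cd retrieval_based_dialog

# Set is_train / is_eval in run.sh as above, then launch.
bash run.sh

# Score the JSON output with the ECD metric script.
python ECDMetric.py
```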
You can cite our paper with the following BibTeX:
@article{dai2022cgodial,
  title={CGoDial: A Large-Scale Benchmark for Chinese Goal-oriented Dialog Evaluation},
  author={Dai, Yinpei and He, Wanwei and Li, Bowen and Wu, Yuchuan and Cao, Zheng and An, Zhongqi and Sun, Jian and Li, Yongbin},
  journal={arXiv preprint arXiv:2211.11617},
  year={2022}
}