Code for the paper DialogueCSE: Dialogue-based Contrastive Learning of Sentence Embeddings
Steps to train DialogueCSE:

- Put dialogues in `data/session.txt`. The format should be `{session_id}\t{role}\t{text}\n`.
- Run `python data/data_generator.py` to generate the training data.
- Run `sh run_train.sh` to train DialogueCSE.
- Run `sh eval/batch_test_cmd.sh {ecd|jddc}` to evaluate the model.
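The expected layout of `data/session.txt` can be illustrated with a minimal loader sketch. The helper name `load_sessions` is ours for illustration and is not part of this repository; only the file format comes from the steps above.

```python
from collections import defaultdict

def load_sessions(path):
    """Read a {session_id}\t{role}\t{text}\n file and group the
    (role, text) turns of each dialogue by its session_id."""
    sessions = defaultdict(list)
    with open(path, encoding="utf-8") as f:
        for line in f:
            line = line.rstrip("\n")
            if not line:
                continue
            # Split on the first two tabs only, so tabs inside
            # the utterance text (if any) are preserved.
            session_id, role, text = line.split("\t", 2)
            sessions[session_id].append((role, text))
    return dict(sessions)
```

For example, a file containing two sessions, one with two turns and one with a single turn, would load into a dict keyed by session id with ordered `(role, text)` pairs as values.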
The argument `bert_init_dir` in `run_train.sh` refers to a pre-trained BERT model. Its parameters can be either the original BERT release or a version with continued pre-training on the DialogueCSE training data; the latter yields the best performance.
To conduct continued pre-training, please refer to the standard BERT codebase.
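A configuration sketch of how `run_train.sh` would be pointed at the checkpoint; the paths below are placeholders, not paths shipped with the repository:

```shell
# bert_init_dir must point at a pre-trained BERT checkpoint directory.
# Use either the original release or a checkpoint obtained by continued
# pre-training on the DialogueCSE training data (the latter performs best).
export bert_init_dir=/path/to/bert_base
```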
Download the evaluation data from Google Drive and move it to `dialogue-cse/dataset`:
- JDDC: JDDC is an open-source dataset released by JD AI [1].
- ECD: ECD is released in [2]. We have been granted permission to release our evaluation data, which is derived from the original ECD dataset.
- MDC: The MDC license does not permit secondary distribution; we are in communication with the relevant parties.
[1] https://jddc.jd.com/2019/jddc
[2] Modeling Multi-turn Conversation with Deep Utterance Aggregation.