Help regarding dataset. #3

Open
HarshitSoni1903 opened this issue Feb 28, 2019 · 7 comments

Comments

@HarshitSoni1903

I would like to know more about how the text data from the NTCIR Short Text Conversation Task (STC-3) Chinese Emotional Conversation Generation (CECG) Subtask (http://coai.cs.tsinghua.edu.cn/hml/challenge/dataset_description/) was processed into the 4 files:
category: target sentence emotion category
choice: target sentence emotional word annotation
source: source sentence
target: target sentence
I also looked at https://github.com/AaronYALai/Seq2seqAttn_ECM for this.
More information or guidance would be a great help, since it will also help me process an English dataset.
Regards

@1YCxZ
Owner

1YCxZ commented Feb 28, 2019

Hi, here is a short explanation of these 4 files.

First of all, this model is for single-turn dialogue, which means each example has two sentences: a source sentence and a target sentence, like a question and its answer. I split the dialogues into 2 files, source.txt and target.txt. The i-th line in source.txt corresponds to the i-th line in target.txt, and the same holds for category.txt and choice.txt.
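
To make the line-by-line correspondence concrete, here is a minimal sketch that reads the four files together and checks that they have the same number of lines. The file names come from this thread; the function names `align_lines` and `read_dataset` and the single-directory layout are assumptions for illustration, not code from this repository.

```python
from pathlib import Path

# File names as given in the thread; keeping them in one directory is an assumption.
FILE_NAMES = ("source.txt", "target.txt", "category.txt", "choice.txt")


def align_lines(source, target, category, choice):
    """Pair up the i-th entry of each list; fail loudly if the lengths differ."""
    columns = (source, target, category, choice)
    lengths = [len(col) for col in columns]
    if len(set(lengths)) != 1:
        raise ValueError(f"files are not line-aligned, line counts: {lengths}")
    return list(zip(*columns))


def read_dataset(data_dir):
    """Read the four files from data_dir and return one tuple per dialogue."""
    columns = [
        Path(data_dir, name).read_text(encoding="utf-8").splitlines()
        for name in FILE_NAMES
    ]
    return align_lines(*columns)
```

Each returned tuple is `(source, target, category, choice)` for one dialogue, so a mismatch in line counts is caught before training rather than silently shifting labels.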

1.category: target sentence emotion category
This file needs an emotion classifier model to label the emotion type of the target sentence, such as happy, angry, sad. Then, I map the emotion type to numbers.

2.choice: target sentence emotional word annotation
This file needs an emotional word dictionary to label the words in a target sentence. If a word is an emotional word, it is labeled 1, otherwise 0. For example, the sentence "I am very happy ." is labeled "0 0 0 1 0".

3.source: source sentence
The source sentence in a dialogue.

4.target: target sentence
The target sentence in a dialogue.
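
The category and choice lines described above can be sketched for a single target sentence as follows. This is an illustration only: the label set, the numeric ids, and the toy word set are assumptions (the thread names happy, angry, and sad only as examples), and storing one integer per line in category.txt is also assumed.

```python
# Assumed emotion labels and ids; the actual mapping used by the repository is not given here.
EMOTION_TO_ID = {"happy": 0, "angry": 1, "sad": 2}

# Toy emotional word dictionary, for illustration only.
EMOTION_WORDS = {"happy", "angry", "sad"}


def category_line(emotion):
    """Map an emotion label (e.g. from a classifier) to its numeric id as text."""
    return str(EMOTION_TO_ID[emotion])


def choice_line(target_sentence, emotion_words):
    """Label each whitespace-separated token: 1 if it is an emotional word, else 0."""
    tokens = target_sentence.split()
    return " ".join("1" if token in emotion_words else "0" for token in tokens)


print(category_line("happy"))                           # 0
print(choice_line("I am very happy .", EMOTION_WORDS))  # 0 0 0 1 0
```

The sentence is assumed to be tokenized already, with tokens separated by spaces, as in the "I am very happy ." example.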

@HarshitSoni1903
Author

Hey, thank you so much for the help, now I'm able to understand the data at least.
Is there a specific module that you use for processing the data, or does the https://github.com/AaronYALai/Seq2seqAttn_ECM/tree/master/emotionregressor module need to be tweaked?

@1YCxZ
Owner

1YCxZ commented Feb 28, 2019

I haven't used his emotion classifier, so I'm not quite sure. However, I think you can try his module; it seems to work well.

@HarshitSoni1903
Author

HarshitSoni1903 commented Feb 28, 2019

So which module do you suggest I use? That module does not create an emotional word dictionary or use one in any way.
I am trying to fit the model on another dataset that has been separated into post.txt and response.txt (with their emotion attributes already attached, in both JSON and CSV format).

@1YCxZ
Owner

1YCxZ commented Feb 28, 2019

OK, so that means you don't need an emotion classifier. You just need an English emotional word dictionary to produce choice.txt. I think you can find one on the Internet; I found a Chinese emotional word dictionary through a Google search.

@HarshitSoni1903
Author

Okay thanks, I will look for one. I guess after that it's just a lookup for each word: if it is an emotion word it gets 1, else 0.
Which classifier did you use? My dataset was manually annotated by the providers, so I'll keep those labels, but I'm asking just in case.
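
For an English dataset, the per-word lookup could be applied to a whole response file roughly like this. It is a sketch under assumptions: the dictionary is taken to be a plain list with one word per line, matching is case-insensitive, and the helper names `load_lexicon`, `tag_targets`, and `write_choice_file` are hypothetical.

```python
from pathlib import Path


def load_lexicon(lines):
    """Build a lowercase word set from lines holding one word each; skip blank lines."""
    return {line.strip().lower() for line in lines if line.strip()}


def tag_targets(target_lines, lexicon):
    """Return one 0/1 string per target line, matching tokens case-insensitively."""
    tagged = []
    for line in target_lines:
        tokens = line.split()
        flags = ("1" if token.lower() in lexicon else "0" for token in tokens)
        tagged.append(" ".join(flags))
    return tagged


def write_choice_file(target_path, lexicon_path, choice_path):
    """Read target sentences and a lexicon from disk, then write the aligned 0/1 lines."""
    lexicon = load_lexicon(Path(lexicon_path).read_text(encoding="utf-8").splitlines())
    targets = Path(target_path).read_text(encoding="utf-8").splitlines()
    output = "\n".join(tag_targets(targets, lexicon)) + "\n"
    Path(choice_path).write_text(output, encoding="utf-8")
```

Because every input line produces exactly one output line, the resulting file stays aligned with the target file line by line. Punctuation attached to a word (for example "happy!") would not match unless the text is tokenized first.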

@1YCxZ
Owner

1YCxZ commented Mar 1, 2019

Sorry, I can't provide my classifier. You can use this emotion classifier:
https://github.com/AaronYALai/Seq2seqAttn_ECM/tree/master/emotionregressor
You can also try using BERT as an emotion classifier.
