Great work. But I have a question: why does DualCL shuffle the order of the labels? This operation will not change the real label in binary classification, but it will change the real label in multi-class classification. I don't understand the purpose of this.
In fact, I followed this setup and trained it on my own dataset, a binary classification task similar to dialogue intent recognition. I trained for 30 epochs using RoBERTa and got very poor results. Is DualCL not suitable for this kind of task? I hope you can help point out my misunderstanding.
hiyouga commented on Sep 13, 2022
Thanks for your question. We randomly shuffle the labels to mitigate the bias introduced by the position embeddings in BERT models. In other words, we make the label representations independent of the label order. This operation changes the real label neither in binary classification nor in multi-class classification.
DualCL performs well on sentiment classification and question classification tasks, but its performance on other tasks is yet to be confirmed. The unsatisfactory results on the dialogue intent recognition task may be due to the complexity of that task.
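Whether the real label survives a shuffle depends on how the label index is remapped afterwards. The following is a minimal sketch, not the repository's code: the helper names `apply_permutation`, `remap_forward`, and `remap_inverse` are hypothetical, and the permutation is fixed by hand so the result is reproducible.

```python
def apply_permutation(label_list, rand_idx):
    """Reorder labels so that position i holds label_list[rand_idx[i]]."""
    return [label_list[i] for i in rand_idx]


def remap_forward(label_id, rand_idx):
    """Forward lookup: read the permutation at the old index."""
    return rand_idx[label_id]


def remap_inverse(label_id, rand_idx):
    """Inverse lookup: find the new position that holds the old index."""
    return rand_idx.index(label_id)


label_list = ["1", "2", "3", "4", "5", "6"]
rand_idx = [3, 5, 0, 1, 4, 2]  # hand-picked permutation (assumption)
shuffled = apply_permutation(label_list, rand_idx)

original_id = 0  # the gold label is label_list[0] == "1"
forward_label = shuffled[remap_forward(original_id, rand_idx)]
inverse_label = shuffled[remap_inverse(original_id, rand_idx)]

print(shuffled)       # reordered label list
print(forward_label)  # label reached through the forward lookup
print(inverse_label)  # label reached through the inverse lookup
```

With two classes, every permutation is its own inverse, so both lookups agree and the gold label is preserved either way. With three or more classes, only the inverse lookup is guaranteed to keep the index pointing at the original label.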
OPilgrim commented on Sep 13, 2022
Thanks for your reply!
Well, I must point out that multi-class classification may indeed be problematic...
Here's how you do it:
Assuming we have `6` categories, then `self._label_list` looks like this: `["1", "2", "3", "4", "5", "6"]`. `rand_idx` should be `[0, 1, 2, 3, 4, 5]`; assuming that after the shuffle it is `[3, 5, 0, 1, 4, 2]`, then `label_list` should be `["4", "6", "1", "2", "5", "3"]`. Because `label_id = rand_idx[label_id]`, if the original `label_id` is `0`, the corresponding label is `"1"`, and the current `label_id` is changed to `rand_idx[0] = 3`, whose corresponding label is `"2"`... Doesn't that make the label wrong?
hiyouga commented on Sep 16, 2022
Thanks very much! It was indeed problematic. We have removed the random shuffling and set the position embeddings of all label tokens to zero, so the model's prediction is independent of the label order. The implementation has been updated.
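One way to picture the zero-position approach is to give every label token the same position id, so that reordering the labels leaves the positional input unchanged. This is only a sketch under stated assumptions: the function `build_position_ids`, the token layout, and the numbering of the non-label tokens are hypothetical and may differ from the updated implementation.

```python
def build_position_ids(is_label_token):
    """Assign position 0 to label tokens and count the other tokens from 0.

    `is_label_token` is a list of booleans, one per input token.
    The numbering scheme for non-label tokens is an assumption.
    """
    position_ids = []
    next_position = 0
    for is_label in is_label_token:
        if is_label:
            position_ids.append(0)
        else:
            position_ids.append(next_position)
            next_position += 1
    return position_ids


# Hypothetical layout: [CLS], label tokens, [SEP], text tokens, [SEP]
tokens_a = ["[CLS]", "positive", "negative", "[SEP]", "great", "movie", "[SEP]"]
tokens_b = ["[CLS]", "negative", "positive", "[SEP]", "great", "movie", "[SEP]"]
label_mask = [False, True, True, False, False, False, False]

ids_a = build_position_ids(label_mask)
ids_b = build_position_ids(label_mask)
print(ids_a)
```

Because the position ids depend only on which slots hold label tokens, `tokens_a` and `tokens_b` receive identical position ids even though their label order differs.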