Observe before Generate: Emotion-Cause aware Video Caption for Multimodal Emotion Cause Generation in Conversations
Fanfan Wang, Heqing Ma, Xiangqing Shen, Jianfei Yu*, Rui Xia*
This repository contains the code for ObG, a multimodal pipeline framework that first generates emotion-cause aware video captions (Observe) and then generates the abstractive emotion causes conditioned on those captions (Generate).
Multimodal Emotion Cause Generation in Conversations (MECGC) aims to generate the abstractive causes of given emotions based on the multimodal context. The ECGF dataset is constructed by manually annotating an abstractive cause for each emotion labeled in the existing ECF dataset.
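To make the input/output format concrete, here is a minimal sketch of what one instance looks like; the field names are illustrative, not the actual ECGF schema:

```python
# Hypothetical shape of one MECGC instance; the real ECGF files
# may use different field names and structure.
instance = {
    "conversation": [
        {"speaker": "A", "utterance": "...", "video": "dia1utt1.mp4"},
        {"speaker": "B", "utterance": "...", "video": "dia1utt2.mp4"},
    ],
    "emotion_utterance_index": 1,  # which utterance carries the emotion
    "emotion": "joy",
    # target output: an abstractive (free-form) cause, not a text span
    "cause": "B is happy because ...",
}
```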
conda env create -f environment.yml
conda activate obg
# install nlg-eval for evaluation
pip install git+https://github.com/Maluuba/nlg-eval.git
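# one-time download of the data files nlg-eval needs (see the nlg-eval README)
nlg-eval --setup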
Gemini-Pro-Vision is used to generate emotion-cause aware video captions, which serve as supervision for training ECCap. For the detailed instruction template, please refer to Figure 3 in our paper.
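A minimal sketch of this captioning step, assuming the google-generativeai Python SDK with sampled frames standing in for the clip; the prompt wording and the function name are placeholders, and the real instruction template is Figure 3 of the paper:

```python
import google.generativeai as genai
from PIL import Image

genai.configure(api_key="YOUR_API_KEY")
model = genai.GenerativeModel("gemini-pro-vision")

def caption_clip(frame_paths, speaker, utterance, emotion):
    # Placeholder prompt; replace with the template from Figure 3.
    prompt = (
        f'{speaker} says "{utterance}" and feels {emotion}. '
        "Describe what is happening in the video that may cause this emotion."
    )
    # Sampled frames stand in for the video clip.
    frames = [Image.open(p) for p in frame_paths]
    response = model.generate_content([prompt] + frames)
    return response.text
```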
# set data_dir and output_dir in ECCap.sh, then train the caption model
bash ECCap.sh
# set data_dir and output_dir in CGM.sh, then train the cause generation model
bash CGM.sh
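At inference time, the two trained models run in the Observe-before-Generate order. A schematic sketch, with hypothetical names that are not the repository's actual API:

```python
# Schematic of the ObG pipeline at inference time; `eccap` and `cgm`
# denote the two trained models, and these names/signatures are
# hypothetical, not the repository's actual API.
def generate_cause(conversation, target_utterance, emotion, video_clip):
    # Observe: produce an emotion-cause aware caption for the video
    caption = eccap.generate(video_clip, conversation, emotion)
    # Generate: produce the abstractive cause, conditioned on the caption
    return cgm.generate(conversation, target_utterance, emotion, caption)
```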
@inproceedings{wang2024obg,
title={Observe before Generate: Emotion-Cause aware Video Caption for Multimodal Emotion Cause Generation in Conversations},
author={Wang, Fanfan and Ma, Heqing and Shen, Xiangqing and Yu, Jianfei and Xia, Rui},
booktitle={Proceedings of the 32nd ACM International Conference on Multimedia},
pages={},
year={2024}
}
@article{ma2024monica,
author={Ma, Heqing and Yu, Jianfei and Wang, Fanfan and Cao, Hanyu and Xia, Rui},
journal={IEEE Transactions on Affective Computing},
title={From Extraction to Generation: Multimodal Emotion-Cause Pair Generation in Conversations},
year={2024},
volume={},
number={},
pages={},
doi={10.1109/TAFFC.2024.3446646}
}
Our code benefits from VL-T5 and CICERO. We appreciate their valuable contributions.