
Multimodal Emotion Recognition in Conversation with Mutual Information Maximization and Contrastive Loss
Abstract: Multimodal emotion recognition in conversation (ERC) is a key component in building emotional dialogue systems. In recent years, graph-based fusion methods have been proposed to dynamically aggregate multimodal context features in conversations, improving model performance on multimodal ERC. However, these methods do not fully preserve and exploit the valuable information in the input data: they do not retain task-relevant information from the inputs through to the fusion results, and they ignore the information implied by the labels themselves. To overcome these issues, this paper proposes a Multimodal ERC model with Mutual Information maximization and Contrastive loss (MMIC). The model hierarchically maximizes the mutual information between modalities at both the input level and the fusion level, so that task-relevant information is preserved during fusion and richer multimodal representations are generated. Supervised contrastive learning is also introduced into the graph-based dynamic fusion network; by fully exploiting the information implied by the labels, it makes different emotions repel each other and enhances the model's ability to recognize similar emotions. Extensive experiments on two English public benchmark datasets and one Chinese dataset demonstrate the effectiveness and superiority of the proposed model. In addition, case studies confirm that the model effectively retains task-relevant information and better distinguishes similar emotions, and ablation experiments and visualization results demonstrate the effectiveness of each module.
Authors: LI Qian'er, HUANG Peijie, CHEN Jiawei, WU Jialin, XU Yuhong, LIN Piyuan (College of Mathematics and Informatics, South China Agricultural University, Guangzhou, Guangdong 510642, China)
Source: Journal of Chinese Information Processing (《中文信息学报》; CSCD, Peking University Core), 2024, No. 7, pp. 137-146 (10 pages)
Funding: National Natural Science Foundation of China (71472068, 62306119); Natural Science Foundation of Guangdong Province (2021A1515011864); Guangzhou Key Laboratory of Smart Agriculture (201902010081); Student Innovation Training Program of South China Agricultural University (X202210564157)
Keywords: multimodal emotion recognition in conversation; graph convolutional network; mutual information; supervised contrastive learning
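
The abstract names two training objectives: maximizing mutual information between modality representations, and a supervised contrastive loss that makes utterances with different emotion labels repel each other. The paper's own formulation is not reproduced here; the following is a minimal PyTorch sketch of the two standard losses these objectives are usually built on (an InfoNCE-style MI lower bound and the supervised contrastive loss of Khosla et al., 2020). All function names, tensor shapes, and temperature values are illustrative assumptions, not the authors' code.

```python
import torch
import torch.nn.functional as F


def infonce_mi_lower_bound(x, y, temperature=0.1):
    """InfoNCE-style lower bound on the mutual information I(X; Y).

    x, y: (N, D) projected features of two modalities for the same N
    utterances; matching rows are positive pairs, all other rows in the
    batch serve as negatives. Maximizing the bound = minimizing this loss.
    """
    x = F.normalize(x, dim=1)
    y = F.normalize(y, dim=1)
    logits = x @ y.t() / temperature          # (N, N) cross-modal similarities
    targets = torch.arange(x.size(0), device=x.device)
    # Symmetric cross-entropy over both matching directions.
    return 0.5 * (F.cross_entropy(logits, targets)
                  + F.cross_entropy(logits.t(), targets))


def supervised_contrastive_loss(features, labels, temperature=0.07):
    """Supervised contrastive loss (Khosla et al., 2020).

    features: (N, D) L2-normalized utterance representations.
    labels:   (N,) integer emotion labels; utterances sharing a label are
    treated as positives, so different emotions are pushed apart.
    """
    n = features.size(0)
    sim = features @ features.t() / temperature
    self_mask = torch.eye(n, dtype=torch.bool, device=features.device)
    sim = sim.masked_fill(self_mask, -1e9)    # exclude self-comparisons
    pos_mask = (labels.unsqueeze(0) == labels.unsqueeze(1)) & ~self_mask
    log_prob = sim - torch.logsumexp(sim, dim=1, keepdim=True)
    # Average log-probability over positives, for anchors that have any.
    pos_count = pos_mask.sum(dim=1)
    has_pos = pos_count > 0
    mean_log_prob_pos = ((log_prob * pos_mask.float()).sum(dim=1)[has_pos]
                         / pos_count[has_pos])
    return -mean_log_prob_pos.mean()
```

In a setup like the one the abstract describes, such terms would typically be added to the cross-entropy classification loss with scalar weights; the hierarchical application at both the input level and the fusion level is a design choice of the paper not reflected in this sketch.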