
Multimodal Emotion Recognition in Conversation with Mutual Information Maximization and Contrastive Loss
Abstract: Multimodal emotion recognition in conversation (ERC) is a key component in building emotional dialogue systems. In recent years, graph-based fusion methods have been proposed to dynamically aggregate multimodal context features in conversations, improving model performance on multimodal ERC. However, these methods do not fully preserve and exploit the valuable information in the input data: they do not retain task-relevant information from the inputs through to the fusion results, and they ignore the information implied by the labels themselves. To overcome these issues, this paper proposes a Multimodal ERC model with Mutual Information maximization and Contrastive loss (MMIC). The model hierarchically maximizes the mutual information between modalities at both the input level and the fusion level, so that task-relevant information is preserved during fusion and richer multimodal representations are generated. Supervised contrastive learning is also introduced into the graph-based dynamic fusion network; by fully exploiting the information implied by the labels, it makes different emotions repel each other and enhances the model's ability to recognize similar emotions. Extensive experiments on two English public benchmark datasets and one Chinese dataset demonstrate the effectiveness and superiority of the proposed model. In addition, case studies confirm that the model effectively retains task-relevant information and better distinguishes similar emotions, and ablation experiments and visualization results demonstrate the effectiveness of each module.
Authors: LI Qian'er, HUANG Peijie, CHEN Jiawei, WU Jialin, XU Yuhong, LIN Piyuan (College of Mathematics and Informatics, South China Agricultural University, Guangzhou, Guangdong 510642, China)
Source: Journal of Chinese Information Processing (《中文信息学报》; CSCD, Peking University Core), 2024, No. 7, pp. 137-146 (10 pages)
Funding: National Natural Science Foundation of China (71472068, 62306119); Natural Science Foundation of Guangdong Province (2021A1515011864); Guangzhou Key Laboratory of Smart Agriculture (201902010081); Student Innovation Training Program of South China Agricultural University (X202210564157)
Keywords: multimodal emotion recognition in conversation; graph convolutional network; mutual information; supervised contrastive learning
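
The abstract names two training objectives: maximizing mutual information between modality representations, and a supervised contrastive loss that makes utterances with different emotion labels repel each other. The paper's own formulation is not reproduced here; the following is a minimal PyTorch sketch of the two standard losses these objectives are usually built on (an InfoNCE-style MI lower bound and the supervised contrastive loss of Khosla et al., 2020). All function names, tensor shapes, and temperature values are illustrative assumptions, not the authors' code.

```python
import torch
import torch.nn.functional as F


def infonce_mi_lower_bound(x, y, temperature=0.1):
    """InfoNCE-style lower bound on the mutual information I(X; Y).

    x, y: (N, D) projected features of two modalities for the same N
    utterances; matching rows are positive pairs, all other rows in the
    batch serve as negatives. Maximizing the bound = minimizing this loss.
    """
    x = F.normalize(x, dim=1)
    y = F.normalize(y, dim=1)
    logits = x @ y.t() / temperature          # (N, N) cross-modal similarities
    targets = torch.arange(x.size(0), device=x.device)
    # Symmetric cross-entropy over both matching directions.
    return 0.5 * (F.cross_entropy(logits, targets)
                  + F.cross_entropy(logits.t(), targets))


def supervised_contrastive_loss(features, labels, temperature=0.07):
    """Supervised contrastive loss (Khosla et al., 2020).

    features: (N, D) L2-normalized utterance representations.
    labels:   (N,) integer emotion labels; utterances sharing a label are
    treated as positives, so different emotions are pushed apart.
    """
    n = features.size(0)
    sim = features @ features.t() / temperature
    self_mask = torch.eye(n, dtype=torch.bool, device=features.device)
    sim = sim.masked_fill(self_mask, -1e9)    # exclude self-comparisons
    pos_mask = (labels.unsqueeze(0) == labels.unsqueeze(1)) & ~self_mask
    log_prob = sim - torch.logsumexp(sim, dim=1, keepdim=True)
    # Average log-probability over positives, for anchors that have any.
    pos_count = pos_mask.sum(dim=1)
    has_pos = pos_count > 0
    mean_log_prob_pos = ((log_prob * pos_mask.float()).sum(dim=1)[has_pos]
                         / pos_count[has_pos])
    return -mean_log_prob_pos.mean()
```

In a setup like the one the abstract describes, such terms would typically be added to the cross-entropy classification loss with scalar weights; the hierarchical application at both the input level and the fusion level is a design choice of the paper not reflected in this sketch.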