Abstract
To fully extract deep emotional features from both text and speech, achieve effective interactive fusion between the two modalities, and improve emotion recognition accuracy, a bimodal emotion recognition model based on cascade two-channel and phased fusion (CTC-PF) is proposed. A cascaded sequential attention encoder (CSA-Encoder) is designed to process long-distance speech emotion sequences in parallel and extract deep speech emotion features. An affective field cascade encoder (AFC-Encoder) is proposed to strengthen the model's global and local text understanding and address the sparsity of key emotional features in text. After the two cascaded channels complete feature extraction for speech and text, a co-attention mechanism interactively fuses the salient emotional features of the two modalities, reducing the cost of alignment operations; a Hadamard product then performs a secondary fusion that captures the differential features. This phased fusion realizes information interaction between modal sequences of different time steps and remedies the insufficient interaction of bimodal emotional information. Classification experiments on the IEMOCAP dataset show that the model reaches an emotion recognition accuracy of 79.4% and an F1-score of 79.0%, a clear improvement over existing mainstream methods, which demonstrates the model's superiority in speech-text bimodal emotion recognition.
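To make the phased fusion described above concrete, the following is a minimal sketch in PyTorch: co-attention between the two modal feature sequences (each modality attending to the other, which sidesteps explicit alignment), followed by a Hadamard (element-wise) product as the secondary fusion. All dimensions, layer choices, and names here are illustrative assumptions standing in for the paper's encoders, not the authors' implementation.

```python
# Hypothetical sketch of CTC-PF's phased fusion: stage 1 is co-attention,
# stage 2 is a Hadamard-product secondary fusion. Encoder outputs are
# simulated with random tensors.
import torch
import torch.nn as nn

class PhasedFusion(nn.Module):
    def __init__(self, dim: int = 256, heads: int = 4, num_classes: int = 4):
        super().__init__()
        # Stage 1: co-attention -- each modality queries the other, so the
        # sequences interact without a costly frame/word alignment step.
        self.speech_to_text = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.text_to_speech = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.classifier = nn.Linear(dim, num_classes)

    def forward(self, speech: torch.Tensor, text: torch.Tensor) -> torch.Tensor:
        # speech: (batch, T_s, dim) deep speech features (CSA-Encoder stand-in)
        # text:   (batch, T_t, dim) deep text features (AFC-Encoder stand-in)
        s_att, _ = self.speech_to_text(query=speech, key=text, value=text)
        t_att, _ = self.text_to_speech(query=text, key=speech, value=speech)
        # Pool each co-attended sequence so modalities with different
        # time-step counts can interact at the utterance level.
        s_vec = s_att.mean(dim=1)
        t_vec = t_att.mean(dim=1)
        # Stage 2: Hadamard product highlights element-wise differences
        # between the two co-attended representations.
        fused = s_vec * t_vec
        return self.classifier(fused)

model = PhasedFusion()
speech = torch.randn(8, 120, 256)  # 120 speech frames per utterance
text = torch.randn(8, 30, 256)     # 30 word tokens per utterance
logits = model(speech, text)       # (8, 4) emotion class scores
```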
Authors
XU Zhijing (徐志京); LIU Xia (刘霞) — College of Information Engineering, Shanghai Maritime University, Shanghai 201306, China
Source
《计算机工程与应用》(Computer Engineering and Applications)
CSCD
Peking University Core Journals (北大核心)
2023, No. 8, pp. 127-137 (11 pages)
Funding
National Key Research and Development Program of China (2019YFB1600605)
Shanghai Sailing Program (20YF1416700)
Keywords
bimodal emotion recognition
cascaded encoder
phased fusion
information interaction