Abstract
To address the issues of incomplete information and susceptibility to noise in emotion recognition, a multi-modal emotion recognition network model (Bidirectional Encoder Representations from Transformers, Residual Neural Network, Connectionist Temporal Classification, and Transformer: BRCTN) is constructed based on the Transformer network, integrating textual, visual, and auditory information. The model incorporates character feature information to assist emotion recognition, enhancing its ability to extract key features. The output vectors from single-modal emotion recognition are restructured into a unified format through modality alignment, and the three modalities together with the character features are mapped into a high-dimensional global vector space to learn the latent relationships between features of different modalities. The model was validated on the IEMOCAP dataset; compared with other methods, BRCTN achieved the highest accuracy of 87%, demonstrating the best recognition performance.
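The abstract describes aligning per-modality output vectors into a unified format and mapping them into a shared global vector space where cross-modal relationships are learned. A minimal sketch of that general idea is below; all dimensions, projection matrices, and the single-head self-attention step are illustrative assumptions, not the paper's actual BRCTN architecture.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical per-modality feature sizes (illustrative only): text from a
# BERT-style encoder, visual from a ResNet-style encoder, audio from a
# CTC-style encoder, plus a character-feature vector.
feat_dims = {"text": 768, "visual": 512, "audio": 256, "character": 64}
d_model = 128  # shared dimension after modality alignment

# Modality alignment: one linear projection per modality reorganizes each
# encoder's output into the same d_model-sized format.
proj = {m: rng.standard_normal((d, d_model)) / np.sqrt(d)
        for m, d in feat_dims.items()}

def align(features):
    """Restructure per-modality vectors into one token sequence (4, d_model)."""
    return np.stack([features[m] @ proj[m] for m in feat_dims])

def self_attention(x):
    """Single-head scaled dot-product attention over the modality tokens,
    a stand-in for learning latent cross-modal relationships."""
    scores = x @ x.T / np.sqrt(x.shape[-1])
    scores -= scores.max(axis=-1, keepdims=True)  # numerical stability
    w = np.exp(scores)
    w /= w.sum(axis=-1, keepdims=True)
    return w @ x, w

features = {m: rng.standard_normal(d) for m, d in feat_dims.items()}
tokens = align(features)              # (4, 128): unified token format
fused, attn = self_attention(tokens)  # cross-modal mixing
global_vec = fused.mean(axis=0)       # pooled global representation
print(global_vec.shape)  # (128,)
```

In a real system each projection would be trained jointly with the downstream emotion classifier, and the attention block would be a full Transformer encoder rather than this single untrained head.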
Authors
XIE Xingyu, DING Caiqin, WANG Xianlun, PAN Dongjie
(College of Mechanical and Electrical Engineering, Qingdao University of Science and Technology, Qingdao 266061, China; Qingdao Anjie Medical Technology Co., Ltd., Qingdao 266100, China)
Source
Journal of Qingdao University (Engineering & Technology Edition)
Indexed in: CAS
2024, No. 3, pp. 20-30 (11 pages)
Funding
Supported by the Natural Science Foundation of Shandong Province (ZR2020MF023).