Abstract
Existing sentiment analysis methods do not make full use of the information in short videos, which leads to inaccurate sentiment predictions. To address this, we propose an audio-visual multimodal sentiment analysis (AV-MSA) model that analyzes the sentiment of short videos using both the visual features of video frames and the audio track. The model has two branches, a visual branch and an audio branch. The audio branch uses a convolutional neural network (CNN) to extract emotional features from audio spectrograms. The visual branch uses 3D convolution to strengthen the temporal correlation of visual features and, building on ResNet, adds an attention mechanism that highlights emotion-related features and makes the model more sensitive to informative features. Finally, a cross-voting mechanism fuses the outputs of the visual and audio branches to produce the final sentiment result. AV-MSA was evaluated on the IEMOCAP and Weibo audio-visual (WB-AV) datasets. Experimental results show that AV-MSA substantially improves classification accuracy over existing short-video sentiment analysis methods.
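The final step described in the abstract, combining the visual and audio branch outputs into one decision, can be illustrated with a generic decision-level fusion. The abstract does not specify how the cross-voting mechanism works, so the sketch below is not the authors' method: it swaps in a plain weighted average of per-class probabilities, and the function name `fuse_decisions` and the parameter `visual_weight` are hypothetical.

```python
from typing import List, Sequence, Tuple


def fuse_decisions(
    visual_probs: Sequence[float],
    audio_probs: Sequence[float],
    visual_weight: float = 0.5,
) -> Tuple[int, List[float]]:
    """Fuse two branches' class probabilities by weighted averaging.

    Assumption: this is generic late (decision-level) fusion, not the
    cross-voting mechanism of AV-MSA, whose details the abstract omits.
    """
    if len(visual_probs) != len(audio_probs):
        raise ValueError("both branches must score the same set of classes")
    if not 0.0 <= visual_weight <= 1.0:
        raise ValueError("visual_weight must lie in [0, 1]")

    audio_weight = 1.0 - visual_weight
    # Weighted average of the two branches' scores, class by class.
    fused = [
        visual_weight * v + audio_weight * a
        for v, a in zip(visual_probs, audio_probs)
    ]
    # The predicted class is the index with the highest fused score.
    label = max(range(len(fused)), key=fused.__getitem__)
    return label, fused


if __name__ == "__main__":
    # Hypothetical two-class scores from each branch.
    label, fused = fuse_decisions([0.6, 0.4], [0.1, 0.9])
    print(label, fused)
```

With equal weights, a confident audio branch can override a weakly confident visual branch; setting `visual_weight` to 1.0 or 0.0 reduces the sketch to a single-branch classifier.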
Authors
黄欢
孙力娟
曹莹
郭剑
任恒毅
HUANG Huan; SUN Li-juan; CAO Ying; GUO Jian; REN Heng-yi (College of Computer, Nanjing University of Posts and Telecommunications, Nanjing, Jiangsu 210003, China; Jiangsu High Technology Research Key Laboratory for Wireless Sensor Networks, Nanjing University of Posts and Telecommunications, Nanjing, Jiangsu 210003, China; College of Computer and Information Engineering, Henan University, Kaifeng, Henan 475001, China)
Source
Journal of Graphics (《图学学报》)
CSCD
Peking University Core Journals (北大核心)
2021, Issue 1, pp. 8-14 (7 pages)
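Funding
National Natural Science Foundation of China (61873131, 61702284)
General Program of the Anhui Provincial Department of Science and Technology (1908085MF207)
Jiangsu Postdoctoral Research Funding Program (2018K009B)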
Keywords
multimodal sentiment analysis
ResNet
3D convolutional neural networks
attention
decision fusion