Abstract
Aiming at the lack of interpretability of the word-alignment methods commonly used by existing sentiment analysis approaches for aligned multimodal language sequences, a Multi-Dynamic Aware Network (MultiDAN) for sentiment analysis of unaligned multimodal language sequences was proposed. The core of MultiDAN is multi-layer, multi-angle extraction of dynamics (interaction information). First, a Recurrent Neural Network (RNN) and an attention mechanism were used to capture intra-modal dynamics; then, a Graph Attention neTwork (GAT) was used to extract intra- and inter-modal, long- and short-term dynamics in a single pass; finally, a special graph readout method extracted the intra- and inter-modal dynamics of the graph nodes once more, yielding a unique representation of the multimodal language sequence, to which a MultiLayer Perceptron (MLP) classifier was applied to obtain the sentiment score of the sequence. Experimental results on two widely used public datasets, CMU-MOSI and CMU-MOSEI, show that MultiDAN fully extracts the dynamics: on the two unaligned datasets, the F1 score of MultiDAN improves by 0.49 and 0.72 percentage points respectively over Modal-Temporal Attention Graph (MTAG), the best of the comparison methods, while maintaining high stability. MultiDAN improves sentiment analysis performance on multimodal language sequences, and Graph Neural Networks (GNNs) can effectively extract intra- and inter-modal dynamics.
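The GAT step described above attends over a graph whose nodes come from different modalities, so intra- and inter-modal dynamics are aggregated in one pass. A minimal NumPy sketch of a single graph-attention layer on such a mixed-modality toy graph (this is an illustration of the generic GAT mechanism, not the authors' code; all shapes and the toy graph are assumptions):

```python
# Minimal sketch of one graph-attention layer over a small graph whose
# nodes mix modalities (e.g. 2 text + 2 audio nodes). Not the MultiDAN
# implementation; a generic GAT-style aggregation for illustration.
import numpy as np

rng = np.random.default_rng(0)

def leaky_relu(x, slope=0.2):
    return np.where(x > 0, x, slope * x)

def gat_layer(H, A, W, a):
    """H: (N, F) node features; A: (N, N) adjacency with self-loops;
    W: (F, F') shared projection; a: (2*F',) attention vector."""
    Z = H @ W                                  # project every node
    N = H.shape[0]
    out = np.zeros_like(Z)
    for i in range(N):
        nbrs = np.nonzero(A[i])[0]             # neighbours of node i
        # unnormalised logits e_ij = LeakyReLU(a^T [z_i || z_j])
        e = np.array([leaky_relu(a @ np.concatenate([Z[i], Z[j]]))
                      for j in nbrs])
        alpha = np.exp(e - e.max())
        alpha /= alpha.sum()                   # softmax over neighbours
        out[i] = (alpha[:, None] * Z[nbrs]).sum(axis=0)  # weighted mix
    return np.tanh(out)

# Toy graph: 4 nodes from two modalities, fully connected with
# self-loops, so attention mixes intra- and inter-modal information.
H = rng.normal(size=(4, 8))
A = np.ones((4, 4))
W = rng.normal(size=(8, 8)) * 0.1
a = rng.normal(size=(16,)) * 0.1
H_out = gat_layer(H, A, W, a)
print(H_out.shape)  # (4, 8)
```

Because every edge carries a learned attention weight, the same mechanism covers both same-modality and cross-modality node pairs, which is what lets a GAT extract both kinds of dynamics "at once".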
Authors
LUO Junhao
ZHU Yan
LUO Junhao; ZHU Yan (School of Computing and Artificial Intelligence, Southwest Jiaotong University, Chengdu, Sichuan 611756, China; Leeds Joint School, Southwest Jiaotong University, Chengdu, Sichuan 611756, China)
Source
《计算机应用》
CSCD
Peking University Core Journal (北大核心)
2024, Issue 1, pp. 79-85 (7 pages)
Journal of Computer Applications
Funding
Sichuan Science and Technology Program (2019YFSY0032).
Keywords
sentiment analysis
multimodal language sequence
multimodal fusion
graph neural network
attention mechanism