
Image-Text Sentiment Classification Model Based on Multi-scale Cross-modal Feature Fusion
Abstract: For the image-text sentiment classification task, the cross-modal feature fusion strategy that combines early fusion with a Transformer model is commonly used to fuse image and text features. However, this strategy tends to focus on the information unique to each modality while ignoring the interconnections and shared information between modalities, which leads to unsatisfactory cross-modal feature fusion. To address this problem, an image-text sentiment classification method based on multi-scale cross-modal feature fusion is proposed. At the local scale, local feature fusion is performed with a cross-modal attention mechanism, so that the model not only attends to the information unique to the image and the text but also discovers the connections and shared information between them. At the global scale, global feature fusion is guided by an MLM loss, which drives the model to build a global representation of the image and text data, further mines the relationship between the two modalities, and thus promotes deep fusion of image and text features. Comparative experiments against ten baseline models on the two public datasets MVSA-Single and MVSA-Multiple show that the proposed method has clear advantages in accuracy, F1 score, and number of model parameters, verifying its effectiveness.
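The record does not include the authors' implementation, but the local-scale fusion it describes, in which features of one modality attend to features of the other through a cross-modal attention mechanism, can be illustrated with a minimal PyTorch sketch. All names, shapes, and hyperparameters below (CrossModalAttention, hidden size 768, 8 heads, 32 text tokens, 49 image patches) are assumptions for illustration, not details taken from the paper.

```python
import torch
import torch.nn as nn

class CrossModalAttention(nn.Module):
    """Illustrative cross-modal attention: one modality's tokens (queries)
    attend to the other modality's tokens (keys/values). Hypothetical
    sketch, not the paper's actual architecture."""
    def __init__(self, dim: int, num_heads: int = 8):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        self.norm = nn.LayerNorm(dim)

    def forward(self, query_feats: torch.Tensor, context_feats: torch.Tensor) -> torch.Tensor:
        # query_feats:   (batch, seq_q, dim), e.g. text token features
        # context_feats: (batch, seq_k, dim), e.g. image region features
        fused, _ = self.attn(query_feats, context_feats, context_feats)
        # Residual connection keeps each modality's own (unique) information
        # while the attention output injects cross-modal (shared) information.
        return self.norm(query_feats + fused)

# Toy usage with assumed shapes: 32 text tokens, 49 image patches, dim 768.
text = torch.randn(2, 32, 768)
image = torch.randn(2, 49, 768)
fusion = CrossModalAttention(dim=768)
text_attending_to_image = fusion(text, image)   # (2, 32, 768)
image_attending_to_text = fusion(image, text)   # (2, 49, 768)
```

The global-scale step described in the abstract would then apply a masked-language-modeling (MLM) objective, e.g. a cross-entropy loss over predictions for masked text tokens computed from the fused representations, so that recovering masked words forces the model to exploit image context as well.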
Authors: LIU Qian (刘倩); BAI Zhihao (白志豪); CHENG Chunling (程春玲); GUI Yaocheng (归耀城) (School of Computer Science, Nanjing University of Posts and Telecommunications, Nanjing 210023, China; School of Modern Posts, Nanjing University of Posts and Telecommunications, Nanjing 210023, China)
Source: Computer Science (《计算机科学》), CSCD, Peking University Core Journal, 2024, No. 9, pp. 258-264 (7 pages)
Funding: Jiangsu Province Shuangchuang Doctoral Program (JSSCBS20210507).
Keywords: Image-text sentiment classification; Cross-modal feature fusion; Transformer model; Attention mechanism; MLM loss