Improved Blending Attention Mechanism in Visual Question Answering

Abstract: Visual question answering (VQA) has attracted increasing attention in both computer vision and natural language processing. Researchers are working on how to better integrate image features and text features to achieve better results on VQA tasks. Analyzing all features can introduce information redundancy and a heavy computational burden, and attention mechanisms are an effective way to address this problem. However, a single attention mechanism may attend to features incompletely. This paper improves on existing attention methods and proposes a hybrid attention mechanism that combines spatial attention and channel attention. Because attention can discard part of the original features, a small portion of the original image features is added back as compensation. For the text features, a self-attention mechanism is introduced to strengthen the internal structural features of sentences and thereby improve the overall model. The results show that the attention mechanism and feature compensation increase the accuracy of the multimodal low-rank bilinear pooling network by 6.1%.
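The abstract does not give the exact formulation of the hybrid attention or of the feature compensation, so the following is only a minimal, parameter-free sketch under stated assumptions, not the authors' code. It assumes image features form a regions × channels matrix; channel attention is modelled as a sigmoid gate on each channel's mean over regions (a stand-in for a learned gate); spatial attention is a softmax over regions of each region's mean response; the two are applied in channel-then-spatial order, as in CBAM-style designs; and the "small portion" of original features is modelled as a residual scaled by a hypothetical coefficient `compensation`. All function names are illustrative.

```python
import math
from typing import List

# features[region][channel]: one row per image region (e.g., a grid cell or detected box).
Matrix = List[List[float]]


def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


def softmax(xs: List[float]) -> List[float]:
    m = max(xs)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def channel_attention(feats: Matrix) -> Matrix:
    """Gate each channel by sigmoid(mean over regions); a parameter-free stand-in for a learned gate."""
    n_regions, n_channels = len(feats), len(feats[0])
    means = [sum(row[k] for row in feats) / n_regions for k in range(n_channels)]
    gates = [sigmoid(m) for m in means]
    return [[row[k] * gates[k] for k in range(n_channels)] for row in feats]


def spatial_attention(feats: Matrix) -> Matrix:
    """Weight each region by a softmax over regions of its mean channel response."""
    scores = [sum(row) / len(row) for row in feats]
    weights = softmax(scores)
    return [[w * v for v in row] for w, row in zip(weights, feats)]


def hybrid_attention(feats: Matrix, compensation: float = 0.1) -> Matrix:
    """Channel attention, then spatial attention, then add back a scaled copy of the input.

    `compensation` is a hypothetical coefficient for the "small portion" of original
    features described in the abstract; the paper's actual setting is not stated here.
    """
    attended = spatial_attention(channel_attention(feats))
    return [
        [a + compensation * x for a, x in zip(att_row, in_row)]
        for att_row, in_row in zip(attended, feats)
    ]


if __name__ == "__main__":
    image_feats = [[1.0, 0.0, 2.0], [0.5, 0.5, 0.5], [0.0, 3.0, 1.0]]  # 3 regions x 3 channels
    for row in hybrid_attention(image_feats):
        print([round(v, 3) for v in row])
```

The residual term keeps every original feature present in the output, scaled down, even when the attention weights for a region or channel are close to zero. This mirrors the motivation the abstract gives for compensation.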
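A minimal sketch of the text branch and the fusion step follows. It is an illustration under assumptions, not the paper's implementation. It assumes scaled dot-product self-attention with identity query/key/value projections (real models learn these), and a multimodal low-rank bilinear fusion of the form P(tanh(Uq) ∘ tanh(Vv)), where ∘ is the element-wise product. The toy matrices `U`, `V`, `P` and all function names are hypothetical.

```python
import math
from typing import List

Vector = List[float]


def dot(a: Vector, b: Vector) -> float:
    return sum(x * y for x, y in zip(a, b))


def softmax(xs: Vector) -> Vector:
    m = max(xs)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def self_attention(words: List[Vector]) -> List[Vector]:
    """Scaled dot-product self-attention with identity Q/K/V projections (no learned weights)."""
    d = len(words[0])
    out = []
    for q in words:
        weights = softmax([dot(q, k) / math.sqrt(d) for k in words])
        # Each output word is a weighted average of all word vectors in the sentence.
        out.append([sum(w * v[j] for w, v in zip(weights, words)) for j in range(d)])
    return out


def matvec(m: List[Vector], v: Vector) -> Vector:
    """Multiply a matrix given as a list of rows by a vector."""
    return [dot(row, v) for row in m]


def low_rank_bilinear(q: Vector, v: Vector, U: List[Vector], V: List[Vector], P: List[Vector]) -> Vector:
    """Low-rank bilinear fusion: P (tanh(U q) * tanh(V v)) with * the element-wise product.

    Shapes (rows x cols): U is rank x d_q, V is rank x d_v, P is d_out x rank.
    """
    hq = [math.tanh(x) for x in matvec(U, q)]
    hv = [math.tanh(x) for x in matvec(V, v)]
    return matvec(P, [a * b for a, b in zip(hq, hv)])


if __name__ == "__main__":
    question = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]  # 3 toy word vectors, d_q = 2
    attended = self_attention(question)
    q_vec = [sum(col) / len(attended) for col in zip(*attended)]  # mean-pool over words
    image_vec = [0.5, -0.5, 1.0]  # toy pooled image feature, d_v = 3
    U = [[0.1, 0.2], [0.3, -0.1]]  # rank 2 x d_q 2 (toy values)
    V = [[0.2, 0.0, 0.1], [0.0, 0.3, -0.2]]  # rank 2 x d_v 3 (toy values)
    P = [[1.0, 0.5]]  # 1 output x rank 2 (toy values)
    print(low_rank_bilinear(q_vec, image_vec, U, V, P))
```

The element-wise product of the two projected vectors approximates a full bilinear interaction between question and image features at a fraction of the parameter count, which is why low-rank bilinear pooling is a common VQA baseline.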

Authors: Siyu Lu, Yueming Ding, Zhengtong Yin, Mingzhe Liu, Xuan Liu, Wenfeng Zheng, Lirong Yin

Affiliations: School of Automation; College of Resource and Environment Engineering; School of Data Science and Artificial Intelligence; School of Public Affairs and Administration; Department of Geography and Anthropology

Source: Computer Systems Science & Engineering (SCIE, EI), 2023, Issue 10, pp. 1149-1161 (13 pages)

Funding: This work was supported by the Sichuan Science and Technology Program (2021YFQ0003).

Keywords: visual question answering; spatial attention mechanism; channel attention mechanism; image feature processing; text feature extraction

CLC number: TP3 [Automation and Computer Technology / Computer Science and Technology]
