
基于特征融合的多波段图像描述生成方法

Multi-Band Image Caption Generation Method Based on Feature Fusion
Abstract: To address the poor performance of existing image caption generation methods on nighttime scenes, occluded targets, and blurred images, this paper proposes a multi-band detection image caption generation method based on feature fusion, introducing infrared detection imaging into the image captioning field. First, multi-layer convolutional neural networks (CNNs) extract features separately from the visible-light and infrared images. Then, exploiting the complementarity of the different detection bands, a spatial attention module built around a multi-head attention mechanism fuses the band-specific features. Next, a channel attention mechanism aggregates information across the spatial domain to guide the generation of different word types. Finally, an attention enhancement module built on the traditional additive attention mechanism computes correlation weight coefficients between the attention result map and the query vector, suppressing interference from irrelevant variables, and the image caption is generated. Multiple experiments on a visible-image/infrared-image caption dataset show that the method effectively fuses the semantic features of the two bands, reaching 58.3% on BLEU4 (Bilingual Evaluation Understudy 4) and 136.1% on CIDEr (Consensus-based Image Description Evaluation). These results represent a marked improvement in caption accuracy, making the method suitable for complex-scene tasks such as security surveillance and military reconnaissance.
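The fusion pipeline described in the abstract can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the head count, the choice of visible-band features as queries against infrared keys/values, the residual connection, the squeeze-and-excitation-style channel gate, and the dot-product form of the enhancement weights are all assumptions made for illustration.

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def spatial_cross_attention(vis, ir, n_heads=4):
    """Multi-head scaled dot-product attention fusing two bands.
    Visible features form the queries, infrared features the keys and
    values, so each visible location attends to complementary infrared
    evidence. vis, ir: (N, d) arrays of N flattened spatial positions."""
    N, d = vis.shape
    dh = d // n_heads
    out = np.empty_like(vis)
    for h in range(n_heads):
        s = slice(h * dh, (h + 1) * dh)
        q, k, v = vis[:, s], ir[:, s], ir[:, s]
        attn = softmax(q @ k.T / np.sqrt(dh), axis=-1)  # (N, N) weights
        out[:, s] = attn @ v
    return out + vis  # residual keeps the visible-band content

def channel_attention(feat):
    """Aggregate over the spatial dimension, then reweight channels
    with a sigmoid gate (squeeze-and-excitation style)."""
    pooled = feat.mean(axis=0)            # (d,) global spatial pooling
    gate = 1.0 / (1.0 + np.exp(-pooled))  # sigmoid channel weights
    return feat * gate                    # (N, d) reweighted features

def attention_enhancement(attn_out, query):
    """Correlation weights between each attention result row and the
    query vector; low-relevance positions are suppressed."""
    w = softmax(attn_out @ query / np.sqrt(len(query)))
    return attn_out * w[:, None]

rng = np.random.default_rng(0)
vis = rng.standard_normal((16, 8))  # 16 spatial positions, 8 channels
ir = rng.standard_normal((16, 8))
fused = channel_attention(spatial_cross_attention(vis, ir))
enhanced = attention_enhancement(fused, vis.mean(axis=0))
print(enhanced.shape)
```

The residual and gating choices here are generic conveniences; the paper's actual modules may differ in structure and parameterization.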
Authors: 贺姗 (HE Shan), 蔺素珍 (LIN Suzhen), 王彦博 (WANG Yanbo), 李大威 (LI Dawei) (College of Computer Science and Technology, North University of China, Taiyuan 030051, Shanxi, China; College of Control Engineering, North University of China, Taiyuan 030051, Shanxi, China)
Source: Computer Engineering (《计算机工程》), a CAS / CSCD / Peking University Core journal, 2024, No. 6, pp. 236-244 (9 pages)
Funding: Shanxi Province Graduate Innovation Project (2022Y630).
Keywords: image caption; image fusion; multi-band image; self-attention mechanism; combined attention