
基于特征融合的多波段图像描述生成方法

Multi-Band Image Caption Generation Method Based on Feature Fusion
Abstract: To address the poor performance of existing image caption generation methods on nighttime scenes, occluded targets, and blurred images, this paper proposes a multi-band detection image caption generation method based on feature fusion, introducing infrared detection imaging into the image captioning field. First, multi-layer convolutional neural networks (CNNs) extract features separately from the visible-light and infrared images. Then, exploiting the complementarity of the different detection bands, a spatial attention module built around a multi-head attention mechanism fuses the band-specific features. Next, a channel attention mechanism aggregates information across the spatial domain to guide the generation of different word types. Finally, an attention enhancement module built on the traditional additive attention mechanism computes correlation weight coefficients between the attention result map and the query vector, suppressing interference from irrelevant variables, and the image caption is generated. Multiple experiments on a visible-image/infrared-image caption dataset show that the method effectively fuses the semantic features of the two bands, reaching 58.3% on BLEU4 (Bilingual Evaluation Understudy 4) and 136.1% on CIDEr (Consensus-based Image Description Evaluation). These results represent a marked improvement in caption accuracy, making the method suitable for complex-scene tasks such as security surveillance and military reconnaissance.
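The fusion pipeline described in the abstract can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the head count, the choice of visible-band features as queries against infrared keys/values, the residual connection, the squeeze-and-excitation-style channel gate, and the dot-product form of the enhancement weights are all assumptions made for illustration.

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def spatial_cross_attention(vis, ir, n_heads=4):
    """Multi-head scaled dot-product attention fusing two bands.
    Visible features form the queries, infrared features the keys and
    values, so each visible location attends to complementary infrared
    evidence. vis, ir: (N, d) arrays of N flattened spatial positions."""
    N, d = vis.shape
    dh = d // n_heads
    out = np.empty_like(vis)
    for h in range(n_heads):
        s = slice(h * dh, (h + 1) * dh)
        q, k, v = vis[:, s], ir[:, s], ir[:, s]
        attn = softmax(q @ k.T / np.sqrt(dh), axis=-1)  # (N, N) weights
        out[:, s] = attn @ v
    return out + vis  # residual keeps the visible-band content

def channel_attention(feat):
    """Aggregate over the spatial dimension, then reweight channels
    with a sigmoid gate (squeeze-and-excitation style)."""
    pooled = feat.mean(axis=0)            # (d,) global spatial pooling
    gate = 1.0 / (1.0 + np.exp(-pooled))  # sigmoid channel weights
    return feat * gate                    # (N, d) reweighted features

def attention_enhancement(attn_out, query):
    """Correlation weights between each attention result row and the
    query vector; low-relevance positions are suppressed."""
    w = softmax(attn_out @ query / np.sqrt(len(query)))
    return attn_out * w[:, None]

rng = np.random.default_rng(0)
vis = rng.standard_normal((16, 8))  # 16 spatial positions, 8 channels
ir = rng.standard_normal((16, 8))
fused = channel_attention(spatial_cross_attention(vis, ir))
enhanced = attention_enhancement(fused, vis.mean(axis=0))
print(enhanced.shape)
```

The residual and gating choices here are generic conveniences; the paper's actual modules may differ in structure and parameterization.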
Authors: 贺姗 (HE Shan), 蔺素珍 (LIN Suzhen), 王彦博 (WANG Yanbo), 李大威 (LI Dawei) (College of Computer Science and Technology, North University of China, Taiyuan 030051, Shanxi, China; College of Control Engineering, North University of China, Taiyuan 030051, Shanxi, China)
Source: Computer Engineering (《计算机工程》), a CAS / CSCD / Peking University Core journal, 2024, No. 6, pp. 236-244 (9 pages)
Funding: Shanxi Province Graduate Innovation Project (2022Y630).
Keywords: image caption; image fusion; multi-band image; self-attention mechanism; combined attention