Image text retrieval method based on feature enhancement and semantic correlation matching (cited by: 2)
Abstract: To achieve a precise semantic connection between images and text in image text retrieval, an image text retrieval method based on Feature Enhancement and Semantic Correlation Matching (FESCM) is proposed. First, a feature enhancement representation module introduces a multi-head self-attention mechanism to enhance image region features and text word features, reducing the interference of redundant information with the alignment of image regions and text words. Second, a semantic correlation matching module uses local matching to capture the correspondence between locally salient objects, and also incorporates image background information into the global image feature so that global matching achieves accurate global semantic correlation. Finally, the local matching score and the global matching score are combined to obtain the final matching score between an image and a text. Experimental results show that the FESCM-based method improves the recall sum over the extended visual semantic embedding method by 5.7 and 7.5 percentage points on the Flickr8k and Flickr30k benchmark datasets, respectively, and improves the recall sum over the Two-Stream Hierarchical Similarity Reasoning method by 3.7 percentage points on the MS-COCO dataset. The proposed method can therefore effectively improve the accuracy of image text retrieval and realize the semantic connection between images and text.
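The scoring and evaluation steps described in the abstract can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration, not the authors' implementation: the abstract does not say how the local and global scores are computed or combined, so this sketch uses max-over-regions cosine similarity for local matching, mean pooling for the global features, and an unweighted sum for the final score. The multi-head self-attention enhancement and the background-aware global feature are omitted. The recall sum follows the convention common in this literature (R@1, R@5 and R@10 summed over both retrieval directions), which the abstract does not spell out. All function names and the toy vectors are hypothetical.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def mean_pool(vectors):
    """Average a list of equal-length vectors into a single vector."""
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]

def local_score(regions, words):
    """Assumed local matching: each word takes its best-matching region, then average."""
    return sum(max(cosine(w, r) for r in regions) for w in words) / len(words)

def global_score(regions, words):
    """Assumed global matching: cosine between mean-pooled image and text vectors."""
    return cosine(mean_pool(regions), mean_pool(words))

def final_score(regions, words):
    """Assumed fusion: unweighted sum of the local and global scores."""
    return local_score(regions, words) + global_score(regions, words)

def recall_at_k(score_matrix, k):
    """Percentage of queries whose ground truth (item i for query i) ranks in the top k."""
    hits = 0
    for i, row in enumerate(score_matrix):
        ranked = sorted(range(len(row)), key=lambda j: row[j], reverse=True)
        hits += i in ranked[:k]
    return 100.0 * hits / len(score_matrix)

def rsum(i2t, t2i, ks=(1, 5, 10)):
    """Recall sum: R@K for both retrieval directions, summed over all K."""
    return sum(recall_at_k(i2t, k) + recall_at_k(t2i, k) for k in ks)

# Toy data (hypothetical): two images with 2-D region features,
# two captions with 2-D word features; image i matches caption i.
images = [[[1.0, 0.0], [0.9, 0.1]], [[0.0, 1.0], [0.1, 0.9]]]
texts = [[[1.0, 0.1]], [[0.1, 1.0], [0.0, 1.0]]]

# Image-to-text score matrix, and its transpose for text-to-image retrieval.
i2t = [[final_score(img, txt) for txt in texts] for img in images]
t2i = [list(col) for col in zip(*i2t)]

print(rsum(i2t, t2i))
```

Because the recall sum adds six percentage values, its maximum is 600, which gives a sense of scale for the improvements of 3.7 to 7.5 percentage points reported in the abstract.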
Authors: CHEN Jia (陈佳), ZHANG Hong (张鸿) (School of Computer Science and Technology, Wuhan University of Science and Technology, Wuhan, Hubei 430081, China; Hubei Province Key Laboratory of Intelligent Information Processing and Real-time Industrial System (Wuhan University of Science and Technology), Wuhan, Hubei 430081, China)
Source: Journal of Computer Applications (《计算机应用》), indexed in CSCD and the Peking University Core Journals list, 2024, No. 1, pp. 16-23 (8 pages)
Funding: National Key Research and Development Program of China (2020AAA0108503)
Keywords: image text retrieval; feature enhancement representation; multi-head self-attention mechanism; semantic correlation matching