Abstract
To explore the gap between the low-level visual features of an image and high-level semantic concepts, an image captioning algorithm is proposed that can determine where the focus of the image lies, mine higher-level semantic information, and improve the detail of the generated description. During visual feature extraction, local features are added to the global features, and the resulting global-local feature of the input image serves as the visual input; this allows the focus of attention on the image to be determined at each time step and more image details to be captured. During decoding, an attention mechanism weights the image features, so that the dependence of the word output at the current time step on visual information and on semantic information can be adjusted adaptively, which effectively improves captioning performance. The experimental results show that the proposed method is competitive with other image captioning algorithms: it identifies the objects in an image more accurately and in finer detail, describes the input image more comprehensively, and achieves higher recognition accuracy on tiny objects.
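The abstract does not give the model's equations, so the following is only a minimal sketch of the general idea it describes: attention over global and local features at one decoding step, followed by an adaptive balance between visual and semantic information. The dot-product scoring, the scalar sigmoid gate `beta`, and all function and parameter names are assumptions made for illustration, not the authors' formulation.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a, b):
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def attend(hidden, global_feat, local_feats, semantic):
    """One illustrative decoding step of gated global-local attention.

    hidden      : current decoder state (hypothetical name)
    global_feat : feature vector of the whole image
    local_feats : list of feature vectors for local regions
    semantic    : vector summarizing the language context so far
    """
    # Candidate visual features: the global feature plus each local region.
    candidates = [global_feat] + local_feats
    # Assumed scoring: dot-product relevance to the decoder state.
    weights = softmax([dot(hidden, f) for f in candidates])
    # Attention-weighted visual context vector.
    dim = len(hidden)
    visual = [sum(w * f[i] for w, f in zip(weights, candidates))
              for i in range(dim)]
    # Assumed scalar gate: how much the current word relies on the
    # semantic (language) context rather than on the image.
    beta = 1.0 / (1.0 + math.exp(-dot(hidden, semantic)))
    context = [beta * s + (1.0 - beta) * v
               for s, v in zip(semantic, visual)]
    return weights, beta, context


weights, beta, context = attend(
    hidden=[1.0, 0.0],
    global_feat=[0.0, 0.0],
    local_feats=[[2.0, 0.0], [0.0, 2.0]],
    semantic=[0.0, 0.0],
)
```

In this toy call, the first local region aligns best with the decoder state and therefore receives the largest attention weight, while a gate value near 0.5 means the step draws about equally on visual and semantic information.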
Authors
ZHAO Xiao-hu (赵小虎)
YIN Liang-fei (尹良飞)
ZHAO Cheng-long (赵成龙)
Affiliations: National and Local Joint Engineering Laboratory of Internet Application Technology on Mine, China University of Mining and Technology, Xuzhou 221008, China; School of Information and Control Engineering, China University of Mining and Technology, Xuzhou 221116, China
Source
Journal of Zhejiang University (Engineering Science) (《浙江大学学报(工学版)》)
Indexed in: EI, CAS, CSCD, Peking University Core Journals (北大核心)
2020, No. 1, pp. 126-134 (9 pages)
Funding
Project supported by the National Key Research and Development Program of China (2017YFC0804400)