Abstract
Prior knowledge is widely used to guide model training in computer vision tasks such as object detection and image retrieval, where prior (anchor) boxes, labels, and category information serve as priors that improve model accuracy and efficiency. Image captioning methods typically use image features or historical semantic information as prior knowledge, but they overlook the prior information contained in the image itself. To capture this information, the authors propose an image caption generation method based on a priori vocabulary mechanism (PVM). First, Faster R-CNN extracts image features. Second, a prior vocabulary generation method that incorporates multi-instance learning extracts prior vocabulary from the image. Third, a prior feature extraction module derives prior features from the prior vocabulary and the image features. Finally, the prior features are fed into an improved Transformer to generate the caption, which guides the model to incorporate the image's prior information. Evaluated on the MSCOCO dataset, the method scores 38.7% on BLEU_4 and 128.5% on CIDEr, improvements of 1.7% and 6.7%, respectively, over the baseline model. These results indicate that the generated captions are more accurate and richer in detail, supporting the effectiveness of the method.
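The abstract does not say how multi-instance learning is formulated for prior vocabulary generation. As a rough illustration only, the sketch below assumes the common noisy-OR form, in which an image (the bag) contains a word if at least one of its regions (the instances) does. The function names, the threshold, the `top_k` cutoff, and the per-region scores are all hypothetical and are not taken from the paper.

```python
from typing import Dict, List


def noisy_or_word_probs(region_probs: List[Dict[str, float]]) -> Dict[str, float]:
    """Combine per-region word probabilities into image-level probabilities.

    Assumed noisy-OR rule (not confirmed by the source):
        P(word | image) = 1 - prod_j (1 - P(word | region_j))
    A word missing from a region is treated as having probability 0 there.
    """
    miss: Dict[str, float] = {}  # probability that no region so far shows the word
    for probs in region_probs:
        for word, p in probs.items():
            miss[word] = miss.get(word, 1.0) * (1.0 - p)
    return {word: 1.0 - m for word, m in miss.items()}


def select_prior_vocabulary(
    image_probs: Dict[str, float], threshold: float = 0.5, top_k: int = 5
) -> List[str]:
    """Keep the highest-scoring words above a (hypothetical) threshold."""
    kept = [(w, p) for w, p in image_probs.items() if p >= threshold]
    kept.sort(key=lambda wp: wp[1], reverse=True)
    return [w for w, _ in kept[:top_k]]


# Hypothetical word scores for three detected regions of one image.
regions = [
    {"dog": 0.6, "grass": 0.2},
    {"dog": 0.5, "frisbee": 0.7},
    {"grass": 0.3},
]
image_probs = noisy_or_word_probs(regions)
prior_words = select_prior_vocabulary(image_probs)
print(prior_words)
```

With these made-up scores, "dog" and "frisbee" pass the threshold and "grass" does not. In a pipeline like the one described, the selected words would then be embedded and combined with the region features before decoding; that step is left out here because the abstract gives no detail on it.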
Authors
WU Jing (吴京)
LI Guangming (李广明)
ZHANG Hongliang (张红良)
SHEN Jing'ao (申京傲)
LI Jie (李杰)
School of Computer Science and Technology, Dongguan University of Technology, Dongguan 523808, China
Source
Journal of Dongguan University of Technology (《东莞理工学院学报》)
2024, No. 5, pp. 18-25 (8 pages)
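Funding
National Natural Science Foundation of China, Young Scientists Fund (62106046)
Special Fund for the Cultivation of Scientific and Technological Innovation among Guangdong University Students (Pdjh2002a0505)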