Abstract
Traffic scenes are complex and changeable: road topology is intricate, and road elements and traffic participants are highly diverse. To address this, an image captioning algorithm based on an attention mechanism is proposed. In the encoding stage, a convolutional neural network extracts image features from different regions of the image, and an attention mechanism is applied to each region to obtain attention-weighted features that highlight the key information in the image. In the decoding stage, multiple long short-term memory (LSTM) modules serve as the language model for generating traffic scene image captions. Experimental results show that, on the MSCOCO validation set, the algorithm achieves BLEU-1 to BLEU-4 scores of 0.735, 0.652, 0.368, and 0.323, respectively, indicating that it describes traffic scene images well.
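The attention step in the encoding stage can be illustrated with a minimal sketch. It assumes soft attention with dot-product scoring between each region feature and the decoder's previous hidden state; the abstract does not state the scoring function, so this choice, along with the names `attend`, `region_features`, and `hidden_state`, is hypothetical and not the authors' implementation. The toy vectors stand in for CNN region features.

```python
import math

def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def attend(region_features, hidden_state):
    """Return (weights, context) for one decoding step.

    region_features: list of feature vectors, one per image region
                     (stand-ins for CNN outputs).
    hidden_state:    previous hidden state of the decoder LSTM,
                     with the same dimension as each feature vector.
    Dot-product scoring is an assumption; the abstract does not
    specify how regions are scored.
    """
    # One relevance score per region.
    scores = [sum(f * h for f, h in zip(feat, hidden_state))
              for feat in region_features]
    # Normalize scores into attention weights that sum to 1.
    weights = softmax(scores)
    # Context vector: attention-weighted sum of region features.
    dim = len(hidden_state)
    context = [sum(w * feat[d] for w, feat in zip(weights, region_features))
               for d in range(dim)]
    return weights, context

# Toy example: three regions with 2-dimensional features.
regions = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
hidden = [2.0, 0.0]
weights, context = attend(regions, hidden)
```

In a full captioning model, the context vector would be fed to the LSTM at each step together with the previous word embedding, so that regions with larger weights contribute more to the next predicted word.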
Authors
Song Luqin (宋禄琴); Xuan Zuxing (玄祖兴); Wang Caiyun (王彩云)
Affiliations: Beijing Key Laboratory of Information Service Engineering, Beijing Union University, Beijing 100101, China; Institute of Fundamental and Interdisciplinary Sciences, Beijing Union University, Beijing 100101, China
Source
Computer Applications and Software (《计算机应用与软件》)
Peking University Core Journal (北大核心)
2022, No. 11, pp. 201-207 (7 pages)
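Funding
Support Program for High-Level Teacher Team Construction in Beijing Municipal Universities (IDHT20170511)
Beijing Union University Premium Funding Project for Academic Human Resources Development (BPHR2020EZ01)
Beijing Union University Graduate Program.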