Abstract
Image caption generation automatically produces sentences that describe the content of an image, and lies at the intersection of computer vision and natural language processing. Traditional attention-based algorithms focus on different sub-regions of the feature maps but ignore the different channels of those feature maps, which can easily cause attention deviation. To address this problem, the proposed model assigns weights to the different channels of the feature maps according to the degree of coupling between the currently embedded word and the hidden-layer state, and combines this channel weighting with the traditional method into a fusion attention mechanism that locates the attention position accurately. The experimental results show some improvement on the specified evaluation metrics, indicating that the model can generate more fluent and accurate natural sentences.
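The abstract does not give the model's equations, so the following is only a minimal NumPy sketch of how a channel-plus-spatial fusion attention step could look under stated assumptions. The "coupling" between the embedded word and the hidden state is assumed here to be an elementwise product of two projected vectors, channel descriptors are assumed to be spatial means, and every name (`fusion_attention`, `init_params`, `W_word`, `u_chan`, and so on) is hypothetical rather than taken from the paper.

```python
import numpy as np


def softmax(x):
    """Numerically stable softmax over a 1-D array."""
    e = np.exp(x - np.max(x))
    return e / e.sum()


def init_params(rng, n_channels, d_word, d_hidden, k):
    """Random projection weights; all names are hypothetical placeholders."""
    s = 0.1
    return {
        "W_word": rng.normal(0, s, (k, d_word)),      # word embedding -> k
        "W_hid": rng.normal(0, s, (k, d_hidden)),     # hidden state -> k
        "u_chan": rng.normal(0, s, k),                # channel descriptor -> k
        "w_chan": rng.normal(0, s, k),                # channel score vector
        "W_feat": rng.normal(0, s, (n_channels, k)),  # location feature -> k
        "W_hid_sp": rng.normal(0, s, (k, d_hidden)),  # hidden state -> k
        "w_sp": rng.normal(0, s, k),                  # spatial score vector
    }


def fusion_attention(V, word_emb, h, p):
    """V: (C, N) feature map with C channels and N spatial locations.

    Returns the context vector (C,), spatial weights (N,), channel weights (C,).
    """
    # Assumption: coupling = elementwise product of projected word and state.
    coupling = np.tanh(p["W_word"] @ word_emb) * np.tanh(p["W_hid"] @ h)  # (k,)

    # Channel attention: score each channel's mean response against coupling.
    chan_desc = V.mean(axis=1)                                        # (C,)
    chan_hidden = np.tanh(np.outer(chan_desc, p["u_chan"]) + coupling)  # (C, k)
    beta = softmax(chan_hidden @ p["w_chan"])                         # (C,)

    # Re-weight channels, then apply conventional soft spatial attention.
    V_weighted = beta[:, None] * V                                    # (C, N)
    sp_hidden = np.tanh(V_weighted.T @ p["W_feat"] + p["W_hid_sp"] @ h)  # (N, k)
    alpha = softmax(sp_hidden @ p["w_sp"])                            # (N,)

    context = V_weighted @ alpha                                      # (C,)
    return context, alpha, beta
```

In a decoder loop, such a function would be called once per generated word, with `context` fed to the recurrent unit together with the word embedding. Applying channel weights before spatial weights is one possible ordering; the abstract does not state which order the authors use.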
Authors
Hou Yiwen (侯一雯); Tian Yuling (田玉玲)
Dept. of Information & Computer, Taiyuan University of Technology, Taiyuan 030000, China
Source
Application Research of Computers (《计算机应用研究》)
Indexed in: CSCD; PKU Core (北大核心)
2021, No. 7, pp. 2209-2212 (4 pages)
Funding
Supported by the National Natural Science Foundation of China (61472271).
Keywords
image caption generation
attention deviation
channel
coupling
fusion attention