Abstract
Remote sensing image description generation is an active research topic spanning computer vision and natural language processing: given an image, the goal is to automatically generate a sentence that describes it. This paper proposes a remote sensing image description generation method based on multi-scale and attention feature enhancement. A soft attention mechanism aligns each generated word with the image features, which improves the interpretability of the model. In addition, because remote sensing images have high resolution and their targets vary widely in scale, the paper proposes a feature extraction network based on pyramid pooling and channel attention (Pyramid Pool and Channel Attention Network, PCAN) to capture multi-scale information and local cross-channel interaction in remote sensing images. The image features extracted by PCAN serve as the input to the soft attention mechanism in the description generation stage. The attention mechanism computes context information, which is then fed into an LSTM network to produce the final output sequence. Experiments on the RSICD and MSCOCO datasets verify the effectiveness of PCAN and the soft attention mechanism: adding them improves the quality of the generated sentences and achieves alignment between words and image features. Visualization of the soft attention mechanism further increases the credibility of the model's results. Finally, experiments on a semantic segmentation dataset show that PCAN is also effective for semantic segmentation tasks.
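The abstract does not give PCAN's layer configuration, so the following is only a minimal sketch under stated assumptions of the two attention steps it names. The channel-attention step is modelled on ECA-Net (global average pooling, then a 1D convolution across neighbouring channels, one common way to capture local cross-channel interaction); the soft attention follows the additive formulation of Show, Attend and Tell, producing weights `alpha` over image locations and a context vector that would be passed to the LSTM decoder. Pyramid pooling and the LSTM itself are omitted. All parameter names (`kernel`, `W_a`, `W_h`, `w`) are hypothetical, and the weights are random rather than trained. Pure Python is used so the sketch has no dependencies.

```python
import math
import random


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def softmax(xs):
    # Subtract the max for numerical stability before exponentiating.
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def channel_attention(feature, kernel):
    """ECA-style channel attention (an assumption, not PCAN's confirmed design).

    feature: C channels, each a list of L spatial activations.
    kernel:  odd-length 1D weights shared across channels, so each channel's
             weight depends only on its k nearest neighbouring channels.
    """
    C = len(feature)
    # Global average pooling: one descriptor per channel.
    pooled = [sum(ch) / len(ch) for ch in feature]
    k = len(kernel)
    pad = k // 2
    weights = []
    for c in range(C):
        # 1D convolution over the channel axis with zero padding.
        s = 0.0
        for j in range(k):
            idx = c + j - pad
            if 0 <= idx < C:
                s += kernel[j] * pooled[idx]
        weights.append(sigmoid(s))
    # Rescale every channel by its attention weight.
    return [[wt * v for v in ch] for wt, ch in zip(weights, feature)], weights


def matvec(m, v):
    return [sum(a * b for a, b in zip(row, v)) for row in m]


def soft_attention(annotations, h, W_a, W_h, w):
    """Additive soft attention in the style of Show, Attend and Tell.

    annotations: L vectors a_i of size D (one per image location).
    h:           previous LSTM hidden state of size H.
    W_a (K x D), W_h (K x H), w (K): hypothetical learned parameters.
    Returns the context vector z = sum_i alpha_i * a_i and the weights alpha.
    """
    proj_h = matvec(W_h, h)
    scores = []
    for a in annotations:
        hidden = [math.tanh(x + y) for x, y in zip(matvec(W_a, a), proj_h)]
        scores.append(sum(wi * xi for wi, xi in zip(w, hidden)))
    alpha = softmax(scores)
    D = len(annotations[0])
    context = [sum(alpha[i] * annotations[i][d] for i in range(len(annotations)))
               for d in range(D)]
    return context, alpha


def rand_matrix(rows, cols):
    return [[random.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]


if __name__ == "__main__":
    random.seed(0)
    C, L, H, K = 4, 6, 3, 5  # channels, locations, hidden size, attention size
    feature = [[random.uniform(-1, 1) for _ in range(L)] for _ in range(C)]
    enhanced, channel_weights = channel_attention(feature, kernel=[0.2, 0.5, 0.2])
    # Regroup channel-major activations into per-location annotation vectors.
    annotations = [[enhanced[c][i] for c in range(C)] for i in range(L)]
    W_a, W_h = rand_matrix(K, C), rand_matrix(K, H)
    w = [random.uniform(-0.5, 0.5) for _ in range(K)]
    h = [0.0] * H  # initial hidden state of the (omitted) LSTM
    context, alpha = soft_attention(annotations, h, W_a, W_h, w)
    print("attention weights:", [round(a, 3) for a in alpha])
    print("context vector:", [round(z, 3) for z in context])
```

At each decoding step the weights `alpha` sum to 1 over image locations, which is what makes the word-to-region alignment visualizable; in a full captioner the context vector would be combined with the previous word embedding as the LSTM input.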
Authors
ZHAO Jia-qi
WANG Han-zheng
ZHOU Yong
ZHANG Di
ZHOU Zi-yuan
(School of Computer Science and Technology, China University of Mining and Technology, Xuzhou, Jiangsu 221116, China; Engineering Research Center of Mine Digitization, Ministry of Education of People’s Republic of China, Xuzhou, Jiangsu 221116, China; Innovation Research Center of Disaster Intelligent Prevention and Emergency Rescue, Xuzhou, Jiangsu 221116, China)
Source
Computer Science
CSCD
Peking University Core Journal
2021, Issue 1, pp. 190-196 (7 pages)
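Funding
National Natural Science Foundation of China (61806206)
Natural Science Foundation of Jiangsu Province (BK20180639)
Open Fund of the Key Laboratory of Reliability Physics and Application Technology of Electronic Components (614280620190403-1)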
Keywords
Attention mechanism
Feature enhancement
Long short-term memory
Remote sensing image description generation