Remote Sensing Scene Classification Based on Local Selection Vision Transformer
Abstract: Remote sensing scene classification aims to assign specific semantic labels to aerial images and is a fundamental and important task in remote sensing image interpretation. Existing studies mainly use convolutional neural networks (CNNs) to learn global and local features and thereby improve the discriminative representation of the network. However, the receptive field of CNN-based methods limits their ability to model long-range dependencies among local features. In recent years, Vision Transformer (ViT) has shown strong performance on conventional classification tasks. The Transformer's self-attention mechanism connects each patch token with the classification token, capturing contextual relationships between image pixels and taking global information in the spatial domain into account. This paper proposes a remote sensing scene classification network based on a local selection ViT. The input image is first split into small patches, which are flattened into a sequence to which position encodings are added; the resulting sequence is then fed into the encoder. In addition, to learn local discriminative features, a local selection module is inserted before the input of the last layer; it selects discriminative tokens as that layer's input, which yields the final output used for classification. Experimental results show that the proposed method achieves good results on two large remote sensing scene classification datasets (AID and NWPU).
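The abstract does not say how the local selection module scores tokens, so the following is only a minimal NumPy sketch under stated assumptions rather than the authors' implementation. It assumes patch tokens are ranked by the attention they receive from the classification token and that the top `k` are kept. The names `patchify`, `cls_attention`, and `select_local_tokens`, the random weights, and all sizes are hypothetical and chosen for illustration.

```python
import numpy as np


def patchify(image, patch_size):
    """Split an (H, W, C) image into a sequence of flattened patches."""
    h, w, c = image.shape
    assert h % patch_size == 0 and w % patch_size == 0
    gh, gw = h // patch_size, w // patch_size
    patches = image.reshape(gh, patch_size, gw, patch_size, c)
    patches = patches.transpose(0, 2, 1, 3, 4)  # (gh, gw, p, p, c)
    return patches.reshape(gh * gw, patch_size * patch_size * c)


def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)  # subtract max for stability
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def cls_attention(tokens, w_q, w_k):
    """Attention weights from the class token (row 0) to each patch token."""
    q = tokens[:1] @ w_q                     # (1, d)
    k = tokens[1:] @ w_k                     # (n, d)
    scores = q @ k.T / np.sqrt(k.shape[1])   # scaled dot product, (1, n)
    return softmax(scores, axis=-1)[0]       # (n,)


def select_local_tokens(tokens, attn, k):
    """ASSUMED criterion: keep the class token plus the k patch tokens
    that receive the highest class-token attention."""
    top = np.sort(np.argsort(attn)[::-1][:k])  # top-k, in spatial order
    return np.concatenate([tokens[:1], tokens[1:][top]], axis=0), top


# Toy sizes and random weights, for illustration only.
rng = np.random.default_rng(0)
image = rng.random((32, 32, 3))
patches = patchify(image, 8)                         # 16 patches of 192 values
dim = 24
w_embed = rng.normal(size=(patches.shape[1], dim)) * 0.02
cls_token = np.zeros((1, dim))
pos_embed = rng.normal(size=(patches.shape[0] + 1, dim)) * 0.02

# Class token + linearly embedded patches, with position encoding added.
tokens = np.concatenate([cls_token, patches @ w_embed], axis=0) + pos_embed

w_q = rng.normal(size=(dim, dim))
w_k = rng.normal(size=(dim, dim))
attn = cls_attention(tokens, w_q, w_k)
selected, idx = select_local_tokens(tokens, attn, k=4)
print(selected.shape)  # (5, 24): class token plus 4 selected patch tokens
```

In a full model, `selected` would take the place of the complete token sequence as input to the last encoder layer, and the class token of that layer's output would feed the classifier. The encoder layers themselves are omitted here.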
Authors: Yang Kai (杨凯); Lu Xiaoqiang (卢孝强) (Key Laboratory of Spectral Imaging Technology, Xi’an Institute of Optics and Precision Mechanics, Chinese Academy of Sciences, Xi’an 710119, Shaanxi, China; University of Chinese Academy of Sciences, Beijing 100049, China)
Source: Laser & Optoelectronics Progress (《激光与光电子学进展》), CSCD, Peking University Core Journal, 2023, Issue 22, pp. 319-325 (7 pages)
Funding: National Science Fund for Distinguished Young Scholars (61925112).
Keywords: remote sensing scene classification; deep learning; Vision Transformer; local feature