Abstract
To address the low robustness of visual place recognition under environmental changes such as weather, season, and lighting, we propose a multi-dimensional attention mechanism that improves the environmental robustness of visual place recognition feature descriptors: parallel omni-dimensional dynamic attention (POD-Attention). To explore convolutional kernels dynamically and at fine granularity across all of their dimensions, and to strengthen the feature extraction network's ability to extract and learn invariant features such as buildings, an omni-dimensional dynamic convolution block applies complementary attention along every kernel dimension: the input and output channels, the kernel's spatial extent, and the number of kernels. A 1×1 convolution, a skip squeeze-and-excitation (SSE) module, and the omni-dimensional dynamic convolution block are fused in parallel, which both speeds up feature extraction and enlarges the receptive field of the visual place recognition network, allowing it to capture more comprehensive information and further improving recognition accuracy. Experiments on public datasets show that a visual place recognition method based on VGG16 and Patch-NetVLAD feature aggregation, once improved with the POD attention mechanism, gains 9.7% in Recall@1 on the Nordland dataset and 1.8% on the Mapillary Street-Level Sequences dataset. These results demonstrate that the proposed POD attention mechanism improves network performance and the robustness of visual place recognition under varying environmental conditions, and that the resulting recognition method is effective, laying a foundation for more accurate visual localization and map construction in visual SLAM.
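The Recall@1 figures above are easier to interpret with the metric written out. The sketch below uses the definition commonly applied in visual place recognition benchmarks: a query counts as correct if at least one of its top-N retrieved reference images is a true match. The abstract does not state the exact evaluation protocol (for example, how matches are defined on Nordland), so the function name `recall_at_n`, the data layout, and the toy identifiers are all assumptions made for illustration.

```python
from typing import Dict, List, Set


def recall_at_n(
    ranked: Dict[str, List[str]],
    truth: Dict[str, Set[str]],
    n: int = 1,
) -> float:
    """Fraction of queries with at least one true match in the top-n results.

    ranked: query id -> reference ids sorted from best to worst match
            (hypothetical layout, not taken from the paper).
    truth:  query id -> set of reference ids that count as correct.
    """
    if not ranked:
        raise ValueError("at least one query is required")
    hits = 0
    for query, candidates in ranked.items():
        # A query is a hit if any of its first n candidates is a true match.
        if truth.get(query, set()) & set(candidates[:n]):
            hits += 1
    return hits / len(ranked)


if __name__ == "__main__":
    # Toy data for illustration only; these are not results from the paper.
    ranked = {
        "q1": ["r3", "r1"],
        "q2": ["r2", "r5"],
        "q3": ["r4", "r9"],
        "q4": ["r7", "r8"],
    }
    truth = {"q1": {"r1"}, "q2": {"r2"}, "q3": {"r6"}, "q4": {"r7"}}
    print(recall_at_n(ranked, truth, n=1))
    print(recall_at_n(ranked, truth, n=2))
```

Under this definition, raising N can only keep the score the same or increase it, which is why Recall@1 is the strictest of the usual Recall@N settings.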
Authors
刘沛津
刘淑婕
何林
彭莉峻
付雪峰
LIU Peijin; LIU Shujie; HE Lin; PENG Lijun; FU Xuefeng (School of Mechanical and Electrical Engineering, Xi'an University of Architecture & Technology, Xi'an 710055, China; Faculty of Science, Xi'an University of Architecture & Technology, Xi'an 710055, China)
Source
《液晶与显示》
CAS
CSCD
Peking University Core Journal
2024, No. 9, pp. 1233-1242 (10 pages)
Chinese Journal of Liquid Crystals and Displays
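Funding
Key Research and Development Program of Shaanxi Province (No. 2022GY-134)
Special Scientific Research Project of the Shaanxi Provincial Department of Education (No. 21JK0732)
Natural Science Special Project of Xi'an University of Architecture and Technology (No. ZR19058)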
Keywords
visual place recognition
environmental robustness
deep learning
parallel omni-dimensional dynamic attention
parallel strategy