Abstract
With the explosive growth of smart terminal devices and the rapid rise of the mobile Internet, demand for location-based services is becoming increasingly prominent in many scenarios, such as indoor environments and sparsely populated remote mountainous areas. However, because GPS signals in these areas are blocked or base-station coverage is hard to provide, GPS positioning cannot work properly. Image geo-localization means determining where an image was taken from visual information alone. Without any prior knowledge, predicting the geographic location of a photo is a very difficult task, because images taken under different conditions (for example, different weather, objects, or camera settings) vary enormously. This paper explores a cross-view visual geo-localization method. First, an inverse polar coordinate transformation converts the street-view image into an aerial-view image, reducing the domain gap between the two views. Deep learning is then used to encode images from the different views into more robust global vector descriptors, and on this basis image matching and localization of the street-view query image are carried out. For image feature extraction, the VGG16 model is adopted, which uses small convolution kernels in a deeper stack of layers to enlarge the network's receptive field while saving parameters. For feature encoding, a multi-scale attention mechanism is integrated into the NetVLAD model, which encodes the features extracted by the backbone into a more robust global feature descriptor vector. Experimental results show that the method achieves high-accuracy street-view matching and localization, with higher matching accuracy than existing methods. Moreover, it does not require high-definition street-view images captured with professional equipment: street-view images taken with ordinary smartphones also yield good matching and localization accuracy.
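The encoding and matching steps summarized in the abstract can be illustrated with a minimal NumPy sketch of NetVLAD-style aggregation followed by nearest-descriptor retrieval. This is not the authors' implementation: the VGG16 backbone and the multi-scale attention module are omitted, random arrays stand in for backbone features and for the learned cluster parameters, and the names `netvlad_aggregate` and `retrieve` are hypothetical.

```python
import numpy as np


def netvlad_aggregate(features, centers, weights, biases):
    """Aggregate local features (N, D) into one global descriptor of length K*D.

    centers (K, D), weights (K, D) and biases (K,) would be learned in a real
    model; here they are supplied by the caller (assumption for illustration).
    """
    # Soft-assign every local feature to each of the K clusters (softmax).
    logits = features @ weights.T + biases                 # (N, K)
    logits = logits - logits.max(axis=1, keepdims=True)    # numerical stability
    assign = np.exp(logits)
    assign = assign / assign.sum(axis=1, keepdims=True)

    # Accumulate assignment-weighted residuals to each cluster center.
    residuals = features[:, None, :] - centers[None, :, :]  # (N, K, D)
    vlad = (assign[:, :, None] * residuals).sum(axis=0)      # (K, D)

    # Intra-normalize per cluster, flatten, then L2-normalize the whole vector.
    vlad = vlad / (np.linalg.norm(vlad, axis=1, keepdims=True) + 1e-12)
    vlad = vlad.ravel()
    return vlad / (np.linalg.norm(vlad) + 1e-12)


def retrieve(query_desc, reference_descs):
    """Return the index of the most similar reference and all cosine scores."""
    sims = reference_descs @ query_desc  # descriptors are unit length
    return int(np.argmax(sims)), sims


# Toy data: random arrays stand in for backbone feature maps (assumption).
rng = np.random.default_rng(0)
N, D, K = 50, 8, 4
centers = rng.normal(size=(K, D))
weights = rng.normal(size=(K, D))
biases = rng.normal(size=K)

# Three "aerial" reference images and one "street-view" query that is a
# slightly perturbed copy of reference 1.
reference_feats = [rng.normal(size=(N, D)) for _ in range(3)]
query_feats = reference_feats[1] + 1e-3 * rng.normal(size=(N, D))

reference_descs = np.stack(
    [netvlad_aggregate(f, centers, weights, biases) for f in reference_feats]
)
query_desc = netvlad_aggregate(query_feats, centers, weights, biases)

best, sims = retrieve(query_desc, reference_descs)
print("best match:", best)
```

In a full pipeline the query features would come from the polar-transformed street-view image and the reference features from aerial images, with the cluster parameters trained end to end rather than drawn at random.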
Authors
刘旭东
余平
LIU Xudong; YU Ping (Wudong Colliery, CHN ENERGY, Urumqi 830000, China; Chn Energy Network Information Technology Co., Ltd., Beijing 100011, China)
Source
《计算机科学》
CSCD
Peking University Core Journals (北大核心)
2023, Issue S02, pp. 395-401 (7 pages)
Computer Science