Abstract
As computer vision technology continues to evolve, intelligent applications and devices built around it play an increasingly important role in daily life and work. Among these, visual Simultaneous Localization and Mapping (SLAM) is widely used in robotics, drones, and autonomous driving, where it supplies the accurate localization needed for precise mapping and autonomous navigation. However, visual SLAM algorithms are computationally intensive and have strong data dependencies, so conventional hardware platforms (CPU or GPU) struggle to meet the real-time and low-power requirements of these edge applications, which has become a key obstacle to wider adoption of visual SLAM. To address this problem, this paper proposes an energy-efficient domain-specific accelerator for visual SLAM that targets ORB feature extraction and matching, designed through algorithm-hardware co-design. Several hardware design techniques are used to improve computational performance and energy efficiency, including multi-level parallel computing based on decoupling data dependencies, data storage based on multi-size buckets, and a pixel-level symmetric, lightweight strategy for descriptor generation and orientation computation. The proposed accelerator was tested and verified on a Xilinx ZCU104. Its accuracy stays within 5% of the algorithmic accuracy of ORB-SLAM2, and the frame rate rises to 108 fps. Compared with other contemporary hardware accelerators, lookup table (LUT) usage is reduced by 32.7% and flip-flop (FF) usage by 41.17%, while the frame rate is increased by 1.4x and 0.74x.
Authors
齐修远
刘野
郝爽
周军
QI Xiuyuan; LIU Ye; HAO Shuang; ZHOU Jun (School of Information and Communication Engineering, University of Electronic Science and Technology of China, Chengdu 611731, China)
Source
《集成电路与嵌入式系统》
2024, No. 11, pp. 51-59 (9 pages)
Integrated Circuits and Embedded Systems