Abstract
To address the excessive size of RGB-D feature extraction networks used for pose estimation, this paper proposes a lightweight RGB-D feature extraction network that improves on XYZNet. First, a lightweight sub-network, BaseNet, is designed to replace ResNet18 in XYZNet, which significantly reduces the network size while delivering stronger performance. Second, a re-parameterized multi-scale convolutional attention (Rep-MSCA) sub-module based on depthwise separable convolution is designed, which strengthens BaseNet's ability to extract contextual information at different scales while limiting the model's parameter count. Finally, to improve the geometric feature extraction ability of PointNet in XYZNet at a small parameter cost, a re-parameterized residual multi-layer perceptron (Rep-ResP) module is designed. The improved network reduces floating-point operations (FLOPs) and parameters by 60.8% and 64.8%, respectively, speeds up inference by 21.2%, and improves accuracy on the mainstream datasets LineMOD and YCB-Video by 0.5% and 0.6%, respectively. The improved network is therefore better suited to deployment in scenarios where hardware resources are tight.
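The abstract attributes part of the parameter savings to depthwise separable convolution. As a rough illustration of why that substitution shrinks a layer, the sketch below compares weight counts for a standard convolution and its depthwise separable counterpart. The function names and the layer sizes (64 channels, 7 x 7 kernel) are assumptions chosen for illustration only; they are not taken from the paper and say nothing about the actual Rep-MSCA design.

```python
def conv_params(c_in, c_out, k):
    """Weights in a standard k x k convolution (bias omitted)."""
    return c_in * c_out * k * k


def depthwise_separable_params(c_in, c_out, k):
    """Weights in a k x k depthwise conv followed by a 1 x 1 pointwise conv."""
    depthwise = c_in * k * k   # one k x k filter per input channel
    pointwise = c_in * c_out   # 1 x 1 conv that mixes channels
    return depthwise + pointwise


# Hypothetical layer sizes, for illustration only (not from the paper).
c_in, c_out, k = 64, 64, 7
standard = conv_params(c_in, c_out, k)
separable = depthwise_separable_params(c_in, c_out, k)
ratio = separable / standard   # equals 1/c_out + 1/k**2

print(f"standard: {standard}, separable: {separable}, ratio: {ratio:.4f}")
```

Under these assumed sizes the separable layer needs only a few percent of the weights of the standard one, and the ratio 1/c_out + 1/k^2 shows that the saving grows with kernel size, which is one reason large multi-scale kernels are usually built from depthwise convolutions.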
Authors
于建均
刘耕源
于乃功
龚道雄
冯新悦
Yu Jianjun; Liu Gengyuan; Yu Naigong; Gong Daoxiong; Feng Xinyue (Faculty of Information Technology, Beijing University of Technology, Beijing 100124, China)
Source
《计算机应用研究》
CSCD
Peking University Core Journals (北大核心)
2024, Issue 2, pp. 616-622 (7 pages)
Application Research of Computers
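Funding
National Natural Science Foundation of China (62076014)
Key Project of the Science and Technology Plan of the Beijing Municipal Education Commission (KZ202010005004).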