Abstract
Most existing deep learning-based methods for hand pose estimation use a standard three-dimensional convolutional neural network (3D CNN) to extract 3D features and estimate the 3D coordinates of hand joints. The features extracted in this way lack multi-scale information about the hand, which limits the accuracy of hand pose estimation. In addition, because 3D CNNs have a huge computational cost and memory requirement, these methods often struggle to meet real-time requirements. To overcome these weaknesses, the proposed method cascades a spatial filter and a depth filter to approximate 3D convolution, which reduces the number of network parameters. It also extracts hand pose features at multiple scales and integrates them, making full use of the 3D information of the hand. Experiments show that the method improves estimation accuracy, reduces model size, and runs at over 119 fps on a computer with a single GPU.
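The parameter saving from cascading a spatial filter and a depth filter can be illustrated with simple arithmetic. The sketch below is not the authors' network: the kernel size, the channel widths, and the helper names `conv3d_params` and `cascaded_params` are assumptions chosen for illustration, and the cascade is assumed to be a 1 x k x k spatial convolution followed by a k x 1 x 1 depth convolution.

```python
def conv3d_params(c_in, c_out, k):
    """Weight count of a standard k x k x k 3D convolution (bias ignored)."""
    return c_in * c_out * k ** 3


def cascaded_params(c_in, c_mid, c_out, k):
    """Weight count of a 1 x k x k spatial filter followed by a
    k x 1 x 1 depth filter (bias ignored). c_mid is the assumed number
    of channels between the two filters."""
    spatial = c_in * c_mid * k * k   # 1 x k x k kernel
    depth = c_mid * c_out * k        # k x 1 x 1 kernel
    return spatial + depth


# Hypothetical layer: 64 input channels, 64 output channels, kernel size 3.
full = conv3d_params(64, 64, 3)
cascaded = cascaded_params(64, 64, 64, 3)
ratio = cascaded / full

print(f"standard 3D conv: {full} weights")
print(f"cascaded filters: {cascaded} weights")
print(f"ratio:            {ratio:.3f}")
```

Under these assumed sizes the cascade needs k^2 + k weights per channel pair instead of k^3, so the saving grows with the kernel size. The actual reduction in the paper depends on its layer configuration, which the abstract does not give.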
Authors
张宏源
袁家政
刘宏哲
原春锋
王雪峤
邓智方
Zhang Hongyuan; Yuan Jiazheng; Liu Hongzhe; Yuan Chunfeng; Wang Xueqiao; Deng Zhifang (Beijing Key Laboratory of Information Service Engineering, Beijing Union University, Beijing 100101, China; Beijing Open University, Beijing 100081, China; National Laboratory of Pattern Recognition, Institute of Automation, Chinese Academy of Sciences, Beijing 100190, China)
Source
《计算机应用研究》
CSCD
Peking University Core Journal (北大核心)
2020, No. 4, pp. 1230-1233, 1243 (5 pages in total)
Application Research of Computers
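Funding
National Natural Science Foundation of China (61571045)
Beijing Advanced Innovation Center for Imaging Technology project (BAICIT-2016002)
General Project of the Science and Technology Program of the Beijing Municipal Education Commission (KM201811417002)
Beijing Union University Graduate Funding Project.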