Abstract
Most gesture recognition algorithms used in human-computer interaction cannot effectively remove the influence of the capture background on the gesture region to be extracted, and accurately modeling gesture motion is also difficult. To address these problems, a depthwise separable residual convolutional Long Short-Term Memory (LSTM) network is proposed to model and recognize the feature information of dynamic gestures. First, conventional 3D convolution is applied to the input video frames for preliminary feature extraction, with a relatively large kernel size to enlarge the receptive field. Then, separable convolutional residual operations re-extract features from these shallow features to model high-dimensional features. Finally, the features produced by the first two stages pass through 3D pooling and are fed into an LSTM network, which models the temporal information of the video data; an attention mechanism is introduced at the input. Experimental results on a large-scale isolated gesture dataset show that the accuracy of the proposed method is 21.02 percentage points higher than that of the original MFSK (Mixed Features around Sparse Keypoints) + BoVW (Bag of Visual Words) + SVM (Support Vector Machine) network.
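The abstract does not give layer sizes, so the following is only a minimal sketch of why a separable convolution is cheaper than a conventional 3D convolution. It assumes the usual depthwise-plus-pointwise factorization, which may differ from the authors' exact layer. The function names and the channel and kernel sizes are hypothetical and chosen for illustration.

```python
def conv3d_params(c_in, c_out, kernel, bias=True):
    """Parameter count of a standard 3D convolution with kernel (kt, kh, kw)."""
    kt, kh, kw = kernel
    return c_in * c_out * kt * kh * kw + (c_out if bias else 0)


def separable_conv3d_params(c_in, c_out, kernel, bias=True):
    """Parameter count of a depthwise 3D convolution plus a 1x1x1 pointwise one."""
    kt, kh, kw = kernel
    # Depthwise step: one (kt, kh, kw) filter per input channel.
    depthwise = c_in * kt * kh * kw + (c_in if bias else 0)
    # Pointwise step: a 1x1x1 convolution that mixes channels.
    pointwise = c_in * c_out + (c_out if bias else 0)
    return depthwise + pointwise


# Hypothetical layer sizes, not taken from the paper.
C_IN, C_OUT, KERNEL = 64, 128, (3, 3, 3)

standard = conv3d_params(C_IN, C_OUT, KERNEL)
separable = separable_conv3d_params(C_IN, C_OUT, KERNEL)

print("standard 3D conv parameters: ", standard)
print("separable 3D conv parameters:", separable)
print("reduction factor:            ", standard / separable)
```

Under these assumed sizes the separable layer needs far fewer parameters than the standard one, which is the usual motivation for stacking such layers inside residual blocks.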
Authors
顾明
李轶群
张二超
张训雷
齐林
帖云
GU Ming; LI Yiqun; ZHANG Erchao; ZHANG Xunlei; QI Lin; TIE Yun (Henan Communications Investment Group Company Limited, Zhengzhou, Henan 450016, China; Zhengzhou Branch, Zhongxun Post & Telecommunication Consulting & Design Institute Company Limited, Zhengzhou, Henan 450000, China; School of Information Engineering, Zhengzhou University, Zhengzhou, Henan 450001, China)
Source
《计算机应用》
CSCD
Peking University Core Journal
2022, Issue S01, pp. 59-63 (5 pages)
Journal of Computer Applications
Keywords
deep residual network
separable convolution
Long Short-Term Memory (LSTM) network
dynamic gesture recognition
attention mechanism