Abstract
Existing human action recognition methods require fixed-length video segments as input and do not make full use of spatio-temporal information. To address these problems, this paper proposes a deep neural network model that combines a spatio-temporal pyramid with an attention mechanism. The model couples a 3D-CNN containing a spatio-temporal pyramid with an LSTM augmented by a spatio-temporal attention mechanism, which enables multi-scale processing of video segments and fuller use of the complex spatio-temporal information in actions. RGB images and optical flow fields serve as the inputs to the spatial and temporal domains, respectively, and the feature obtained by fusing the motion and appearance features from the pyramid pooling layers serves as the input to the fusion domain. A decision fusion strategy then produces the final recognition result. In experiments on the UCF101 and HMDB51 datasets, the model achieves recognition accuracies of 94.2% and 70.5%, respectively, indicating that the improved network achieves high accuracy on video-based human action recognition.
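The abstract states that the pyramid removes the need for fixed-length input but does not give the layer's details. The following is a minimal NumPy sketch of the general idea only, not the authors' implementation: the function name `st_pyramid_pool`, the pyramid levels (1x1x1, 2x2x2, 4x4x4) and the use of max pooling are all assumptions made for illustration. It shows how feature maps with different temporal and spatial sizes can be reduced to a vector of the same length.

```python
import numpy as np

def _bins(size, n):
    """Split range(size) into n (start, end) bins that cover it and are never empty."""
    # floor for the start, ceil for the end, so neighbouring bins may overlap
    return [((i * size) // n, -(-((i + 1) * size) // n)) for i in range(n)]

def st_pyramid_pool(features, levels=((1, 1, 1), (2, 2, 2), (4, 4, 4))):
    """Max-pool a (C, T, H, W) feature map over a pyramid of T x H x W grids.

    Hypothetical helper: levels and max pooling are illustrative assumptions.
    The output length is C * sum(t * h * w over levels), independent of T, H, W.
    """
    pooled = []
    _, t, h, w = features.shape
    for bt, bh, bw in levels:
        for t0, t1 in _bins(t, bt):
            for h0, h1 in _bins(h, bh):
                for w0, w1 in _bins(w, bw):
                    cell = features[:, t0:t1, h0:h1, w0:w1]
                    pooled.append(cell.max(axis=(1, 2, 3)))  # one value per channel
    return np.concatenate(pooled)

# Two clips of different length and resolution yield vectors of equal size.
short_clip = np.random.rand(4, 7, 5, 6)    # C=4, T=7,  H=5, W=6
long_clip = np.random.rand(4, 16, 9, 9)    # C=4, T=16, H=9, W=9
print(st_pyramid_pool(short_clip).shape, st_pyramid_pool(long_clip).shape)
```

Because the number of bins is fixed while the bin sizes adapt to the input, a downstream fully connected layer or LSTM can consume the result regardless of clip length.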
Authors
何冰倩
魏维
张斌
高联欣
宋岩贝
He Bingqian; Wei Wei; Zhang Bin; Gao Lianxin; Song Yanbei (School of Computer Science, Chengdu University of Information Technology, Chengdu 610225, China; School of Software Engineering, Chengdu University of Information Technology, Chengdu 610225, China)
Source
《计算机应用研究》
CSCD
Peking University Core Journals
2019, Issue 10, pp. 3107-3111 (5 pages)
Application Research of Computers
Funding
Key Scientific Research Project of the Sichuan Provincial Department of Education (17ZA0064)
Keywords
action recognition
deep learning
spatio-temporal pyramid
attention mechanism
convolutional neural network