Abstract
Deep-learning-based action recognition depends on making full use of the spatio-temporal information of the actions in a video. To exploit the spatio-temporal features in video for higher recognition accuracy while retaining the relevant information at low cost, a spatio-temporal feature fusion framework for action recognition based on a sparse sampling scheme is proposed. Sparse sampling is used to obtain the RGB images and optical flow images of each video, which are fed separately into VGG-16 networks to extract the spatio-temporal features of the video. The spatial and temporal convolutional neural networks (CNNs) are then fused to extract mid-level fused spatio-temporal features, and these features are finally fed into a C3D CNN to recognize the action category. Experimental results on the HMDB51 and UCF101 datasets show that the framework makes full use of the temporal and spatial information of the video and achieves high action recognition accuracy.
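The abstract does not spell out the sparse sampling scheme. As a minimal sketch, assuming a segment-based scheme in which the video is split into equal-length segments and one frame is drawn from each, the frame selection could look like the following. The function name `sparse_sample_indices` and its parameters are hypothetical and are not taken from the paper.

```python
import random


def sparse_sample_indices(num_frames, num_segments, rng=None):
    """Return one frame index per segment (assumed segment-based sampling).

    The video's frames [0, num_frames) are divided into num_segments
    roughly equal segments, and one index is drawn uniformly from each.
    This is an illustrative assumption, not the paper's exact procedure.
    """
    if num_frames <= 0 or num_segments <= 0:
        raise ValueError("num_frames and num_segments must be positive")
    rng = rng or random.Random()
    indices = []
    for k in range(num_segments):
        start = k * num_frames // num_segments
        end = (k + 1) * num_frames // num_segments
        # If the video has fewer frames than segments, a segment can be
        # empty; fall back to a one-frame segment starting at `start`.
        end = max(end, start + 1)
        indices.append(rng.randrange(start, end))
    return indices


if __name__ == "__main__":
    # Sample 3 frames from a 90-frame clip; the RGB frame and the optical
    # flow image at each index would then go to the two network streams.
    print(sparse_sample_indices(90, 3, random.Random(0)))
```

Under this assumption only `num_segments` frames per video are processed, regardless of video length, which is one way the cost of preserving temporal information can be kept low.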
Authors
WANG Qian (王倩)
SUN Xiankun (孙宪坤)
FAN Dongyan (范冬艳)
School of Electronic and Electrical Engineering, Shanghai University of Engineering Science, Shanghai 201620, China
Source
Transducer and Microsystem Technologies (《传感器与微系统》)
CSCD
2020, No. 10, pp. 35-38 (4 pages)
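Funding
Young Scientists Fund of the National Natural Science Foundation of China (61802251, 61801286)
Scientific Research Program of the Science and Technology Commission of Shanghai Municipality (16DZ1206000)
Research Project of Shanghai University of Engineering Science (E3-0903-19-01053).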