Abstract
Noise in video makes it difficult to obtain good feature information, which leads to inaccurate behavior recognition. To address this problem, a human behavior recognition network based on spatio-temporal convolutional neural networks is proposed. Long videos are divided into segments, and the RGB frames and the computed optical flow maps are fed into two separate convolutional neural networks (CNNs). A weighted-sum fusion algorithm then combines the extracted temporal-domain and spatial-domain features into spatio-temporal features. The resulting mid-level semantic information is passed to R(2+1)D convolutions, with ResNet used to improve network performance, and behavior recognition is finally performed at the softmax layer. Experiments on the UCF-101 and HMDB-51 datasets yield accuracies of 92.1% and 66.1%, respectively. The results indicate that the proposed model, which combines two-stream fusion with a spatio-temporal convolutional network, helps improve the accuracy of video behavior recognition.
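The weighted-sum fusion and the final softmax step named in the abstract can be illustrated with a minimal sketch. The abstract does not give the fusion weights, feature sizes, or network details, so the function names, the weights 0.4 and 0.6, and the three-element feature vectors below are all hypothetical. The sketch also skips the R(2+1)D and ResNet stages that sit between fusion and softmax in the described network.

```python
import math


def fuse_weighted(spatial, temporal, w_spatial=0.5, w_temporal=0.5):
    """Element-wise weighted sum of two equally sized feature vectors.

    The weights are illustrative assumptions, not values from the paper.
    """
    if len(spatial) != len(temporal):
        raise ValueError("feature vectors must have the same length")
    return [w_spatial * s + w_temporal * t for s, t in zip(spatial, temporal)]


def softmax(scores):
    """Numerically stable softmax over a list of class scores."""
    peak = max(scores)
    exps = [math.exp(x - peak) for x in scores]
    total = sum(exps)
    return [e / total for e in exps]


# Hypothetical per-class outputs of the two streams for one video segment.
rgb_features = [2.0, 0.5, -1.0]   # spatial stream (RGB frames)
flow_features = [1.0, 1.5, 0.0]   # temporal stream (optical flow)

# Fuse the two streams, then turn the fused scores into class probabilities.
fused = fuse_weighted(rgb_features, flow_features, w_spatial=0.4, w_temporal=0.6)
probs = softmax(fused)
predicted = max(range(len(probs)), key=probs.__getitem__)
```

In a real model the fused tensor would keep its spatial and temporal dimensions so that later convolutions can operate on it. The flat lists here only show the arithmetic of the weighted sum.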
Authors
QIN Yue (秦悦)
SHI Yue-xiang (石跃祥)
College of Computer and Cyberspace Security, Xiangtan University, Xiangtan, Hunan 411105, China
Source
Computing Technology and Automation (《计算技术与自动化》)
2021, No. 2, pp. 140-147 (8 pages)
Funding
National Natural Science Foundation of China (61602397, 61502407).
Keywords
deep learning
spatio-temporal convolutional network
two-stream convolutional networks
R(2+1)D