Abstract
Both 3D convolutional neural networks and two-stream convolutional neural networks have deficiencies in video-based human activity recognition. To address them, a composite deep neural network that combines the two-stream architecture with the 3D convolutional architecture is proposed. An improved R(2+1)D (residual (2+1)D) convolutional neural network is used in both the spatial-stream and the temporal-stream sub-networks of the two-stream architecture, which learn behavior representations and classifiers from the RGB image sequences and the optical flow image sequences of the video, respectively. The classification results of the two sub-networks are then fused. Furthermore, for network training, a stochastic gradient descent algorithm with momentum, improved by the gradient centralization algorithm, is proposed to improve the generalization performance of the network without changing its structure. Experiments show that the proposed network achieves high recognition accuracy on both the UCF101 and HMDB51 datasets.
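The abstract does not spell out how gradient centralization is combined with momentum SGD, so the following is only a minimal NumPy sketch of the generic technique, not the authors' implementation. It assumes the commonly described form of gradient centralization (subtracting, for each output channel, the mean of the gradient over all remaining axes) followed by a standard momentum update. The function names, the learning rate, the momentum value and the tensor shapes are all hypothetical.

```python
import numpy as np


def centralize_gradient(grad):
    """Gradient centralization (assumed generic form).

    For tensors with more than one axis (e.g. convolution kernels laid out
    as [out_channels, ...]), subtract the mean taken over every axis except
    the first, so each output-channel slice has zero mean.
    One-dimensional tensors such as biases are returned unchanged.
    """
    if grad.ndim > 1:
        axes = tuple(range(1, grad.ndim))
        return grad - grad.mean(axis=axes, keepdims=True)
    return grad


def sgd_momentum_gc_step(weight, grad, velocity, lr=0.01, momentum=0.9):
    """One momentum-SGD step applied to the centralized gradient.

    lr and momentum are illustrative values, not taken from the paper.
    """
    g = centralize_gradient(grad)
    velocity = momentum * velocity + g
    weight = weight - lr * velocity
    return weight, velocity


# Hypothetical 3D-convolution kernel: [out_ch, in_ch, t, h, w].
rng = np.random.default_rng(0)
w = rng.standard_normal((4, 3, 3, 3, 3))
v = np.zeros_like(w)
g = rng.standard_normal(w.shape)

w_new, v_new = sgd_momentum_gc_step(w, g, v)
```

Because the centralization acts only on the gradient inside the optimizer step, the network definition itself is untouched, which matches the abstract's statement that the network structure is not changed.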
Authors
HUANG Min (黄敏)
SHANG Ruixing (尚瑞欣)
QIAN Huimin (钱惠敏)
College of Energy and Electrical Engineering, Hohai University, Nanjing 211100
Source
Pattern Recognition and Artificial Intelligence (《模式识别与人工智能》)
Indexed in: EI, CSCD, Peking University Core Journals
2022, Issue 6, pp. 562-570 (9 pages)
Keywords
Human Activity Recognition
Two-Stream Convolutional Network
3D Convolutional Neural Network
Gradient Centralization