Abstract
In recent years, deep convolutional neural networks (CNNs) have made considerable progress in static image recognition, but they remain weak at modeling motion information in action videos. Yet motion is exactly what distinguishes action recognition from static image recognition, and most CNN-based work so far cannot properly handle the spatiotemporal interactions in video. We propose a spatiotemporal CNN that explicitly exploits this cue by multiplying filter responses instead of summing them. The network first divides the convolutional kernels applied to adjacent frames into two groups that approximately form pairs of Fourier basis functions (sinusoids). A multiplicative layer then multiplies the responses produced by different frames pairwise, much like computing a covariance, and feeds the products into a sum layer. This maps adjacent frames onto the invariant subspaces of the transformation matrix, that is, the subspaces spanned by the eigenvectors associated with its eigenvalues, where the motion between the frames (or, more generally, a particular geometric transformation) can be detected from the angle by which one frame is rotated relative to the other. Theoretical analysis shows that the network is sensitive to both motion and content. Experiments show that the network classifies action videos more accurately, and a comparison with six other recently proposed algorithms demonstrates the advantage of the proposed method.
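The central idea, that products of paired sinusoidal filter responses turn motion into a rotation angle, can be illustrated on a toy case. The sketch below is not the paper's network: it assumes 1-D frames, a cyclic shift as the only motion, and fixed (not learned) cosine/sine filters at a single frequency index k, and the names quadrature_pair, product_sum_features, rotation_angle and wrap are hypothetical. A cyclic shift by d is an orthogonal transformation whose eigenvectors are the Fourier basis, so in the 2-D subspace spanned by each cosine/sine pair it acts as a rotation by 2*pi*k*d/n. Summing two products of responses gives quantities proportional to |X_k|^2 cos(theta) and |X_k|^2 sin(theta), where X_k is the k-th DFT coefficient of the first frame: the angle carries the motion, while the magnitude |X_k|^2 carries content.

```python
import numpy as np

def quadrature_pair(n, k):
    """Cosine/sine filters of frequency index k: one Fourier basis pair."""
    w = 2 * np.pi * k * np.arange(n) / n
    return np.cos(w), np.sin(w)

def product_sum_features(frame_a, frame_b, k):
    """Toy multiplicative layer followed by a sum layer for one filter pair.

    Returns (cos_term, sin_term), equal to |X_k|^2 * cos(theta) and
    |X_k|^2 * sin(theta), where theta is the rotation of frame_b relative
    to frame_a in the 2-D subspace spanned by the filter pair.
    """
    cos_f, sin_f = quadrature_pair(len(frame_a), k)
    ca, sa = cos_f @ frame_a, sin_f @ frame_a  # filter responses to frame a
    cb, sb = cos_f @ frame_b, sin_f @ frame_b  # filter responses to frame b
    # Multiplicative layer: pairwise products of responses across frames.
    # Sum layer: add the products to form cosine- and sine-like terms.
    cos_term = ca * cb + sa * sb
    sin_term = ca * sb - sa * cb
    return cos_term, sin_term

def rotation_angle(frame_a, frame_b, k):
    """Motion cue: the rotation angle, independent of the signal's amplitude."""
    cos_term, sin_term = product_sum_features(frame_a, frame_b, k)
    return np.arctan2(sin_term, cos_term)

def wrap(angle):
    """Map an angle into the interval (-pi, pi]."""
    return np.angle(np.exp(1j * angle))

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    n, shift = 32, 3
    x = rng.standard_normal(n)  # "frame t": a random 1-D signal
    y = np.roll(x, shift)       # "frame t+1": x cyclically shifted by 3 samples
    for k in range(1, 5):
        est = rotation_angle(x, y, k)
        expected = wrap(2 * np.pi * k * shift / n)
        print(k, round(float(est), 4), round(float(expected), 4))
```

Running it prints, for each frequency index, the estimated and the expected angle, which agree up to floating-point error. Replacing or rescaling x leaves the angle unchanged but changes cos_term and sin_term, which is a small-scale version of the "sensitive to both motion and content" property claimed above.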
Source
Computer Science (《计算机科学》)
2015, No. 7, pp. 245-249 (5 pages)
Indexed in: CSCD; Peking University Core Journals (北大核心)
Keywords
Spatiotemporal
Convolutional neural networks
Deep learning
Motion feature
Action recognition