Abstract
Two-stream networks cannot perform end-to-end recognition because optical flow maps must be computed in advance to extract motion information, and three-dimensional convolutional networks involve a very large number of parameters. To address these problems, an action recognition method based on video spatio-temporal features was proposed. The method extracts spatio-temporal information from videos efficiently without any optical flow computation or three-dimensional convolution. Firstly, a motion information extraction module based on the attention mechanism was used to capture the motion displacement information between adjacent frames, thereby simulating the role of optical flow maps in two-stream networks. Secondly, a decoupled spatio-temporal information extraction module was proposed to replace three-dimensional convolution and encode spatio-temporal information. Finally, the two modules were embedded into a two-dimensional residual network to perform end-to-end action recognition. Experiments were carried out on several mainstream action recognition datasets. The results show that, using only RGB video frames as input, the proposed method achieves recognition accuracies of 96.5%, 73.1% and 46.6% on the UCF101, HMDB51 and Something-Something-V1 datasets respectively, and improves the accuracy on UCF101 by 2.5 percentage points compared with the Temporal Segment Network (TSN) method that uses a two-stream structure. These results indicate that the proposed method extracts spatio-temporal features from videos efficiently.
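As a rough illustration of the first step, attention-based motion extraction can be pictured as turning feature differences between adjacent frames into an attention map that re-weights the features, standing in for an explicit optical-flow branch. The PyTorch sketch below is a hypothetical reconstruction under that assumption only; the abstract does not give the module's exact design, and the class name MotionAttention, the tensor layout and the reduction ratio are invented here for illustration.

```python
import torch
import torch.nn as nn

class MotionAttention(nn.Module):
    """Hypothetical sketch: adjacent-frame feature differences are pooled
    into a per-channel attention map that re-weights the input features,
    acting as a stand-in for an explicit optical-flow branch."""

    def __init__(self, channels: int, n_frames: int, reduction: int = 16):
        super().__init__()
        self.n_frames = n_frames
        # 1x1 convolutions to compress and restore the channel dimension.
        self.squeeze = nn.Conv2d(channels, channels // reduction, kernel_size=1)
        self.expand = nn.Conv2d(channels // reduction, channels, kernel_size=1)
        self.pool = nn.AdaptiveAvgPool2d(1)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch * n_frames, C, H, W), frames stacked along the batch axis,
        # the usual layout when a temporal module sits inside a 2D ResNet.
        nt, _, h, w = x.shape
        b = nt // self.n_frames
        feat = self.squeeze(x).view(b, self.n_frames, -1, h, w)
        # Motion signal: difference between each frame and its successor;
        # the last frame has no successor, so it is padded with zeros.
        diff = feat[:, 1:] - feat[:, :-1]
        diff = torch.cat([diff, torch.zeros_like(diff[:, :1])], dim=1)
        diff = diff.reshape(nt, -1, h, w)
        # Global pooling + channel expansion yield the motion attention map.
        attn = torch.sigmoid(self.expand(self.pool(diff)))
        return x * attn  # emphasize motion-sensitive channels
```

In a 2D residual backbone, a module of this kind would typically sit inside a residual block and operate on features whose frames are stacked along the batch dimension, so no optical flow needs to be precomputed.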
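Likewise, a decoupled spatio-temporal convolution is commonly realized as a 2D spatial convolution followed by a 1D temporal convolution over the frame axis, approximating a 3D convolution with far fewer parameters. The sketch below shows only this general (2+1)D-style factorization; it is not the paper's actual module, whose details the abstract does not specify, and the name DecoupledSTConv is hypothetical.

```python
import torch
import torch.nn as nn

class DecoupledSTConv(nn.Module):
    """Hypothetical sketch: a 3D convolution factorized into a 2D spatial
    convolution followed by a 1D temporal convolution over the frame axis."""

    def __init__(self, in_channels: int, out_channels: int, n_frames: int):
        super().__init__()
        self.n_frames = n_frames
        # Spatial 3x3 convolution, applied to every frame independently.
        self.spatial = nn.Conv2d(in_channels, out_channels, kernel_size=3, padding=1)
        # Temporal 1D convolution, applied at every spatial location.
        self.temporal = nn.Conv1d(out_channels, out_channels, kernel_size=3, padding=1)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch * n_frames, C_in, H, W), frames stacked along the batch axis.
        x = self.spatial(x)
        nt, c, h, w = x.shape
        b = nt // self.n_frames
        # Rearrange so the 1D convolution slides over the frame dimension.
        x = x.view(b, self.n_frames, c, h, w).permute(0, 3, 4, 2, 1)  # (b, h, w, c, t)
        x = x.reshape(b * h * w, c, self.n_frames)
        x = self.temporal(x)
        # Restore the (batch * n_frames, C_out, H, W) layout.
        x = x.view(b, h, w, c, self.n_frames).permute(0, 4, 3, 1, 2)
        return x.reshape(nt, c, h, w)
```

For equal input and output channel counts C, a full 3×3×3 kernel needs 27·C² weights per layer while this factorized pair needs 9·C² + 3·C², which is the kind of parameter saving the abstract's criticism of 3D convolution points to.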
Authors
NI Ranyan (倪苒岩), ZHANG Yi (张轶)
College of Computer Science, Sichuan University, Chengdu, Sichuan 610065, China
Source
Journal of Computer Applications (《计算机应用》)
Indexed in CSCD (Chinese Science Citation Database) and the Peking University Core Journals list (北大核心)
2023, No. 2, pp. 521-528 (8 pages)
Funding
National Natural Science Foundation of China (U20A20161).
Keywords
Convolutional Neural Network (CNN)
action recognition
spatio-temporal information
temporal reasoning
motion information