
Human Action Recognition via Spatio-temporal Dual Network Flow and Visual Attention Fusion

Cited by: 12
Abstract: Inspired by the mechanism of human brain visual perception, an action recognition approach integrating a dual spatio-temporal network flow with visual attention is proposed within a deep learning framework. First, optical-flow features of human body motion are extracted frame by frame from the video using coarse-to-fine Lucas-Kanade flow estimation. Then, a GoogLeNet neural network fine-tuned from a pre-trained model convolves, layer by layer, and aggregates the appearance images and the corresponding optical-flow features within a given time window. Next, multi-layer Long Short-Term Memory (LSTM) recurrent networks cross-perceive these features to obtain spatio-temporal stream semantic feature sequences with high-level salient structure; the inter-dependent hidden states within the time window are decoded, yielding the spatial-stream visual feature descriptor and a label probability distribution for each frame in the window. The attention confidence of each frame along the temporal dimension is then computed with a relative-entropy measure and fused with the label probability distribution perceived by the spatial network stream. Finally, a softmax classifier identifies the action category in the video. Experimental results show that the proposed method has a significant advantage in classification accuracy over other existing methods.
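The attention-fusion step summarized above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' implementation: per-frame attention confidence is taken as the relative entropy (KL divergence) between each frame's label distribution and the uniform distribution, the confidences are softmax-normalized into attention weights, and the attention-weighted temporal-stream distribution is averaged with the spatial-stream distribution (the equal 0.5/0.5 stream weighting and the uniform reference distribution are assumptions for illustration).

```python
import math

def softmax(xs):
    # Numerically stable softmax over a list of scores.
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    s = sum(exps)
    return [e / s for e in exps]

def kl_divergence(p, q, eps=1e-12):
    # Relative entropy D(p || q); eps guards against log(0).
    return sum(pi * math.log((pi + eps) / (qi + eps)) for pi, qi in zip(p, q))

def attention_fuse(temporal_frame_probs, spatial_probs):
    """Fuse per-frame temporal-stream label distributions with the
    spatial-stream distribution via relative-entropy attention."""
    n_classes = len(spatial_probs)
    uniform = [1.0 / n_classes] * n_classes
    # Frames whose label distribution is far from uniform are more decisive,
    # so they receive higher attention confidence.
    confidences = [kl_divergence(p, uniform) for p in temporal_frame_probs]
    weights = softmax(confidences)
    # Attention-weighted temporal-stream class distribution.
    temporal = [sum(w * p[c] for w, p in zip(weights, temporal_frame_probs))
                for c in range(n_classes)]
    # Equal-weight fusion with the spatial stream (an assumption here).
    return [0.5 * t + 0.5 * s for t, s in zip(temporal, spatial_probs)]

# Toy usage: three frames, three action classes.
frames = [[0.8, 0.1, 0.1], [0.4, 0.3, 0.3], [0.7, 0.2, 0.1]]
spatial = [0.6, 0.3, 0.1]
fused = attention_fuse(frames, spatial)
predicted_class = max(range(len(fused)), key=fused.__getitem__)
```

Because the attention weights sum to 1 and every input is a probability distribution, the fused output is itself a valid distribution, and the final class is simply its argmax.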
Authors: LIU Tianliang(1), QIAO Qingwei(1), WAN Junwei(1), DAI Xiubin(1), LUO Jiebo(2) (1. Jiangsu Provincial Key Laboratory of Image Processing and Image Communication, Nanjing University of Posts and Telecommunications, Nanjing 210003, China; 2. Department of Computer Science, University of Rochester, Rochester, NY 14627, USA)
Source: Journal of Electronics &amp; Information Technology (电子与信息学报), 2018, No. 10, pp. 2395-2401 (7 pages). Indexed in EI and CSCD; Peking University core journal.
Funding: National Natural Science Foundation of China (61001152, 31200747, 61071091, 61071166, 61172118); Natural Science Foundation of Jiangsu Province (BK2012437); Nanjing University of Posts and Telecommunications Research Fund (NY214037); China Scholarship Council.
Keywords: Human action recognition; Optical flow; Spatio-temporal dual network flow; Visual attention; Convolutional Neural Network (CNN); Long Short-Term Memory (LSTM)

相关作者

内容加载中请稍等...

相关机构

内容加载中请稍等...

相关主题

内容加载中请稍等...

浏览历史

内容加载中请稍等...
;
使用帮助 返回顶部