
Multi-scale 3D Convolution Fusion Two-Stream Networks for Action Recognition (cited by: 10)
Abstract: Video-based action recognition is widely used in computer vision. Existing network models cannot effectively combine the spatio-temporal information in video data, and they do not consider fusing information across data at different scales. To address these problems, this paper proposes a multi-scale input 3D convolution fusion two-stream model that combines a two-stream network with a 3D convolutional neural network. First, a 2D residual network and a multi-scale input 3D convolution fusion network are used to obtain the spatial and temporal information in the video. Next, the outputs of the two networks are combined by decision-level summation, which improves the model's ability to extract spatio-temporal features from video. Finally, the multi-scale input 3D convolution fusion network fuses data of different scales using different strategies, which improves the network's generalization to data at different scales. Experiments show that the model achieves recognition accuracies of 90.5% on UCF-101 and 66.3% on HMDB-51, higher than those of the compared methods, demonstrating the superiority and robustness of the proposed approach.
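The abstract states only that the outputs of the two streams are summed at the decision level. A minimal sketch of such score fusion follows, assuming that each stream emits per-class logits, that these are converted to softmax probabilities, and that the probabilities are added with equal weights. The function names and the `w_spatial`/`w_temporal` parameters are hypothetical illustrations, not taken from the paper.

```python
import math
from typing import List, Sequence, Tuple


def softmax(logits: Sequence[float]) -> List[float]:
    # Numerically stable softmax: subtract the largest logit before exponentiating.
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def late_fusion(
    spatial_logits: Sequence[float],
    temporal_logits: Sequence[float],
    w_spatial: float = 1.0,  # hypothetical weight; equal weights = plain summation
    w_temporal: float = 1.0,
) -> Tuple[List[float], int]:
    # Decision-level fusion (assumed form): turn each stream's logits into class
    # probabilities, add them, and predict the class with the largest fused score.
    p_s = softmax(spatial_logits)
    p_t = softmax(temporal_logits)
    fused = [w_spatial * a + w_temporal * b for a, b in zip(p_s, p_t)]
    prediction = max(range(len(fused)), key=fused.__getitem__)
    return fused, prediction


# Usage: the spatial stream mildly prefers class 0, the temporal stream strongly
# prefers class 1; after summation the fused prediction is class 1.
fused_scores, predicted_class = late_fusion([2.0, 1.0, 0.1], [0.5, 2.5, 0.3])
print(predicted_class)
```

Summing probabilities rather than raw logits keeps the two streams on a comparable scale; unequal weights are a common variant but are an assumption here, since the abstract does not mention them.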
Authors: Song Lifei, Weng Liguo, Wang Lingfeng, Xia Min (Institute of Information and Control, Nanjing University of Information Science and Technology, Nanjing 210044; National Laboratory of Pattern Recognition, Institute of Automation, Chinese Academy of Sciences, Beijing 100190)
Source: Journal of Computer-Aided Design & Computer Graphics (indexed in EI and CSCD; Peking University core journal), 2018, No. 11, pp. 2074-2083 (10 pages)
Funding: National Natural Science Foundation of China (61503192, 61773377); Natural Science Foundation of Jiangsu Province (BK20161533)
Keywords: action recognition; 3D convolution; deep learning; multi-scale input; information fusion