Human Action Recognition Combined With Object Detection (结合目标检测的人体行为识别)
Cited by: 18
Abstract: Most methods for human action recognition extract features from raw video frames, which inevitably brings in redundant background information and adds considerable noise to the neural network. To address background interference, the large amount of redundant information in video frames, class imbalance among samples, and the difficulty of classifying certain individual classes, this paper proposes a new human action recognition algorithm combined with object detection. First, an object detection mechanism is added to the recognition pipeline so that the network focuses on learning human motion information. Second, the video is divided into segments and randomly sampled within them, establishing long-term temporal modeling that spans the entire video. Finally, action recognition is performed using an improved neural network loss function. Extensive experiments on the widely used human action recognition datasets UCF101 and HMDB51 show accuracies (RGB images only) of 96.0% and 75.3%, respectively, clearly higher than current mainstream human action recognition algorithms.
Authors: ZHOU Bo (周波); LI Jun-Feng (李俊峰) (Institute of Automation, Faculty of Mechanical Engineering and Automation, Zhejiang Sci-Tech University, Hangzhou 310018)
Source: Acta Automatica Sinica (《自动化学报》), indexed in EI, CSCD, and the Peking University Core Journals list; 2020, No. 9, pp. 1961-1970 (10 pages)
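Funding: Supported by the National Natural Science Foundation of China (61374022), the Zhejiang Provincial Basic Public Welfare Research Program (LGG18F030001), and the Key Project of the Jinhua Science and Technology Research Program (2018-1-027).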
Keywords: deep learning; action recognition; convolutional neural network (CNN); computer vision; object detection
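
The abstract's second step, segmented random sampling across the whole video, can be illustrated with a minimal sketch. The abstract does not give the number of segments or the exact sampling rule, so the function name `sample_segment_indices`, its parameters, and the choice of equal-length segments with one uniformly drawn frame each are assumptions made for illustration, not the authors' implementation.

```python
import random


def sample_segment_indices(num_frames, num_segments, rng=None):
    """Return one randomly chosen frame index per segment.

    Hypothetical helper: the video is split into `num_segments`
    roughly equal, contiguous spans, and one frame index is drawn
    uniformly at random from each span, so the sampled frames
    cover the full duration of the video.
    """
    if num_frames <= 0 or num_segments <= 0:
        raise ValueError("num_frames and num_segments must be positive")
    rng = rng or random.Random()

    indices = []
    for k in range(num_segments):
        # Integer boundaries of the k-th segment: [start, end).
        start = k * num_frames // num_segments
        end = (k + 1) * num_frames // num_segments
        # A segment can be empty when the video has fewer frames
        # than segments; fall back to a single-frame span.
        if end <= start:
            end = start + 1
        indices.append(rng.randrange(start, end))
    return indices


if __name__ == "__main__":
    # Example: a 90-frame clip split into 3 segments of 30 frames each.
    print(sample_segment_indices(90, 3, random.Random(0)))
```

Because each index comes from a different segment, a handful of frames still spans the beginning, middle, and end of the clip, which is the property the abstract attributes to long-term temporal modeling.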
