
Multi-order Information Fusion Method for Human Action Recognition

Cited by: 9
Abstract: The classical two-stream convolutional neural network (CNN) captures first-order statistics of the local spatial and temporal features of an input video, and at test time makes its final prediction by averaging the softmax scores of multiple local video features. However, first-order statistics cannot fully characterize the distribution of the spatial and temporal features, and the higher-order statistics among multiple local video features are not used at the test stage. To address these two problems, this paper proposes a multi-order information fusion method for human action recognition based on second-order aggregation. First, a second-order two-stream model is built to capture second-order statistics of the local video features, which together with the original first-order statistics form the multi-order information. Second, second-order aggregation is applied separately to the local video features derived from each type of multi-order information, producing high-order global video representations. Finally, two strategies are proposed to fuse these representations for prediction. Experiments show that the method improves recognition accuracy over the original two-stream CNN by 8% on HMDB51 and 2.1% on UCF101. Performance improves further when traditional improved dense trajectory (IDT) features are combined.
Authors: ZHANG Bing-Bing; GE Shu-Yu; WANG Qi-Long; LI Pei-Hua (School of Information and Communication Engineering, Dalian University of Technology, Dalian 116033; College of Intelligence and Computing, Tianjin University, Tianjin 300350)
Source: Acta Automatica Sinica (《自动化学报》), indexed in EI, CAS, CSCD, and PKU Core; 2021, No. 3, pp. 609-619 (11 pages)
Funding: Supported by the National Natural Science Foundation of China (61971086, 61806140, 61471082).
Keywords: human action recognition; two-stream convolutional neural network; multi-order information fusion; second-order aggregation
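
The abstract contrasts first-order statistics (averages) with second-order statistics of local video features. The record does not give the paper's exact formulation, so the following is only a minimal NumPy sketch of the general idea, assuming that "second-order aggregation" can be illustrated by covariance pooling. The function names, the random stand-in features, and the final concatenation step are hypothetical and are not the authors' two fusion strategies.

```python
import numpy as np


def first_order_pool(local_features):
    """First-order statistic: mean over n local features, (n, d) -> (d,)."""
    return local_features.mean(axis=0)


def second_order_pool(local_features):
    """Second-order statistic: covariance of n local features, (n, d) -> (d, d)."""
    centered = local_features - local_features.mean(axis=0, keepdims=True)
    return centered.T @ centered / local_features.shape[0]


# Hypothetical stand-in for 25 local video features of dimension 8.
rng = np.random.default_rng(0)
features = rng.normal(size=(25, 8))

mu = first_order_pool(features)
sigma = second_order_pool(features)

# Illustrative multi-order descriptor (an assumption, not the paper's fusion):
# the mean concatenated with the upper triangle of the symmetric covariance.
upper = np.triu_indices(sigma.shape[0])
video_repr = np.concatenate([mu, sigma[upper]])
print(video_repr.shape)  # (44,) = 8 mean entries + 36 upper-triangle entries
```

The mean alone discards how feature dimensions vary together across local features; the covariance matrix retains that information, which is the motivation the abstract gives for going beyond first-order statistics.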
