Two-Stream Adaptive Attention Graph Convolutional Networks for Action Recognition (cited by: 4)
Abstract Human action recognition has received much attention in computer vision because of its important role in public safety. However, when fusing the neighborhood features of multi-scale nodes, existing graph convolutional networks usually sum the adjacency matrices of each order directly, giving every term the same importance; this makes it difficult to focus on important features and hinders the establishment of optimal node relationships. In addition, the common two-stream fusion method, which averages the prediction results of different models, ignores potential differences in data distribution, so the fusion is suboptimal. To this end, this paper proposes a two-stream adaptive attention graph convolutional network for human action recognition. First, a multi-order adjacency matrix with adaptively balanced weights is designed so that the model focuses on the more important neighborhoods. Second, a multi-scale spatio-temporal self-attention module and a channel attention module are designed to enhance the model's feature extraction capability. Finally, a two-stream fusion network is proposed that uses the data distribution of the two streams' prediction results to determine the fusion coefficients, improving the fusion effect. On the cross-subject and cross-view subsets of NTU RGB+D, the algorithm achieves recognition accuracies of 92.3% and 97.5%, respectively, and on the Kinetics-Skeleton dataset it reaches 39.8%; all three are higher than those of existing algorithms, indicating the superiority of the proposed algorithm for human action recognition.
Authors DU Qiliang; XIANG Zhaoyi; TIAN Lianfang; YU Lubin (School of Automation Science and Engineering, South China University of Technology, Guangzhou 510640, Guangdong, China; China-Singapore International Joint Research Institute, South China University of Technology, Guangzhou 510555, Guangdong, China; Key Laboratory of Autonomous Systems and Network Control of the Ministry of Education, South China University of Technology, Guangzhou 510640, Guangdong, China; Research Institute of Modern Industrial Innovation, South China University of Technology, Zhuhai 519170, Guangdong, China)
Source Journal of South China University of Technology (Natural Science Edition), 2022, Issue 12, pp. 20-29 (10 pages). Indexed in EI, CAS, CSCD, and the Peking University Core Journals list.
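Funding Guangdong Province Marine Economic Development Special Project (GDNRC[2020]018); Key-Area Research and Development Program of Guangdong Province (2019B020214001, 2018B010109001); Guangzhou Major Industrial Technology Research Program (2019-01-01-12-1006-0001); Fundamental Research Funds for the Central Universities, South China University of Technology (2018KZ05); Graduate Education Reform Project of South China University of Technology (zysk2018005).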
Keywords action recognition; graph neural network; adjacency matrix; attention; two-stream fusion
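
The abstract contrasts two baselines (directly summing multi-order adjacency matrices, and averaging two-stream predictions) with adaptive alternatives. The paper's actual formulation is not given on this page, so the following is only a minimal sketch under stated assumptions: "multi-order" is taken to mean matrix powers of the skeleton adjacency, the adaptive weights are a softmax over scalars that would be learned (here hand-set), and the fusion coefficients come from an entropy-based confidence heuristic used as a stand-in for the learned fusion network. All function names are hypothetical.

```python
import math

def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def matmul(a, b):
    """Plain matrix product for small list-of-lists matrices."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]

def adjacency_powers(adj, max_order):
    """Return [A^1, ..., A^K]; assumption: 'order k' means the k-th power of A."""
    powers = [adj]
    for _ in range(max_order - 1):
        powers.append(matmul(powers[-1], adj))
    return powers

def combine(powers, logits=None):
    """Combine the per-order matrices.

    logits=None reproduces the baseline (direct sum, equal importance).
    Otherwise softmax(logits) supplies one weight per order; in a trained
    model these logits would be learnable parameters (an assumption here).
    """
    weights = [1.0] * len(powers) if logits is None else softmax(logits)
    n = len(powers[0])
    return [[sum(w * p[i][j] for w, p in zip(weights, powers))
             for j in range(n)] for i in range(n)]

def entropy(probs):
    """Shannon entropy of a probability vector (natural log)."""
    return -sum(p * math.log(p) for p in probs if p > 0)

def fuse(scores_a, scores_b):
    """Fuse two streams' class scores with distribution-dependent coefficients.

    Stand-in heuristic, NOT the paper's fusion network: the stream whose
    predicted distribution has lower entropy gets the larger coefficient.
    """
    pa, pb = softmax(scores_a), softmax(scores_b)
    ca, cb = softmax([-entropy(pa), -entropy(pb)])
    return [ca * x + cb * y for x, y in zip(pa, pb)], (ca, cb)

# Toy skeleton: a chain of three joints 0 - 1 - 2.
chain = [[0, 1, 0], [1, 0, 1], [0, 1, 0]]
powers = adjacency_powers(chain, max_order=2)
direct = combine(powers)                        # baseline: A + A^2
adaptive = combine(powers, logits=[2.0, 0.0])   # favors 1-hop neighbors

# Stream A is confident about class 0; stream B is close to uniform.
fused, coeffs = fuse([4.0, 0.0, 0.0], [0.1, 0.2, 0.0])
```

With the hand-set logits above, the adaptive matrix gives direct neighbors more weight than two-hop neighbors, and the fused prediction leans toward the more confident stream rather than treating both streams equally.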