Multi-scale Gated Graph Convolutional Network for Skeleton-based Action Recognition (original Chinese title: 基于骨架模态的多级门控图卷积动作识别网络)

Cited by: 1
Abstract: Human action recognition is a highly challenging research topic with wide applications in security surveillance, human-computer interaction, and autonomous driving, and skeleton-based recognition is attracting growing attention in computer vision. Recently, graph convolutional networks (GCNs), which are powerful at modeling non-Euclidean structured data, have achieved promising performance and opened a new paradigm for skeleton-based action recognition. Because the large predefined skeleton graph contains a great deal of noise, existing approaches mostly model spatial dependencies by emphasizing high-order spatial features. However, emphasizing only high-order subsets does not reflect the dynamic underlying correlations between vertices in a global manner. Furthermore, the CNNs or RNNs that mainstream methods use to model temporal dependencies cannot capture intricate multi-range temporal relations. To address these issues, a multi-scale gated graph convolutional network (MSG-GCN) is proposed for skeleton-based action recognition. Specifically, a gated temporal convolution module (G-TCM) is presented to capture both consecutive short-term and interval long-term dependencies between vertices in the temporal domain. In addition, a multi-dimensional attention module covering the spatial, temporal, and channel dimensions, which enhances the global expressiveness of the spatial graph, is integrated into the GCN with negligible overhead. Extensive experiments on two large-scale video action recognition benchmark datasets, NTU-RGB+D and Kinetics, show that the proposed approach outperforms state-of-the-art baselines.
Authors: GAN Chuang (干创); WU Gui-xing (吴桂兴); ZHAN Qing-yuan (詹庆原); WANG Peng-kun (王鹏焜); PENG Zhi-lei (彭志磊). Affiliations: School of Software Engineering, University of Science and Technology of China, Suzhou, Jiangsu 215000, China; Suzhou Research Institute, University of Science and Technology of China, Suzhou, Jiangsu 215000, China.
Source: Computer Science (《计算机科学》), indexed in CSCD and the Peking University Core Journals list, 2022, No. 1, pp. 181-186 (6 pages).
Funding: Natural Science Foundation of Jiangsu Province (BK20141209).
Keywords: Action recognition; Skeleton modality; Graph convolution; Video classification; Computer vision
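
Illustrative sketch: The abstract does not give the internals of the gated temporal convolution module (G-TCM), so the following is only a rough sketch of one common way to build a gated temporal convolution, not the authors' implementation. It assumes a tanh filter branch multiplied by a sigmoid gate branch, and it assumes that "consecutive short-term" corresponds to a dilation of 1 while "interval long-term" corresponds to a larger dilation. All function names, kernels, and dilation values here are hypothetical, and the input is a single joint coordinate tracked over time.

```python
import math


def dilated_conv1d(x, kernel, dilation):
    """Causal 1-D convolution over a time series, zero-padded on the left.

    Output at time t sums kernel[i] * x[t - i * dilation] over all taps
    whose index falls inside the sequence.
    """
    out = []
    for t in range(len(x)):
        acc = 0.0
        for i, w in enumerate(kernel):
            idx = t - i * dilation
            if idx >= 0:
                acc += w * x[idx]
        out.append(acc)
    return out


def sigmoid(v):
    return 1.0 / (1.0 + math.exp(-v))


def gated_temporal_conv(x, filter_kernel, gate_kernel, dilation=1):
    """Assumed gating form: tanh(filter branch) * sigmoid(gate branch)."""
    f = dilated_conv1d(x, filter_kernel, dilation)
    g = dilated_conv1d(x, gate_kernel, dilation)
    return [math.tanh(a) * sigmoid(b) for a, b in zip(f, g)]


def multi_range_features(x, filter_kernel, gate_kernel, dilations=(1, 4)):
    """Run the gated convolution at several dilations.

    dilation=1 looks at consecutive frames (short-term); a larger dilation
    looks at frames spaced apart (long-term). Returns {dilation: features}.
    """
    return {
        d: gated_temporal_conv(x, filter_kernel, gate_kernel, d)
        for d in dilations
    }


if __name__ == "__main__":
    # A single impulse at frame 0 shows which later frames each branch reaches.
    series = [1.0, 0.0, 0.0, 0.0, 0.0, 0.0]
    feats = multi_range_features(series, [1.0, 1.0], [0.0, 0.0])
    for d, values in feats.items():
        print(d, [round(v, 3) for v in values])
```

With a two-tap kernel, the dilation-1 branch responds to the impulse only at frames 0 and 1, while the dilation-4 branch responds at frames 0 and 4, which is the sense in which the two branches cover different temporal ranges. A real model would learn the kernels and apply them per joint and per channel.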