Abstract
Objective The dynamics of the human skeleton carry important information for action recognition. Viewed in terms of joint trajectories, the trajectories of the joints that are most relevant to deciding the action class convey the most significant information. Across repeated attempts of the same action, the trajectories of corresponding joints generally share a similar basic shape, but their concrete form is subject to various distortions. Based on an analysis of these distortion factors, the common transformations of joint trajectories in human motion are modeled as spatio-temporal dual affine transformations. Method The spatio-temporal dual affine transformation is first described in a unified expression as a pair of inner and outer transformations. From the differential relationship between trajectory curves before and after the transformation, dual affine differential invariants are derived to describe local properties of joint trajectories. Because these differential invariants share the same data structure as the joint coordinates, a channel augmentation method is proposed: the input data are extended along the channel dimension with the differential invariants and then fed into the neural network for training and evaluation, with the aim of improving the network's generalization ability. Result Experiments on two large-scale action recognition datasets, NTU (Nanyang Technological University) RGB+D (NTU 60) and NTU RGB+D 120 (NTU 120), compare the proposed method with several state-of-the-art methods and two baseline methods, and obtain clear improvements under both evaluation settings (cross-subject and cross-view recognition). Compared with spatio-temporal graph convolutional networks (ST-GCN) trained on raw data, the cross-subject and cross-view recognition accuracies on NTU 60 are improved by 1.9% and 3.0%, respectively; on NTU 120, the cross-subject and cross-setup accuracies are improved by 5.6% and 4.5%, respectively. Compared with data augmentation, the invariant-feature-based channel augmentation yields clear gains under both settings and improves the network's generalization ability more effectively. Conclusion The proposed invariant features and channel augmentation intuitively and effectively combine the advantages of hand-crafted features and deep learning, improving the accuracy of skeleton-based action recognition and the generalization ability of the neural network.
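As a rough illustration of the distortion model described above, the following Python sketch applies a random 3D spatial affine transformation together with a 1D temporal affine rescaling to a single joint trajectory; the function names and parameter choices are illustrative assumptions, not taken from the paper.

```python
import numpy as np

def spatio_temporal_dual_affine(traj, A, b, alpha, beta):
    """Apply the spatio-temporal dual affine model to one joint trajectory.

    traj        : (T, 3) array, 3D positions of a single joint over T frames
    A, b        : 3x3 matrix and 3-vector of the spatial affine transform x' = A x + b
    alpha, beta : scalars of the temporal affine transform t' = alpha * t + beta
    Returns the transformed positions and the transformed time stamps.
    """
    traj = np.asarray(traj, dtype=float)
    new_xyz = traj @ A.T + b                 # spatial affine part (3D)
    t = np.arange(traj.shape[0], dtype=float)
    new_t = alpha * t + beta                 # temporal affine part (uniform time scaling)
    return new_xyz, new_t

# Example: distort a synthetic 3D spiral, similar in spirit to the synthetic test in the paper.
t = np.linspace(0, 4 * np.pi, 200)
spiral = np.stack([np.cos(t), np.sin(t), 0.1 * t], axis=1)

rng = np.random.default_rng(0)
A = rng.normal(size=(3, 3))                  # random spatial affine matrix
b = rng.normal(size=3)
distorted_xyz, distorted_t = spatio_temporal_dual_affine(spiral, A, b, alpha=1.5, beta=2.0)
```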
Objective Skeleton-based action recognition has attracted considerable attention in recent years, because the dynamics of the human skeleton carry significant information for recognizing actions. A skeleton action can be viewed as a time series of human poses, or equivalently as a combination of joint trajectories. Among all joints, the trajectories of the joints that indicate the action class convey the most significant information. When the same action is performed in different attempts, the trajectories of corresponding joints are subject to distortions: two such trajectories share a basic shape but appear in diverse distorted forms due to individual factors. These distortions arise from spatial and temporal factors. Spatial factors include changes of viewpoint, different skeleton sizes, and different action amplitudes, while temporal factors correspond to time scaling along the sequence, i.e., the order and speed with which a specific action is performed. The spatial factors can be modeled by affine transformations in 3D space, and uniform time scaling, the most commonly discussed temporal case, can be seen as an affine transformation in 1D. We combine these two kinds of distortion into a spatio-temporal dual affine transformation, propose a novel feature that is invariant under it, and use this feature to facilitate skeleton-based action recognition, since identifying trajectories that are similar up to such transformations is beneficial for recognizing actions. Method We propose a general method for constructing spatio-temporal dual affine differential invariants (STDADI). The invariants are obtained as rational polynomials of the derivatives of joint trajectories, which effectively eliminate the transformation parameters, so the resulting features are robust, coordinate-system independent, and computed directly from the 3D coordinates. By bounding the degree of the polynomials and the order of the derivatives, we generate 8 independent STDADIs and combine them into an invariant vector at each moment for each human joint. Moreover, we propose an intuitive and effective method, called channel augmentation, that extends the input data with STDADI along the channel dimension for training and evaluation: the coordinate vector and the STDADI vector of each joint in each frame are concatenated. Channel augmentation introduces invariant information into the input data without changing the inner structure of the neural network. We use spatio-temporal graph convolutional networks (ST-GCN) as the basic network. It models skeleton data as a graph that involves spatial and temporal connections between joints simultaneously, exploiting local patterns and correlations in human skeletons; in particular, the importance of joints along the action sequence is expressed as joint weights in the spatio-temporal graph. This is in line with our STDADI, because both focus on describing joint dynamics, and our features further provide an expression that is not affected by the distortions.
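The channel augmentation step can be sketched as follows, assuming the (channels, frames, joints, persons) tensor layout commonly used in ST-GCN implementations; the STDADI computation itself is only stubbed out here, and all names are hypothetical rather than the authors' code.

```python
import numpy as np

def compute_stdadi(x):
    """Placeholder for the 8-dimensional STDADI vector per joint and frame.

    x : (3, T, V, M) array of joint coordinates (channels, frames, joints, persons).
    Returns an (8, T, V, M) array of invariant features.
    In the paper these are rational polynomials of trajectory derivatives;
    here we simply return zeros to keep the sketch self-contained.
    """
    _, T, V, M = x.shape
    return np.zeros((8, T, V, M), dtype=x.dtype)

def channel_augment(x):
    """Concatenate raw coordinates and invariant features along the channel axis."""
    feats = compute_stdadi(x)
    return np.concatenate([x, feats], axis=0)   # (3 + 8, T, V, M)

# Example: a dummy NTU-style sample with 300 frames, 25 joints, 2 persons.
x = np.random.randn(3, 300, 25, 2).astype(np.float32)
x_aug = channel_augment(x)
print(x_aug.shape)   # (11, 300, 25, 2)
```

Under this scheme only the input channel count of the first network layer changes (here from 3 to 11), which matches the claim above that channel augmentation leaves the inner structure of the network untouched.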
Result We first verify the invariance of STDADI on synthetic data: a 3D spiral line and a joint trajectory selected from NTU RGB+D, transformed with random parameters, show that STDADI is invariant under spatio-temporal dual affine transformations. Next, the effectiveness of the proposed feature and method is validated on the large-scale action recognition dataset NTU (Nanyang Technological University) RGB+D (NTU 60) and its extended version NTU RGB+D 120 (NTU 120), currently the largest dataset with 3D joint annotations captured in a constrained indoor environment, together with detailed studies of the contribution of STDADI. The original ST-GCN and a data augmentation technique, which applies rotation, scaling, and shear transformations to the 3D skeletons, serve as the baseline methods; we use the same training strategy and hyper-parameters as the original ST-GCN. ST-GCN with channel augmentation performs well: compared with ST-GCN using raw data, on NTU 60 the cross-subject and cross-view recognition accuracies increase by 1.9% and 3.0%, respectively, and on NTU 120 the cross-subject and cross-setup accuracies increase by 5.6% and 4.5%, respectively. Because data augmentation mainly consists of 3D geometric transformations, it considerably improves cross-view recognition but contributes little in the cross-subject setting, whereas the spatio-temporal dual affine transformation assumption is validated under both evaluation criteria. Conclusion We propose a general method for constructing spatio-temporal dual affine differential invariants (STDADI) and demonstrate the effectiveness of this invariant feature, applied through a channel augmentation technique, on the large-scale action recognition datasets NTU RGB+D and NTU RGB+D 120. The combination of hand-crafted features and data-driven methods improves both accuracy and generalization.
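For reference, the rotation/scaling/shear data-augmentation baseline mentioned above could look roughly like the sketch below; the parameter ranges and the choice of rotation axis are assumptions, not the authors' exact settings.

```python
import numpy as np

def random_rotation_scale_shear(x, rng, max_angle=np.pi / 6,
                                scale_range=(0.9, 1.1), shear_range=0.1):
    """Randomly rotate, scale, and shear a skeleton sequence x of shape (3, T, V, M)."""
    # Random rotation about the vertical (y) axis; the angle range is an assumption.
    theta = rng.uniform(-max_angle, max_angle)
    c, s = np.cos(theta), np.sin(theta)
    R = np.array([[c, 0, s],
                  [0, 1, 0],
                  [-s, 0, c]])
    # Random isotropic scaling.
    S = np.eye(3) * rng.uniform(*scale_range)
    # Random shear: identity plus small off-diagonal terms.
    H = np.eye(3) + rng.uniform(-shear_range, shear_range, size=(3, 3)) * (1 - np.eye(3))
    M_total = R @ S @ H
    # Apply the combined 3x3 transform to every joint in every frame and person.
    return np.einsum('ij,jtvp->itvp', M_total, x)

rng = np.random.default_rng(0)
x = np.random.randn(3, 300, 25, 2)
x_aug = random_rotation_scale_shear(x, rng)
```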
Authors
Li Qi
Mo Hanlin
Zhao Jinghan
Hao Hongxiang
Li Hua
Li Qi; Mo Hanlin; Zhao Jinghan; Hao Hongxiang; Li Hua (Key Laboratory of Intelligent Information Processing, Institute of Computing Technology, Chinese Academy of Sciences, Beijing 100190, China; University of Chinese Academy of Sciences, Beijing 100049, China)
Source
《中国图象图形学报》
CSCD
Peking University Core Journal
2021, No. 12, pp. 2879-2891 (13 pages)
Journal of Image and Graphics
Funding
National Key Research and Development Program of China (2019YFF0301801, 2017YFB1002703)
National Key Basic Research Program of China (2015CB554507)
National Natural Science Foundation of China (61379082)
Keywords
motion analysis
skeleton-based action recognition
spatio-temporal dual affine transformation
differential invariant features
channel augmentation
generalization ability