3D Human Pose Estimation Network Combining Graph Convolution and Transformer

Abstract: Two-stage 3D human pose estimation methods have made significant progress thanks to advanced 2D pose detectors, but the ambiguity of depth information still makes the task highly challenging. To address this problem, MGCNTrans, a 3D human pose estimation network, is proposed. The method adopts a 2D-to-3D lifting strategy and combines the strengths of a Transformer network and a spatial-channel graph convolutional network. The model takes multiple frames as input and uses information from the surrounding frames to constrain the pose estimate for the current frame. For feature learning, the graph convolutional network learns the physical connections between human joints and captures local spatial features, while the Transformer network mines the implicit relationships between joints and provides global context. To improve performance, the graph convolutional layer fuses a spatial layer and a channel layer, so that nodes interact better at both local and global scales, feature diversity increases, and poses are estimated more accurately. The results show that MGCNTrans achieves superior performance on the 3D human pose estimation task, demonstrating its effectiveness.
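
The two feature branches described in the abstract can be illustrated with a minimal NumPy sketch. It is not the authors' MGCNTrans implementation: the 5-joint chain skeleton, the channel width, the random weights, the additive fusion, and all function names are assumptions made for illustration, and the multi-frame input and the spatial-channel fusion inside the graph convolutional layer are left out. The sketch only shows how a graph convolution restricts each joint to its skeletal neighbours, while self-attention lets every joint interact with every other joint.

```python
import numpy as np

# Hypothetical 5-joint chain skeleton (illustration only, not the paper's skeleton).
EDGES = [(0, 1), (1, 2), (2, 3), (3, 4)]
NUM_JOINTS = 5


def normalized_adjacency(num_joints, edges):
    """Symmetric normalization D^{-1/2} (A + I) D^{-1/2} of the skeleton graph."""
    a = np.eye(num_joints)
    for i, j in edges:
        a[i, j] = a[j, i] = 1.0
    d_inv_sqrt = 1.0 / np.sqrt(a.sum(axis=1))
    return a * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]


def graph_conv(x, a_hat, w):
    """Local branch: each joint aggregates features from its skeletal neighbours."""
    return np.maximum(a_hat @ x @ w, 0.0)  # ReLU activation


def self_attention(x, wq, wk, wv):
    """Global branch: single-head attention in which every joint attends to all joints."""
    q, k, v = x @ wq, x @ wk, x @ wv
    scores = q @ k.T / np.sqrt(q.shape[-1])
    scores = scores - scores.max(axis=-1, keepdims=True)  # numerical stability
    weights = np.exp(scores)
    weights = weights / weights.sum(axis=-1, keepdims=True)
    return weights @ v, weights


def init_params(channels=16, seed=0):
    """Random weights standing in for trained parameters (assumption)."""
    rng = np.random.default_rng(seed)
    shapes = {
        "embed": (2, channels),
        "gcn": (channels, channels),
        "wq": (channels, channels),
        "wk": (channels, channels),
        "wv": (channels, channels),
        "head": (channels, 3),
    }
    return {name: rng.normal(scale=0.1, size=shape) for name, shape in shapes.items()}


def lift_2d_to_3d(pose_2d, params):
    """Map a single-frame 2D pose of shape (J, 2) to a 3D pose of shape (J, 3)."""
    a_hat = normalized_adjacency(NUM_JOINTS, EDGES)
    h = pose_2d @ params["embed"]                      # (J, 2) -> (J, C)
    local = graph_conv(h, a_hat, params["gcn"])        # skeleton-constrained features
    global_ctx, _ = self_attention(h, params["wq"], params["wk"], params["wv"])
    fused = local + global_ctx                         # additive fusion (assumption)
    return fused @ params["head"]                      # (J, C) -> (J, 3)


pose_2d = np.random.default_rng(1).normal(size=(NUM_JOINTS, 2))
pose_3d = lift_2d_to_3d(pose_2d, init_params())
print(pose_3d.shape)  # (5, 3)
```

With untrained random weights the output has no anatomical meaning; the point is only the data flow from 2D joint coordinates through a local (graph) branch and a global (attention) branch to 3D coordinates.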

Authors: YAN Yongjie (闫永杰); LI Minqi (李敏奇) (College of Electronic Information, Xi'an Polytechnic University, Xi'an, Shaanxi 710600, China)

Affiliation: College of Electronic Information, Xi'an Polytechnic University

Source: Automation Application (《自动化应用》), 2024, No. 13, pp. 71-75, 86 (6 pages)

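Funding: Natural Science Foundation of Shaanxi Province (2022JM-348); Fund of the Shaanxi Key Laboratory of Complex System Control and Intelligent Information Processing (SKL2020CP04).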

Keywords: 3D human pose estimation; graph convolutional network; Transformer network

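Classification: TP391 [Automation and Computer Technology: Computer Application Technology]; TP18 [Automation and Computer Technology: Control Theory and Control Engineering]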
