3D human pose sequence estimation from point clouds combining temporal feature and joint learning strategy
(Chinese title: 融合时序特征约束与联合优化的点云3维人体姿态序列估计)

Cited by: 2


Abstract

Objective: Point cloud-based 3D human pose estimation is a key problem in computer vision, with applications in augmented reality/virtual reality (AR/VR), human-computer interaction (HCI), motion retargeting, and virtual avatar manipulation. Current deep learning-based 3D human pose estimation faces the following challenges. 1) The task suffers from occlusion and self-occlusion ambiguity, and noisy point clouds from depth cameras make it difficult to learn a proper pose estimation model. 2) Current depth-image-based methods mainly estimate the pose from a single frame, which can ignore the intrinsic prior of human motion smoothness and lead to jitter artifacts on consecutive point cloud sequences. Point cloud sequences could be leveraged for high-fidelity pose estimation by enforcing motion smoothness, but designing an effective way to model such sequences is challenging. 3) It is hard to collect large-scale real image datasets with high-quality 3D human pose annotations for fully supervised training, whereas real datasets with 2D human pose annotations are relatively easy to collect. Moreover, human pose estimation is closely related to motion prediction, which aims to predict future motion. The open question is whether 3D human pose estimation and motion prediction can benefit each other. To address these issues, this paper proposes a new method for 3D human pose estimation from point cloud sequences.

Method: We develop a method to obtain high-fidelity 3D human poses from point cloud sequences. A weakly supervised deep learning architecture learns the 3D human pose from 3D point cloud sequences. We design a dual-level human pose estimation pipeline that takes point cloud sequences as input. 1) 2D pose information is estimated from the depth maps, so that the background is removed and pose-aware point clouds are extracted. To ensure that the sequential point clouds share the same scale, all point clouds are normalized with a fixed bounding box. 2) Pose encoding uses a hierarchical PointNet++ backbone and long short-term memory (LSTM) layers to capture the spatial-temporal features of the pose-aware point cloud sequences. To improve optimization, a multi-task network jointly solves human pose estimation and motion prediction. To use more training data with 2D human pose annotations and to reduce ambiguity through the supervision of 2D joints, weakly supervised learning is adopted in our framework.

Result: To validate the proposed algorithm, experiments are conducted on two public datasets, the invariant-top view dataset (ITOP) and the NTU-RGBD dataset. Our method is compared with popular methods including V2V-PoseNet, the viewpoint invariant method (VI), the Inference Embedded method, and the weakly supervised adversarial learning method (WSM). On the ITOP dataset, at a threshold of 10 cm, our mean average precision (mAP) is 0.99 percentage points higher than that of WSM, and 13.18 and 17.96 percentage points higher than those of VI and the Inference Embedded method, respectively. Our mean joint error is 3.33 cm, 5.17 cm, 1.67 cm, and 0.67 cm lower than that of VI, the Inference Embedded method, V2V-PoseNet, and WSM, respectively. The performance gain could come from the sequential input data and from the constraints on motion parameters such as velocity and acceleration. 1) The sequential data is encoded through LSTM units, which yields smoother predictions and improves estimation performance. 2) The motion parameters alleviate the jitter caused by random sampling and provide direct supervision of the joint coordinates. On the NTU-RGBD dataset, we compare our method with the current WSM. At a threshold of 10 cm, our mAP is 7.03 percentage points higher than that of WSM.

Ablation experiments on the ITOP dataset investigate the effect of each component. To understand the effect of the sequential point cloud input, we run experiments with different temporal receptive fields. A receptive field of 1 corresponds to estimation without sequential data. The percentage of correct keypoints (PCK) drops to its lowest value of 88.57% when the receptive field is 1, increases as the receptive field grows from 1 to 5, and becomes stable when the receptive field is greater than 13. Our PCK is 87.55% when the model is trained only with fully labeled data and 90.58% when it is trained with both fully and weakly labeled data, which shows that weakly supervised learning improves the model by more than 2 percentage points. The experiments also demonstrate that our weakly supervised learning method works with a small amount of fully labeled data. Compared with a model trained for a single task, the multi-task network improves the mAP of human pose estimation and motion prediction by more than 2 percentage points.

Conclusion: Our method makes full use of the prior of human motion continuity to obtain smoother human pose estimation results. The experiments demonstrate that each proposed component is effective and that our method achieves state-of-the-art performance on the ITOP and NTU-RGBD datasets. Joint training is mutually beneficial for human pose estimation and motion prediction. With weakly supervised learning on sequential data, the method can use more easy-to-access training data, and the model is robust to different levels of training data annotation. It could be applied to scenarios that require high-quality human poses, such as motion retargeting and virtual fitting. The method also shows the potential of using sequential data as input.
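The evaluation metric and the motion parameters mentioned in the abstract can be illustrated with a minimal NumPy sketch. It is not the authors' implementation. It assumes joint positions are stored as arrays of shape (frames, joints, 3) in metres, that mAP follows the common ITOP convention (a joint counts as detected when its error is within the 10 cm threshold, and the per-joint detection rates are averaged), and that velocity and acceleration are first and second finite differences along time. The function names and the exact form of the smoothness term are hypothetical.

```python
import numpy as np

def joint_errors(pred, gt):
    """Euclidean error per frame and joint; inputs have shape (T, J, 3) in metres."""
    return np.linalg.norm(pred - gt, axis=-1)

def map_at_threshold(pred, gt, threshold=0.10):
    """Assumed mAP convention: mean over joints of the fraction of frames
    whose joint error is within `threshold` (0.10 m = 10 cm)."""
    hits = joint_errors(pred, gt) <= threshold   # (T, J) booleans
    per_joint_ap = hits.mean(axis=0)             # (J,) detection rate per joint
    return float(per_joint_ap.mean())

def motion_terms(seq):
    """Velocity and acceleration as first and second finite differences in time."""
    velocity = np.diff(seq, n=1, axis=0)         # (T-1, J, 3)
    acceleration = np.diff(seq, n=2, axis=0)     # (T-2, J, 3)
    return velocity, acceleration

def smoothness_loss(pred, gt):
    """Hypothetical motion constraint: mean squared mismatch of velocity
    plus mean squared mismatch of acceleration."""
    v_pred, a_pred = motion_terms(pred)
    v_gt, a_gt = motion_terms(gt)
    return float(np.mean((v_pred - v_gt) ** 2) + np.mean((a_pred - a_gt) ** 2))
```

A prediction that is offset from the ground truth by a constant has zero smoothness loss under this sketch, while a prediction that oscillates from frame to frame is penalized even if every frame stays close to the target, which is the kind of jitter that a single-frame error metric does not expose.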

Authors: Liao Lianjun (廖联军); Zhong Chongyang (钟重阳); Zhang Zhiheng (张智恒); Hu Lei (胡磊); Zhang Zihao (张子豪); Xia Shihong (夏时洪) (Institute of Computing Technology, Chinese Academy of Sciences, Beijing 100190, China; School of Computer Science and Technology, University of Chinese Academy of Sciences, Beijing 100049, China; School of Information Science and Technology, North China University of Technology, Beijing 100144, China)

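Affiliations: Institute of Computing Technology, Chinese Academy of Sciences; University of Chinese Academy of Sciences; School of Information Science and Technology, North China University of Technology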

Source: Journal of Image and Graphics (中国图象图形学报), indexed in CSCD and the Peking University Core Journals list, 2022, No. 12, pp. 3608-3621 (14 pages)

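Funding: National Key Research and Development Program of China (2020YFF0304701); National Natural Science Foundation of China (61772499); Beijing Natural Science Foundation (L182052).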

Keywords: human motion; human pose estimation; human motion prediction; point cloud sequence; weakly-supervised learning

Classification code: TP391.4 [Automation and Computer Technology: Computer Application Technology]

