

Clothed feature learning for single-view 3D human reconstruction
Abstract

Objective Clothed human reconstruction is an important problem in computer vision and computer graphics. It aims to generate three-dimensional (3D) human body models, including clothes and accessories, and is widely used in virtual reality, digital humans, 3D clothing-assisted design, film and television special effects, and other scenarios. Compared with the large number of single-view images available on the internet, multiview images are difficult to obtain. Because single-view images are easy to acquire, which greatly reduces the usage conditions and hardware cost of reconstruction, we take a single-view image as input to establish a complete mapping between the single-view human image and the human shape and to restore the 3D shape and geometric details of the human body. Most methods based on parametric models can only predict the shape and posture of a human body with a smooth surface, whereas nonparametric methods lack a fixed mesh topology when generating fine geometric shapes. High-precision 3D human model extraction can be realized by combining a parametric human model with an implicit function. Given that clothing produces dynamic flexible deformation as human posture changes, most methods focus on obtaining the fold details of a clothed human model from 3D mesh deformation. The clothing can be separated from the human body with the assistance of a clothing template, and the flexible deformation of the clothing caused by human posture can be obtained directly with learning-based methods. Given the overlapping of limbs, occlusion, and complex clothed postures in single-view 3D human reconstruction, obtaining geometric shape representations under various clothed postures and viewing angles is difficult. Moreover, existing methods can only extract and represent visual features from clothed human images and do not consider the dynamic detail expression caused by complex clothed postures. Difficulties therefore remain in representing and learning the posture-related clothed features of a single-view clothed human and in generating a clothed mesh with complex posture and dynamic folds. In this paper, we propose a clothed feature learning method for single-view 3D human reconstruction.

Method We propose a feature learning approach to reconstruct a clothed human from a single-view image. The experiments were run on two NVIDIA GeForce 1080Ti GPUs. We used the clothing co-parsing fashion street photography dataset, which includes 2,098 human images, to analyze the physical features of clothing. The Human3.6M dataset was used to learn human posture features, with the 3DPW dataset, captured in the wild, serving as the test set. For fold feature learning, we used subjects 00096 and 00159 of the CAPE dataset. For better training of the clothed mesh, we selected 150 meshes close to the dressed postures from the THuman2.0 dataset as the training set for shape feature learning. First, we represent the limb features of the single-view image and extract the clothed human pose features through 2D joint prediction and deep regression of pose features. Then, based on the clothed pose features, we define a clothing fold sampling space centered on the flexible-deformation joints and a flexible deformation loss function; flexible clothing deformation is learned by introducing a clothing template to the input ground-truth clothed body model, and only crucial details inside this space are considered when acquiring the fold features. Afterward, a human shape feature learning module is constructed by combining posture parameter regression, feature-map sampling features, and an encoder-decoder: pixel- and voxel-aligned features are learned from the corresponding images and meshes of a 3D human mesh dataset, and the shape features of the human body are decoded. Finally, by combining the fold features, the clothed human shape features, and the computed 3D sampling space, the 3D human mesh is reconstructed by defining a signed distance field, and the final clothed human model is output.

Result For the posture feature and single-view 3D clothed human reconstruction results, we used the 3DPW and THuman2.0 datasets. We compared our results with those of three methods on the 3DPW dataset. The mean per-joint position error (MPJPE) and the MPJPE after Procrustes alignment (PA-MPJPE) were used to evaluate the differences between the predicted 3D joints and the ground truth, and the mean per-vertex position error (MPVPE) was used to evaluate the predicted SMPL 3D human shape against the ground-truth mesh. Compared with the second-best model, the error decreased by 1.28 in MPJPE, 1.66 in PA-MPJPE, and 2.38 in MPVPE, and the average error was reduced by 2.4%. For the 3D reconstruction of the clothed human body, we compared four methods on the THuman2.0 dataset, using the Chamfer distance (CD) and point-to-surface distance (P2S) in 3D space to evaluate the gap between two groups of 3D meshes. Notably, compared with the second-best model, the P2S of our reconstruction results is reduced by 4.4% and the CD by 2.6%. The experimental results reveal that the posture feature learning module contributes to reconstructing complete limbs and correct postures, and that using fold feature learning to optimize the learned shape features yields high-precision reconstruction results.

Conclusion The proposed clothed feature learning method for single-view 3D human reconstruction effectively learns the clothed features of single-view 3D human reconstruction and generates clothed human models with complex postures and dynamic folds.
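The pose-feature step (2D joint prediction followed by deep regression of pose parameters) can be pictured with a minimal sketch. The network below is a hypothetical stand-in, not the paper's architecture: it regresses SMPL-style pose and shape parameters from normalized 2D joint coordinates, and the layer widths and joint count are illustrative assumptions.

```python
import torch
import torch.nn as nn

class PoseRegressor(nn.Module):
    """Sketch: regress SMPL-style pose/shape parameters from 2D joints."""
    def __init__(self, num_joints=24, pose_dim=72, shape_dim=10):
        super().__init__()
        self.pose_dim = pose_dim
        self.mlp = nn.Sequential(
            nn.Linear(num_joints * 2, 512), nn.ReLU(),
            nn.Linear(512, 512), nn.ReLU(),
            nn.Linear(512, pose_dim + shape_dim),
        )

    def forward(self, joints_2d):                  # joints_2d: (B, num_joints, 2)
        out = self.mlp(joints_2d.flatten(1))
        return out[:, :self.pose_dim], out[:, self.pose_dim:]   # pose, shape

pose, shape = PoseRegressor()(torch.rand(2, 24, 2))
```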
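The fold sampling space centered on flexible-deformation joints can be illustrated as follows: keep only the 3D query points that lie within a radius of selected joints (e.g., knees and elbows), so that fold learning focuses on regions that deform with posture. The joint indices, the radius, and the helper name `fold_sampling_space` are assumptions for illustration; the paper defines its own sampling space and loss function.

```python
import numpy as np

def fold_sampling_space(points, joints, flex_ids=(4, 5, 18, 19), radius=0.15):
    """points: (N, 3) candidate 3D samples; joints: (J, 3) posed joint positions."""
    centers = joints[list(flex_ids)]                                   # (K, 3) flexible joints
    d = np.linalg.norm(points[:, None, :] - centers[None], axis=-1)    # (N, K) distances
    mask = d.min(axis=1) < radius                                      # keep points near any center
    return points[mask]

pts = np.random.uniform(-1, 1, (10000, 3)).astype(np.float32)
jts = np.random.uniform(-1, 1, (24, 3)).astype(np.float32)
fold_pts = fold_sampling_space(pts, jts)
```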
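For the shape feature module, pixel-aligned feature sampling in the PIFu style is one plausible reading of "feature-map sampling features": each 3D query point is projected onto the image plane and the 2D feature map is bilinearly sampled at that location. The orthographic projection and tensor shapes below are assumptions, not the paper's exact formulation.

```python
import torch
import torch.nn.functional as F

def sample_pixel_features(feat_map, points_xy):
    """feat_map: (B, C, H, W); points_xy: (B, N, 2) projected coords in [-1, 1]."""
    grid = points_xy.unsqueeze(2)                                   # (B, N, 1, 2)
    sampled = F.grid_sample(feat_map, grid, align_corners=True)     # (B, C, N, 1)
    return sampled.squeeze(-1).transpose(1, 2)                      # (B, N, C) per-point features

feat = torch.rand(1, 256, 128, 128)
pts_xy = torch.rand(1, 5000, 2) * 2 - 1
pix_feat = sample_pixel_features(feat, pts_xy)
```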
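The final step defines a signed distance field over the 3D sampling space and extracts the clothed mesh from it. A minimal sketch, assuming a decoder `sdf_net` that maps 3D points (or their fused features) to signed distances; in practice the grid is evaluated in chunks and the resolution is a tunable choice.

```python
import torch
from skimage.measure import marching_cubes

def reconstruct(sdf_net, res=128, bound=1.0, chunk=65536):
    xs = torch.linspace(-bound, bound, res)
    grid = torch.stack(torch.meshgrid(xs, xs, xs, indexing="ij"), dim=-1).reshape(-1, 3)
    with torch.no_grad():
        sdf = torch.cat([sdf_net(grid[i:i + chunk]) for i in range(0, grid.shape[0], chunk)])
    sdf = sdf.reshape(res, res, res).cpu().numpy()
    # The zero level set of the signed distance field is the clothed surface.
    verts, faces, _, _ = marching_cubes(sdf, level=0.0)
    verts = verts / (res - 1) * 2 * bound - bound          # grid indices -> world coordinates
    return verts, faces
```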
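The Result section evaluates reconstruction with the Chamfer distance (CD) and point-to-surface distance (P2S). A simple nearest-neighbor approximation over points sampled from the two meshes is sketched below; an exact P2S uses point-to-triangle distances, so this is only an approximation for illustration.

```python
import numpy as np
from scipy.spatial import cKDTree

def chamfer_and_p2s(pred_pts, gt_pts):
    """pred_pts: (N, 3), gt_pts: (M, 3) points sampled from the two meshes."""
    d_pred_to_gt, _ = cKDTree(gt_pts).query(pred_pts)   # each predicted point to nearest GT point
    d_gt_to_pred, _ = cKDTree(pred_pts).query(gt_pts)   # each GT point to nearest predicted point
    chamfer = d_pred_to_gt.mean() + d_gt_to_pred.mean()
    p2s = d_pred_to_gt.mean()                            # one-sided distance as a P2S proxy
    return chamfer, p2s
```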
Authors: Huang Qianpeng; Liu Li; Fu Xiaodong; Liu Lijun; Peng Wei (Faculty of Information Engineering and Automation, Kunming University of Science and Technology, Kunming 650500, China; Computer Technology Application Key Laboratory of Yunnan Province, Kunming 650500, China)
Source: Journal of Image and Graphics (中国图象图形学报), CSCD, Peking University Core, 2024, No. 9, pp. 2610-2624 (15 pages)
Funding: National Natural Science Foundation of China (62262036, 62362043, 61962030); Yunnan Province Reserve Talents Program for Young and Middle-aged Academic and Technical Leaders (202005AC160036).
Keywords: single-view 3D human reconstruction; clothed feature learning; sampling space; flexible deformation; signed distance field