Abstract
Recent advances have brought significant progress in realistic head reconstruction and manipulation using neural radiance fields (NeRF). Despite these advances, capturing intricate facial details remains a persistent challenge. Moreover, casually captured input, involving both head poses and camera movements, introduces additional difficulties for existing head avatar reconstruction methods. To address the challenge posed by video captured with camera motion, we propose AvatarWild, a novel method for reconstructing head avatars from monocular videos taken with consumer devices. Notably, our approach decouples the camera pose from the head pose, allowing reconstructed avatars to be rendered with different poses and expressions from novel viewpoints. To enhance the visual quality of the reconstructed facial avatar, we introduce a view-dependent detail enhancement module that augments local facial details without compromising consistency across viewpoints. Reconstruction and animation results on both multi-view and single-view datasets show that our method outperforms existing approaches. Our approach also stands out by relying exclusively on video captured with portable devices such as smartphones. This underscores the practicality of the method and extends its applicability to real-world scenarios where accessibility and ease of data capture are crucial.
Funding
Supported by the National Natural Science Foundation of China (Nos. 6247075018 and 62322210),
the Innovation Funding of ICT, CAS (No. E461020),
the Beijing Municipal Natural Science Foundation for Distinguished Young Scholars (No. JQ21013),
and the Beijing Municipal Science and Technology Commission (No. Z231100005923031).