Funding: Project supported by the National Natural Science Foundation of China (No. 60272031), the National Basic Research Program (973) of China (No. 2002CB312101), and the Technology Plan Program of Zhejiang Province (No. 2003C21010), China
Abstract: Driving facial animation from tens of tracked markers is a challenging task due to the complex topology and non-rigid nature of human faces. We propose a solution named manifold Bayesian regression. First, a novel distance metric, the geodesic manifold distance, is introduced to replace the Euclidean distance. Facial animation is then formulated as a sparse warping-kernel regression problem in which the geodesic manifold distance models the topology and discontinuities of the face models. The geodesic manifold distance can be adopted in traditional regression methods, e.g., radial basis functions, without much tuning. We cast facial animation in a Bayesian regression framework, as Bayesian approaches provide an elegant way of dealing with noise and uncertainty. After the covariance matrix is properly modulated, Hybrid Monte Carlo is used to approximate the probability integrals and obtain the deformation results. Experimental results show that our algorithm robustly produces facial animation with large motions and complex face models.
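A minimal sketch of the core geometric idea, under stated assumptions (a triangle mesh given as verts/faces arrays, markers identified with mesh vertices, a Gaussian kernel, and a small ridge term standing in for the paper's Bayesian/HMC machinery, which is not reproduced here): approximate geodesic distances along mesh edges replace Euclidean distances in an otherwise standard RBF interpolation, so deformation does not leak across discontinuities such as the lips.

```python
import numpy as np
from scipy.sparse import csr_matrix
from scipy.sparse.csgraph import dijkstra

def geodesic_distances(verts, faces, sources):
    """Approximate geodesic distance from each source vertex to every
    vertex as shortest paths along mesh edges (Dijkstra)."""
    edges = np.vstack([faces[:, [0, 1]], faces[:, [1, 2]], faces[:, [2, 0]]])
    edges.sort(axis=1)
    edges = np.unique(edges, axis=0)          # deduplicate shared edges
    i, j = edges[:, 0], edges[:, 1]
    w = np.linalg.norm(verts[i] - verts[j], axis=1)
    n = len(verts)
    graph = csr_matrix((np.r_[w, w], (np.r_[i, j], np.r_[j, i])), shape=(n, n))
    return dijkstra(graph, indices=sources)   # (num_markers, n)

def deform_rbf_geodesic(verts, faces, marker_ids, marker_disp, sigma=0.05):
    """RBF interpolation of marker displacements, with the Euclidean
    distance in the Gaussian kernel replaced by geodesic distance."""
    D = geodesic_distances(verts, faces, marker_ids)
    K = np.exp(-(D[:, marker_ids] ** 2) / (2 * sigma ** 2))
    # Ridge term stands in for the paper's Bayesian noise handling.
    w = np.linalg.solve(K + 1e-6 * np.eye(len(marker_ids)), marker_disp)
    Phi = np.exp(-(D.T ** 2) / (2 * sigma ** 2))
    return verts + Phi @ w
```

Edge-based Dijkstra only approximates true geodesics, but it already respects the mesh topology, which is the property the abstract relies on.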
Funding: Youth Foundation of Higher Education Scientific Research of Hebei Province, China (No. 2010228), and the Foundation for Returned Overseas Scholars of Hebei Province, China (No. C2013003015)
Abstract: Based upon motion capture, a semi-automatic technique for fast facial animation was implemented. While capturing facial expressions from a performer, a camera was used to record his/her frontal face as a texture map. The radial basis function (RBF) technique was used to deform a generic facial model, and the texture was remapped to generate a personalized face. The personalized face was partitioned into three regions, and, using the captured facial expression data, RBF interpolation, the Laplacian operator, and mean-value coordinates were applied to deform each region respectively. With shape blending, the three regions were combined to construct the final face model. Our results show that the technique is efficient at generating realistic facial animation.
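The abstract does not spell out how the three independently deformed regions are recombined; below is one plausible reading as a hedged sketch, where the feathered per-vertex masks and all names are assumptions rather than details from the paper.

```python
import numpy as np

def blend_regions(base, region_results, region_weights):
    """Combine per-region deformation results into one face model by
    per-vertex weighted shape blending.

    base           : (n, 3) neutral personalized face vertices
    region_results : list of (n, 3) arrays, each the full mesh as deformed
                     by one region's method (RBF, Laplacian, mean-value)
    region_weights : list of (n,) soft masks, feathered near region
                     boundaries to avoid visible seams
    """
    out = np.zeros_like(base)
    total = np.zeros(len(base))
    for deformed, w in zip(region_results, region_weights):
        out += w[:, None] * deformed
        total += w
    # Vertices outside every region keep their base position.
    mask = total > 0
    out[mask] /= total[mask, None]
    out[~mask] = base[~mask]
    return out
```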
Abstract: In this paper, we present an efficient algorithm that generates lip-synchronized facial animation from a given vocal audio clip. By combining a spectral-dimensional bidirectional long short-term memory network with a temporal attention mechanism, we design a lightweight speech encoder that learns useful and robust vocal features from the input audio without resorting to pre-trained speech recognition modules or large training data. To learn subject-independent facial motion, we use deformation gradients as the internal representation, which allows nuanced local motions to be synthesized better than with vertex offsets. Compared with state-of-the-art automatic-speech-recognition-based methods, our model is much smaller but achieves similar robustness and quality most of the time, and noticeably better results in certain challenging cases.
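The encoder below is a hedged PyTorch sketch reconstructed from the abstract alone: a BiLSTM scans the spectral (frequency) axis of each audio frame, and a learned temporal attention pools the frame features into one vector. All layer sizes, and the choice of a mel-spectrogram input, are assumptions.

```python
import torch
import torch.nn as nn

class SpeechEncoder(nn.Module):
    """Sketch: spectral-dimensional BiLSTM + temporal attention pooling."""
    def __init__(self, n_mels=80, hidden=64, feat=128):
        super().__init__()
        self.spectral_lstm = nn.LSTM(1, hidden, bidirectional=True,
                                     batch_first=True)
        self.proj = nn.Linear(2 * hidden, feat)
        self.attn = nn.Linear(feat, 1)

    def forward(self, spec):               # spec: (batch, frames, n_mels)
        b, t, m = spec.shape
        x = spec.reshape(b * t, m, 1)      # frequency bins as a sequence
        _, (h, _) = self.spectral_lstm(x)  # h: (2, b*t, hidden)
        frame_feat = self.proj(h.permute(1, 0, 2).reshape(b * t, -1))
        frame_feat = frame_feat.view(b, t, -1)
        # Temporal attention: softmax weights over frames, weighted sum.
        alpha = torch.softmax(self.attn(frame_feat), dim=1)   # (b, t, 1)
        return (alpha * frame_feat).sum(dim=1)                # (b, feat)
```

A decoder mapping this feature to per-frame deformation gradients would follow, but the abstract gives too little detail to sketch it responsibly.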
Abstract: In this paper, a facial animation system is proposed that simultaneously captures both geometrical information and illumination changes of surface details, called expression details, from video clips; the captured data can be widely applied to different 2D face images and 3D face models. While tracking the geometric data, we record the expression details as ratio images. For 2D facial animation synthesis, these ratio images are used to generate dynamic textures. Because a ratio image is obtained by dividing the colors of an expressive face by those of a neutral face, pixels with a ratio value smaller than one are where a wrinkle or crease appears. The gradients of the ratio value at each pixel are therefore regarded as changes of the face surface, and the original surface normals can be adjusted according to these gradients. Based on this idea, we can convert the ratio images into a sequence of normal maps and apply them when rendering an animated 3D model. With this expression detail mapping, the resulting facial animations are more life-like and more expressive.
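A small sketch of the ratio-image-to-normal-map conversion described above, assuming images arrive as NumPy arrays and a flat reference normal (0, 0, 1); the strength scale is illustrative, not a value from the paper.

```python
import numpy as np

def ratio_to_normal_map(expressive, neutral, strength=1.0, eps=1e-4):
    """Turn an expression frame into a tangent-space normal map: ratio
    values below 1 mark wrinkles, and the ratio gradients tilt the normal."""
    ratio = expressive.astype(np.float64) / np.maximum(neutral, eps)
    if ratio.ndim == 3:                 # average color channels to intensity
        ratio = ratio.mean(axis=2)
    gy, gx = np.gradient(ratio)         # per-pixel ratio gradients
    # Tilt the flat normal (0, 0, 1) against the gradient direction.
    n = np.dstack([-strength * gx, -strength * gy, np.ones_like(ratio)])
    n /= np.linalg.norm(n, axis=2, keepdims=True)
    return (n + 1.0) * 0.5              # pack into [0, 1] for an RGB texture
```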
Funding: Supported by the 2013 Annual Beijing Technological and Cultural Fusion for Demonstrated Base Construction and Industrial Nurture (No. Z131100000113007) and the National Natural Science Foundation of China (Nos. 61202324, 61271431, and 61271430)
Abstract: To synthesize real-time and realistic facial animation, we present an effective algorithm that combines image- and geometry-based methods for facial animation simulation. Considering the numerous motion units in the expression coding system, we present a novel simplified motion unit based on the basic facial expressions, and construct the corresponding basic actions for a head model. As image features are difficult to obtain with the performance-driven method, we develop an automatic image feature recognition method based on statistical learning, and a semi-automatic expression image labeling method with rotation-invariant face detection, which improve the accuracy and efficiency of expression feature identification and training. After facial animation retargeting, each basic action weight is computed and mapped automatically. We apply the blend shape method to construct and train the corresponding expression database for each basic action, and adopt the least-squares method to compute the corresponding control parameters for the facial animation. Moreover, diffuse and specular light distributions are pre-integrated using a physically based method to improve the plausibility and efficiency of facial rendering. Our work simplifies the facial motion unit, optimizes the statistical training and recognition process for facial animation, solves for the expression parameters, and simulates the subsurface scattering effect in real time. Experimental results indicate that our method is effective and efficient, and suitable for computer animation and interactive applications.
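The least-squares step for the basic-action weights might look like the following sketch, where the feature-vector layout and the non-negativity constraint are assumptions rather than details from the abstract.

```python
import numpy as np
from scipy.optimize import nnls

def solve_blendshape_weights(neutral_feats, basis_feats, target_feats):
    """Least-squares fit of basic-action (blend-shape) weights to the
    expression features recognized in the current frame.

    neutral_feats : (f,)   feature vector of the neutral expression
    basis_feats   : (k, f) feature vectors of the k basic actions
    target_feats  : (f,)   features recognized in the current frame
    """
    # Columns are per-action feature offsets from the neutral face.
    A = (basis_feats - neutral_feats).T          # (f, k)
    b = target_feats - neutral_feats             # (f,)
    # Non-negative least squares keeps weights physically meaningful;
    # the paper's exact constraints may differ.
    w, _ = nnls(A, b)
    return w
```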
Funding: Supported by the National Natural Science Foundation of China (Nos. 6247075018 and 62322210), the Innovation Funding of ICT, CAS (No. E461020), the Beijing Municipal Natural Science Foundation for Distinguished Young Scholars (No. JQ21013), and the Beijing Municipal Science and Technology Commission (No. Z231100005923031).
Abstract: Recent advancements in the field have resulted in significant progress in realistic head reconstruction and manipulation using neural radiance fields (NeRF). Despite these advances, capturing intricate facial details remains a persistent challenge. Moreover, casually captured input, involving both head poses and camera movements, introduces additional difficulties for existing head avatar reconstruction methods. To address the challenge posed by video data captured with camera motion, we propose a novel method, AvatarWild, for reconstructing head avatars from monocular videos taken with consumer devices. Notably, our approach decouples the camera pose and head pose, allowing reconstructed avatars to be visualized with different poses and expressions from novel viewpoints. To enhance the visual quality of the reconstructed facial avatar, we introduce a view-dependent detail enhancement module designed to augment local facial details without compromising viewpoint consistency. Our method demonstrates superior performance compared to existing approaches, as evidenced by reconstruction and animation results on both multi-view and single-view datasets. Remarkably, our approach relies exclusively on video data captured by portable devices such as smartphones. This underscores the practicality of our method and extends its applicability to real-world scenarios where accessibility and ease of data capture are crucial.