Funding: Supported by the Centre for Digital Entertainment at Bournemouth University, funded by the UK Engineering and Physical Sciences Research Council (EPSRC) grant EP/L016540/1 and Humain Ltd.
Abstract: Background Deep 3D morphable models (deep 3DMMs) play an essential role in computer vision. They are used in facial synthesis, compression, reconstruction and animation, avatar creation, virtual try-on, facial recognition systems, and medical imaging. These applications require high spatial and perceptual quality of the synthesised meshes. Despite their significance, these models have not been compared across different mesh representations or evaluated jointly with point-wise distance and perceptual metrics. Methods We compare the influence of different mesh representation features on the spatial and perceptual fidelity of meshes reconstructed by various deep 3DMMs. This paper supports the hypothesis that building deep 3DMMs from meshes with global representations leads to lower spatial reconstruction error, measured with L1 and L2 norm metrics, but underperforms on perceptual metrics. In contrast, using differential mesh representations, which describe differential surface properties, yields lower perceptual FMPD and DAME scores but higher spatial fidelity error. The influence of mesh feature normalisation and standardisation is also compared and analysed from perceptual and spatial fidelity perspectives. Results The results presented in this paper provide guidance in selecting mesh representations for building deep 3DMMs according to spatial and perceptual quality objectives, and propose combinations of mesh representations and deep 3DMMs that improve either the perceptual or the spatial fidelity of existing methods.
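The spatial metrics and feature scalings above can be made concrete. The sketch below is an illustration, not the paper's evaluation code: it computes mean per-vertex L1 and L2 reconstruction errors between two meshes of identical topology, alongside the min-max normalisation and z-score standardisation of mesh features that the study compares. Function names are our own.

```python
import numpy as np

def spatial_errors(verts_gt, verts_rec):
    """Point-wise spatial reconstruction errors between two meshes
    with identical topology (N x 3 vertex arrays)."""
    diff = verts_gt - verts_rec
    l1 = np.abs(diff).sum(axis=1).mean()      # mean per-vertex L1 error
    l2 = np.linalg.norm(diff, axis=1).mean()  # mean Euclidean (L2) distance
    return l1, l2

def normalise(feat):
    """Min-max normalisation of per-vertex mesh features to [0, 1]."""
    lo, hi = feat.min(axis=0), feat.max(axis=0)
    return (feat - lo) / np.where(hi - lo == 0, 1, hi - lo)

def standardise(feat):
    """Z-score standardisation (zero mean, unit variance) per feature."""
    mu, sigma = feat.mean(axis=0), feat.std(axis=0)
    return (feat - mu) / np.where(sigma == 0, 1, sigma)
```

A global representation would feed raw vertex coordinates to these scalings, while a differential representation would first convert vertices to local surface quantities.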
Funding: Supported in part by the National Natural Science Foundation of China (61972342, 61832016); the Science and Technology Department of Zhejiang Province (2018C01080); Zhejiang Province Public Welfare Technology Application Research (LGG22F020009); the Key Laboratory of Film and TV Media Technology of Zhejiang Province (2020E10015); and the Teaching Reform Project of Communication University of Zhejiang (jgxm202131).
Abstract: 3D morphable models (3DMMs) are generative models for face shape and appearance. Recent works impose face recognition constraints on 3DMM shape parameters so that the face shapes of the same person remain consistent. However, the shape parameters of traditional 3DMMs follow a multivariate Gaussian distribution, whereas identity embeddings lie on a hypersphere, and this conflict makes it challenging for face reconstruction models to preserve faithfulness and shape consistency simultaneously. In other words, the recognition loss and the reconstruction loss cannot decrease jointly because of this distribution mismatch. To address this issue, we propose the Sphere Face Model (SFM), a novel 3DMM for monocular face reconstruction that preserves both shape fidelity and identity consistency. The core of SFM is a basis matrix that can be used to reconstruct 3D face shapes; this basis matrix is learned with a two-stage training approach in which 3D and 2D training data are used in the first and second stages, respectively. We design a novel loss to resolve the distribution mismatch, enforcing that the shape parameters follow a hyperspherical distribution. Our model accepts both 2D and 3D data for constructing sphere face models. Extensive experiments show that SFM has high representation ability and clustering performance in its shape parameter space. Moreover, it produces high-fidelity face shapes consistently under challenging conditions in monocular face reconstruction. The code will be released at https://github.com/a686432/SIR.
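The hyperspherical constraint on shape parameters can be illustrated with a minimal sketch. This is not the actual SFM loss (whose exact form is given in the paper): it simply projects parameter vectors onto the unit hypersphere, where identity consistency reduces to cosine similarity.

```python
import numpy as np

def to_hypersphere(params, eps=1e-8):
    """Project shape-parameter vectors onto the unit hypersphere
    (rows are per-face parameter vectors)."""
    norms = np.linalg.norm(params, axis=1, keepdims=True)
    return params / np.maximum(norms, eps)

def identity_consistency(p_a, p_b):
    """Cosine similarity between two hyperspherical parameter vectors;
    1.0 means the two reconstructions share the same identity direction."""
    a = to_hypersphere(p_a[None])[0]
    b = to_hypersphere(p_b[None])[0]
    return float(a @ b)
```

Under such a constraint, scaling a parameter vector does not change its identity direction, which is why a recognition loss defined on the sphere no longer conflicts with reconstruction.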
Abstract: Background The accurate (quantitative) analysis of 3D face deformation is a problem of increasing interest in many applications. In particular, fitting a 3D model of face deformation to a 2D target image so as to capture local and asymmetric deformations remains a challenge in the existing literature. A measure of such local deformations may be a relevant index for monitoring the rehabilitation exercises of patients suffering from Parkinson's or Alzheimer's disease, or those recovering from a stroke. Methods In this paper, a complete framework for constructing a 3D morphable shape model (3DMM) of the face and fitting it to a target RGB image is presented. The model has the specific characteristic of being based on localized components of deformation. The fitting transformation is performed from 3D to 2D and guided by the correspondence between landmarks detected in the target image and those manually annotated on the average 3DMM. The fitting is also performed in two steps, to disentangle face deformations related to the identity of the target subject from those induced by facial actions. Results The method was experimentally validated on the MICC-3D dataset, which includes 11 subjects. Each subject was imaged in one neutral pose and while performing 18 facial actions that deform the face in localized and asymmetric ways. For each acquisition, the 3DMM was fit to an RGB frame, and the extent of the deformation was computed from the apex facial-action frame and the neutral frame. The results indicate that the proposed approach can accurately capture face deformation, even when it is localized and asymmetric. Conclusion The proposed framework demonstrated that it is possible to measure deformations of a reconstructed 3D face model to monitor facial actions performed in response to a set of targets. Interestingly, these results were obtained using only RGB targets, without the need for 3D scans captured with costly devices. This paves the way for the use of the proposed tool in remote medical rehabilitation monitoring.
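The landmark-guided 3D-to-2D fitting step can be sketched as a least-squares alignment. The example below assumes a simplified weak-perspective camera with no rotation, recovering only a scale and a 2D translation from corresponding 3D model landmarks and 2D detections; the paper's actual fitting transformation is richer, and the function name is our own.

```python
import numpy as np

def fit_weak_perspective(X3d, x2d):
    """Recover scale s and translation t so that s * X3d[:, :2] + t
    best matches the detected 2D landmarks in the least-squares sense.
    X3d: K x 3 model landmarks, x2d: K x 2 image landmarks."""
    P = X3d[:, :2]
    # Design matrix for the unknowns [s, tx, ty]; rows interleave the
    # u- and v-equations: s*px + tx = u and s*py + ty = v.
    A = np.zeros((2 * len(P), 3))
    A[0::2, 0] = P[:, 0]
    A[0::2, 1] = 1.0
    A[1::2, 0] = P[:, 1]
    A[1::2, 2] = 1.0
    b = x2d.reshape(-1)
    sol, *_ = np.linalg.lstsq(A, b, rcond=None)
    s, tx, ty = sol
    return s, np.array([tx, ty])
```

With rotation fixed, the problem is linear, so a single least-squares solve aligns the average 3DMM to the detected landmarks before any deformation is estimated.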
Funding: Supported in part by the Beijing Municipal Natural Science Foundation, China (No. 4222054), in part by the National Natural Science Foundation of China (Nos. 62276263 and 62076240), and by the Youth Innovation Promotion Association CAS, China (No. Y2023143).
Abstract: One-shot face reenactment is a challenging task due to the identity mismatch between source and driving faces. Most existing methods fail to completely eliminate the interference of the driving subjects' identity information, which may lead to face shape distortion and undermine the realism of the reenactment results. To solve this problem, in this paper we propose using a 3D morphable model (3DMM) for explicit facial semantic decomposition and identity disentanglement. Instead of using 3D coefficients alone for reenactment control, we take advantage of the generative ability of the 3DMM to render textured face proxies. These proxies contain abundant yet compact geometric and semantic information about human faces, which enables us to compute the face motion field between source and driving images by estimating the dense correspondence. In this way, we can approximate reenactment results by warping source images according to the motion field, and a generative adversarial network (GAN) is adopted to further improve the visual quality of the warping results. Extensive experiments on various datasets demonstrate the advantages of the proposed method over existing state-of-the-art benchmarks in both identity preservation and reenactment fulfillment.
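The warping step can be illustrated with a minimal backward-warping sketch (NumPy, grayscale, bilinear sampling). The flow convention and function name here are our own; in the actual method, the motion field comes from dense correspondence between 3DMM-rendered proxies, and a GAN refines the warped result.

```python
import numpy as np

def warp_image(src, flow):
    """Backward-warp a source image with a dense motion field.
    src: H x W (grayscale) array; flow: H x W x 2 array where
    flow[y, x] = (dy, dx) points from each output pixel to the
    source location it samples (bilinear interpolation)."""
    H, W = src.shape
    ys, xs = np.mgrid[0:H, 0:W].astype(float)
    sy = np.clip(ys + flow[..., 0], 0, H - 1)
    sx = np.clip(xs + flow[..., 1], 0, W - 1)
    # Integer corners and fractional weights for bilinear sampling
    y0, x0 = np.floor(sy).astype(int), np.floor(sx).astype(int)
    y1, x1 = np.minimum(y0 + 1, H - 1), np.minimum(x0 + 1, W - 1)
    wy, wx = sy - y0, sx - x0
    top = src[y0, x0] * (1 - wx) + src[y0, x1] * wx
    bot = src[y1, x0] * (1 - wx) + src[y1, x1] * wx
    return top * (1 - wy) + bot * wy
```

A zero flow is the identity warp; in practice the field moves each output pixel to where the driving face's motion says the source content should come from.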