
Speech-Driven Facial Reenactment Based on Implicit Neural Representations with Structured Latent Codes
Abstract: Speech-driven facial reenactment aims to generate high-fidelity facial animation that matches the content of the input speech. However, because of the gap between the audio and video modalities, existing methods struggle to achieve high-quality reenactment. To address the low fidelity and poor lip synchronization of existing methods, we propose a speech-driven facial reenactment method based on implicit neural representations guided by structured latent codes. Using a facial point cloud sequence as the intermediate representation, the method decomposes speech-driven facial reenactment into two tasks, cross-modal mapping and neural radiance field rendering, and solves each separately. First, facial expression coefficients are predicted from the audio through cross-modal mapping, and facial identity coefficients are obtained by 3D face reconstruction. Then, a facial point cloud animation sequence is synthesized with a 3DMM (3D morphable model). Next, the vertex positions are used to construct a structured implicit neural representation that regresses the density and color of every sampled point in the scene. Finally, RGB frames of the face are rendered by volume rendering and composited back into the original images. Experiments on multiple 3-5 min single-speaker speech videos, including visual comparisons, quantitative evaluation, and subjective assessment, show that the proposed method outperforms AD-NeRF and other methods in lip-sync accuracy and image generation accuracy, achieving high-fidelity speech-driven facial reenactment.
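Two steps of the pipeline above rest on standard, well-known building blocks: synthesizing face vertices from identity and expression coefficients with a linear 3DMM, and compositing per-sample density and color along a camera ray by NeRF-style volume rendering. The sketch below illustrates only these generic pieces under stated assumptions: it assumes a plain linear 3DMM (mean shape plus identity and expression offset bases) and the standard NeRF quadrature. It is not the authors' implementation; the audio-to-expression network, the structured latent codes, and the network that regresses density and color are omitted, and the function names (`synthesize_vertices`, `render_ray`) and toy basis sizes are hypothetical.

```python
import math
from typing import List, Sequence, Tuple

Vec3 = Tuple[float, float, float]


def synthesize_vertices(
    mean_shape: Sequence[Vec3],
    id_basis: Sequence[Sequence[Vec3]],   # id_basis[k][v]: offset of vertex v for identity component k
    exp_basis: Sequence[Sequence[Vec3]],  # exp_basis[k][v]: offset of vertex v for expression component k
    id_coeffs: Sequence[float],
    exp_coeffs: Sequence[float],
) -> List[Vec3]:
    """Assumed linear 3DMM: S = S_mean + sum_k alpha_k * B_id[k] + sum_k beta_k * B_exp[k]."""
    vertices = []
    for v, (x, y, z) in enumerate(mean_shape):
        # Add the weighted identity offsets, then the weighted expression offsets.
        for basis, coeffs in ((id_basis, id_coeffs), (exp_basis, exp_coeffs)):
            for k, c in enumerate(coeffs):
                dx, dy, dz = basis[k][v]
                x, y, z = x + c * dx, y + c * dy, z + c * dz
        vertices.append((x, y, z))
    return vertices


def render_ray(
    densities: Sequence[float],
    colors: Sequence[Vec3],
    deltas: Sequence[float],
) -> Vec3:
    """Standard NeRF quadrature: C = sum_i T_i * (1 - exp(-sigma_i * delta_i)) * c_i,
    where T_i = exp(-sum_{j<i} sigma_j * delta_j) is the accumulated transmittance.
    Samples are assumed to be ordered from the camera outward."""
    transmittance = 1.0
    r = g = b = 0.0
    for sigma, (cr, cg, cb), delta in zip(densities, colors, deltas):
        alpha = 1.0 - math.exp(-sigma * delta)  # opacity of this ray segment
        weight = transmittance * alpha          # contribution of this sample to the pixel
        r, g, b = r + weight * cr, g + weight * cg, b + weight * cb
        transmittance *= 1.0 - alpha            # multiply by exp(-sigma * delta)
    return (r, g, b)


if __name__ == "__main__":
    # Toy model: two vertices, one identity component, one expression component.
    mean = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
    id_b = [[(0.0, 1.0, 0.0), (0.0, 1.0, 0.0)]]
    exp_b = [[(0.0, 0.0, 1.0), (0.0, 0.0, 0.0)]]
    print(synthesize_vertices(mean, id_b, exp_b, [0.5], [2.0]))
    # [(0.0, 0.5, 2.0), (1.0, 0.5, 0.0)]

    # An opaque red sample in front hides the green sample behind it.
    print(render_ray([1e9, 1e9], [(1.0, 0.0, 0.0), (0.0, 1.0, 0.0)], [1.0, 1.0]))
    # (1.0, 0.0, 0.0)
```

Updating the transmittance multiplicatively by `1 - alpha` at each sample is equivalent to exponentiating the negative cumulative optical depth, but it avoids recomputing the prefix sum for every sample.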
Authors: Xie Zhifeng, Zheng Jiaheng, Wang Ji, Liang Jiajia, Ma Lizhuang (Shanghai Film Academy, Shanghai University, Shanghai 200072; Shanghai Engineering Research Center of Motion Picture Special Effects, Shanghai 200072; Department of Computer Science and Engineering, Shanghai Jiao Tong University, Shanghai 200240)
Source: Journal of Computer-Aided Design & Computer Graphics (计算机辅助设计与图形学学报), 2024, No. 10, pp. 1616-1624 (9 pages). Indexed in EI, CSCD, and the Peking University Core Journals list.
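Funding: Natural Science Foundation of Shanghai (19ZR1419100); Shanghai Municipal Science and Technology Major Project (2021SHZDZX0102); Shanghai Science and Technology Innovation Action Plan, Artificial Intelligence Science and Technology Support Project (21511101200); Shanghai Sailing Program for Young Scientific and Technological Talents (22YF1420300).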
Keywords: audio-driven facial reenactment; implicit neural representations; neural radiance field (NeRF); cross-modal