Survey of audio-driven talking face generation technology
Abstract: At the intersection of modern computer vision and natural language processing, digital talking face generation has become an increasingly important research topic. It focuses on generating realistic facial images from a predetermined text or audio sequence. In recent years, deep learning methods such as convolutional neural networks, generative adversarial networks, and neural rendering fields have shown significant value in this area. These methods have not only attracted widespread attention from the academic community but have also been applied in industry to solve specific problems in image processing and computer vision. Although progress has been made, the practical application of these methods still faces many challenges. This survey comprehensively analyzes and evaluates specific implementations of deep learning methods for digital talking face generation in order to identify the strengths and weaknesses of existing methods, discuss common unsolved problems, and highlight open issues that require further research. In addition, it lists currently available datasets from a statistical perspective and evaluates and compares them so that researchers can more easily choose datasets that meet their needs.
Authors: ZHANG Bingyuan; ZHANG Xulong; WANG Jianzong; CHENG Ning; XIAO Jing (Ping An Technology (Shenzhen) Co., Ltd., Shenzhen 518000, China; Institute of Advanced Technology, University of Science and Technology of China, Hefei 230000, China)
Source: Big Data Research (《大数据》), 2024, No. 5, pp. 74-95 (22 pages)
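Funding: Key-Area Research and Development Program of Guangdong Province, "New Generation Artificial Intelligence" Major Special Project (No. 2021B0101400003).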
Keywords: digital talking face generation; virtual human; audio-driven