Survey of audio-driven talking face generation technology

Abstract: At the intersection of computer vision and natural language processing, digital talking face generation has become an increasingly important research topic. It focuses on generating realistic facial images from a given text or audio sequence. In recent years, deep learning methods such as convolutional neural networks, generative adversarial networks, and neural radiance fields have shown significant value in this area. These methods have not only attracted wide attention from the academic community but have also been applied in industry to solve specific problems in image processing and computer vision. Although progress has been made, applying these methods in practice still faces many challenges. This survey reviews and evaluates how deep learning methods are implemented for digital talking face generation, in order to identify the strengths and weaknesses of existing methods, discuss common problems that remain to be solved, and highlight open issues that require further research. In addition, currently available datasets are listed, evaluated, and compared from a statistical perspective so that researchers can more easily choose datasets that meet their needs.

Authors: ZHANG Bingyuan; ZHANG Xulong; WANG Jianzong; CHENG Ning; XIAO Jing (Ping An Technology (Shenzhen) Co., Ltd., Shenzhen 518000, China; Institute of Advanced Technology, University of Science and Technology of China, Hefei 230000, China)

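Affiliations: Ping An Technology (Shenzhen) Co., Ltd.; Institute of Advanced Technology, University of Science and Technology of China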

Source: Big Data Research (《大数据》), 2024, No. 5, pp. 74-95 (22 pages)

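Funding: Guangdong Province Key-Area Research and Development Program, "New Generation Artificial Intelligence" Major Special Project (No. 2021B0101400003).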

Keywords: digital talking face generation; virtual human; audio-driven

Classification: TP37 [Automation and Computer Technology: Computer System Architecture]
