Abstract
Speech information processing has advanced rapidly under the impetus of deep learning. Combining speech synthesis with voice conversion makes it possible to produce real-time, high-fidelity speech for a designated speaker and designated content, which gives the two technologies broad application prospects in human-computer interaction, pan-entertainment, and other fields. This paper surveys speech synthesis and voice conversion based on deep learning. First, it briefly reviews the development history of both technologies. Next, it lists the common public datasets in these fields to help researchers carry out related work. It then discusses text-to-speech (TTS) models, covering classic and cutting-edge models and algorithms that improve style, prosody, and speed, and compares their performance and development potential. It further reviews voice conversion, summarizing the conversion methods and the approaches to optimizing them. Finally, it summarizes the applications and challenges of speech synthesis and voice conversion and, based on the problems they face in terms of models, applications, and regulation, looks ahead to future directions in model compression, few-shot learning, and forgery detection.
Authors
潘孝勤
芦天亮
杜彦辉
仝鑫
PAN Xiao-qin; LU Tian-liang; DU Yan-hui; TONG Xin (College of Information and Cyber Security, People’s Public Security University of China, Beijing 100038, China)
Source
《计算机科学》
CSCD
Peking University Core Journal
2021, Issue 8, pp. 200-208 (9 pages)
Computer Science
Funding
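National Key Research and Development Program of China (2017YFB0802804)
Major Project of the Fundamental Research Funds of People’s Public Security University of China (2020JKF101).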
Keywords
Voice information processing
Speech synthesis
Voice conversion
Deep learning
Generative adversarial networks