Abstract
Voice style transfer converts the speech style or timbre of a source speaker into that of a target speaker while keeping the speech content unchanged. To give a quick overview of the latest developments in the key technologies of voice style transfer, this paper draws on domestic and international research from recent years and analyzes the state of the art in terms of four important factors: feature extraction, corpus alignment, the transfer model, and the vocoder. The analysis covers a comparison of several feature extraction methods, the choice between parallel and non-parallel corpora, and the WaveNet vocoder together with several related improved vocoders. Finally, the paper focuses on the latest progress in applying deep neural networks to style transfer models, summarizes the current state of research in the field, identifies the key problems and technical challenges that remain, and discusses future research directions and potential applications.
Authors
任蓬森
都云程
王洪俊
REN Pengsen; DU Yuncheng; WANG Hongjun (School of Computer, Beijing Information Science and Technology University, Beijing 100101, China; TRS Information Technology Co., Ltd., Beijing 100096, China)
Source
《软件导刊》
2024, No. 11, pp. 12-24 (13 pages)
Software Guide
Keywords
voice style transfer
deep learning
transfer model
voice conversion
vocoder