Abstract
Voice conversion is widely used in education, entertainment, medicine, and other fields. To obtain high-quality converted speech, this paper proposes a voice conversion algorithm based on a multi-spectral-feature generative adversarial network. A generative adversarial network converts the voiceprint images generated from spectral feature parameters, and feature-level multimodal fusion lets the network learn multiple kinds of information from different feature domains. This improves the network's perception of speech signals and yields high-quality converted speech with good clarity and intelligibility. Experimental results show that the proposed algorithm clearly outperforms traditional algorithms on both subjective and objective evaluation metrics.
Authors
张筱
张巍
王文浩
万永菁
ZHANG Xiao; ZHANG Wei; WANG Wen-hao; WAN Yong-jing (School of Information Science and Engineering, East China University of Science and Technology, Shanghai 200237, China)
Source
《计算机工程与科学》
CSCD
Peking University Core Journal
2020, Issue 5, pp. 893-901 (9 pages)
Computer Engineering & Science
Keywords
voice conversion (语音转换)
voiceprint (声纹图)
generative adversarial network (生成对抗网络)
multi-spectral feature (多谱特征)
cross-domain reconstruction error (跨域重建误差)