Abstract
A new speaker conversion algorithm for speech is proposed. It estimates the parameters of a Gaussian mixture model (GMM) with the variational Bayesian (VB) method and applies the resulting model to the vocal tract spectral parameter mapping step of voice conversion (VC), thereby converting speaker identity. Using VB for parameter estimation has two advantages. First, it addresses the over-fitting that readily occurs when training data are scarce. Second, because the model parameters are treated probabilistically, estimation is no longer a "point estimate" but a "global estimate", which improves model accuracy to some extent relative to traditional approaches such as maximum likelihood (ML) or maximum a posteriori (MAP) estimation. Subjective and objective evaluations both show that using the VB-estimated statistical model to convert vocal tract spectral parameters markedly improves system robustness when training data are scarce, and that the converted speech is better in both quality and speaker individuality than that of a classical VC system.
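The abstract does not spell out the mapping function, so the following is only a minimal sketch of the classical joint-density GMM conversion rule that GMM-based VC systems commonly use, reduced to scalar features for readability. The function names (`gaussian_pdf`, `convert`), the dictionary layout, and all numeric parameter values are hypothetical; in a VB setting the parameters might be taken as posterior means, which is an assumption here and not a detail confirmed by the paper.

```python
import math


def gaussian_pdf(x, mean, var):
    """Density of a univariate Gaussian N(mean, var) at x."""
    return math.exp(-0.5 * (x - mean) ** 2 / var) / math.sqrt(2.0 * math.pi * var)


def convert(x, components):
    """Map a scalar source feature x to the target speaker's feature space.

    Each component holds the mixture weight, the source and target means,
    the source variance, and the target-source covariance of a joint GMM.
    The result is the posterior-weighted sum of per-component linear
    regressions: sum_m p(m | x) * (mu_y + cov_yx / var_x * (x - mu_x)).
    """
    # Unnormalised responsibility of each component for the source feature.
    likelihoods = [
        c["weight"] * gaussian_pdf(x, c["mu_x"], c["var_x"]) for c in components
    ]
    total = sum(likelihoods)
    y = 0.0
    for lik, c in zip(likelihoods, components):
        posterior = lik / total  # p(m | x)
        y += posterior * (c["mu_y"] + c["cov_yx"] / c["var_x"] * (x - c["mu_x"]))
    return y


# Hypothetical two-component model; the values are illustrative only.
components = [
    {"weight": 0.5, "mu_x": -1.0, "mu_y": -2.0, "var_x": 1.0, "cov_yx": 0.5},
    {"weight": 0.5, "mu_x": 1.0, "mu_y": 2.0, "var_x": 1.0, "cov_yx": 0.5},
]

converted = convert(0.3, components)
```

In a real system the features would be spectral parameter vectors and the variances would be covariance matrices; the scalar case only illustrates how the component posteriors blend the per-component regressions.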
Source
《南京邮电大学学报(自然科学版)》
2010, Issue 5, pp. 1-7 (7 pages)
Journal of Nanjing University of Posts and Telecommunications:Natural Science Edition