A Voice Conversion Algorithm in the Context of Sparse Training Data (cited by: 1)
Abstract: A new speaker voice conversion algorithm is proposed. It estimates the parameters of a Gaussian mixture model (GMM) with the variational Bayesian (VB) method and applies the resulting model to the mapping of vocal tract spectral parameters in voice conversion (VC), thereby converting speaker identity. Using VB for parameter estimation has two benefits. First, it mitigates the over-fitting that easily arises when training data are scarce. Second, because the model parameters are treated as random variables with probability distributions, estimation is no longer a point estimate, as in maximum likelihood (ML) or maximum a posteriori (MAP) estimation, but an estimate over the whole parameter distribution, which improves model accuracy to some extent. Subjective and objective experiments show that using the VB-estimated statistical model to convert vocal tract spectral parameters markedly improves system robustness when training data are sparse, and that the converted speech is better than that of the classical voice conversion system in both quality and speaker individuality.
Authors: XU Ning (徐宁), YANG Zhen (杨震)
Source: Journal of Nanjing University of Posts and Telecommunications: Natural Science Edition (《南京邮电大学学报(自然科学版)》), 2010, No. 5, pp. 1-7 (7 pages)
Keywords: variational Bayesian estimation; Gaussian mixture model; voice conversion; vocal tract spectral parameters; sparse training data
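The abstract names two ingredients: Bayesian estimation of GMM parameters and a GMM-based spectral mapping. The sketch below illustrates both for scalar features, using only the Python standard library. It is not the authors' implementation. The function names (`vb_expected_weights`, `convert`), the symmetric Dirichlet prior with hyperparameter `alpha0`, and the joint-density form of the mapping are assumptions made for illustration; the record does not give the paper's actual priors, feature dimensions, or conversion function.

```python
import math


def gaussian_pdf(x, mean, var):
    """Density of a 1-D Gaussian N(mean, var) evaluated at x."""
    return math.exp(-0.5 * (x - mean) ** 2 / var) / math.sqrt(2.0 * math.pi * var)


def vb_expected_weights(counts, alpha0=1.0):
    """Posterior-mean mixture weights under a symmetric Dirichlet(alpha0) prior.

    counts[k] is the soft count N_k of frames assigned to component k.
    Maximum likelihood would use N_k / N, which drives the weight of a
    component to zero when it receives no data; the prior keeps it positive.
    (Assumed prior, chosen for illustration.)
    """
    total = sum(counts) + alpha0 * len(counts)
    return [(alpha0 + n) / total for n in counts]


def convert(x, weights, mu_x, mu_y, var_x, cov_yx):
    """Joint-density GMM mapping for a scalar source feature x.

    F(x) = sum_k p(k | x) * (mu_y[k] + cov_yx[k] / var_x[k] * (x - mu_x[k]))
    The parameters are assumed to be already estimated, for example taken as
    posterior means from a variational Bayesian fit.
    """
    likes = [w * gaussian_pdf(x, m, v) for w, m, v in zip(weights, mu_x, var_x)]
    norm = sum(likes)
    posteriors = [like / norm for like in likes]
    return sum(
        p * (my + c / v * (x - mx))
        for p, mx, my, v, c in zip(posteriors, mu_x, mu_y, var_x, cov_yx)
    )


# Sparse-data case: the second component received no frames at all.
weights = vb_expected_weights([3.0, 0.0], alpha0=1.0)

# Map one source value with hypothetical two-component parameters.
y = convert(0.5, weights, mu_x=[-1.0, 1.0], mu_y=[-2.0, 2.0],
            var_x=[1.0, 1.0], cov_yx=[0.5, 0.5])
```

With soft counts of 3 and 0, the smoothed weights are 0.8 and 0.2 rather than the maximum-likelihood values 1 and 0, which is the kind of regularization that helps when training data are sparse.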
