
High-quality voice conversion system based on GMM statistical parameters and RBF neural network (Cited by: 3)

Abstract: A voice conversion (VC) system was designed based on a Gaussian mixture model (GMM) and a radial basis function (RBF) neural network. As a voice conversion model, an RBF network needs a large amount of training data to perform well. For a given utterance, networks trained on different segments of the data produce different conversion results, and searching segment by segment for the best result is cumbersome. To address this, a conversion method is proposed that uses a GMM to gather statistics before the RBF network is trained. The speech transformation and representation using adaptive interpolation of weighted spectrum (STRAIGHT) model is used to extract the vocal tract spectrum accurately. A GMM is then used to classify the large number of spectral parameters, and the resulting mean parameters are used to train the RBF network. Experiments show that the soft classification ability of the GMM quickly reduces and classifies the training data without degrading the training effect, which lowers the complexity of selecting training data. Compared with conventional RBF network training methods, this method transforms the spectral parameters more effectively and improves the quality of the converted speech.
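The pipeline described in the abstract (cluster the spectral parameters with a GMM, then train the RBF network on the component means) might be sketched roughly as follows. This is only an illustration under several assumptions that the abstract does not confirm: the GMM is fitted on joint source-target vectors, covariances are diagonal, the RBF centers are placed at the source-side means, the output weights are solved by ridge-regularized least squares, and random synthetic vectors stand in for time-aligned STRAIGHT spectral parameters. All function names and parameter values are hypothetical.

```python
import numpy as np

def fit_diag_gmm(Z, k, n_iter=50, seed=0):
    """Fit a diagonal-covariance GMM by EM; returns (weights, means, variances)."""
    rng = np.random.default_rng(seed)
    n, _ = Z.shape
    means = Z[rng.choice(n, size=k, replace=False)].copy()
    variances = np.tile(Z.var(axis=0) + 1e-6, (k, 1))
    weights = np.full(k, 1.0 / k)
    for _ in range(n_iter):
        # E-step: soft assignment (responsibility) of every frame to every component.
        diff = Z[:, None, :] - means[None, :, :]
        log_p = -0.5 * np.sum(diff**2 / variances + np.log(2 * np.pi * variances), axis=2)
        log_p += np.log(weights)
        log_p -= log_p.max(axis=1, keepdims=True)
        resp = np.exp(log_p)
        resp /= resp.sum(axis=1, keepdims=True)
        # M-step: re-estimate mixture weights, means, and variances.
        nk = resp.sum(axis=0) + 1e-12
        weights = nk / n
        means = (resp.T @ Z) / nk[:, None]
        diff = Z[:, None, :] - means[None, :, :]
        variances = np.einsum("nk,nkd->kd", resp, diff**2) / nk[:, None] + 1e-6
    return weights, means, variances

def rbf_design(X, centers, width):
    """Gaussian basis activations of each row of X for each center."""
    d2 = np.sum((X[:, None, :] - centers[None, :, :])**2, axis=2)
    return np.exp(-d2 / (2.0 * width**2))

def train_rbf(X, Y, centers, width, ridge=1e-6):
    """Solve the output-layer weights by ridge-regularized least squares."""
    Phi = rbf_design(X, centers, width)
    A = Phi.T @ Phi + ridge * np.eye(centers.shape[0])
    return np.linalg.solve(A, Phi.T @ Y)

def convert(X, centers, width, W):
    """Map source spectral vectors to predicted target spectral vectors."""
    return rbf_design(X, centers, width) @ W

# Synthetic stand-in for time-aligned source/target spectral parameters (assumption).
rng = np.random.default_rng(1)
n_frames, dim, k = 2000, 4, 16
X = rng.normal(size=(n_frames, dim))
Y = np.tanh(X @ rng.normal(size=(dim, dim))) + 0.05 * rng.normal(size=(n_frames, dim))

# 1) Reduce the frames to k joint mean vectors with the GMM.
weights, means, variances = fit_diag_gmm(np.hstack([X, Y]), k)
src_means, tgt_means = means[:, :dim], means[:, dim:]

# 2) Train the RBF network on the k mean pairs instead of all frames.
pair = np.sqrt(np.sum((src_means[:, None, :] - src_means[None, :, :])**2, axis=2))
width = np.median(pair[np.triu_indices(k, 1)])  # heuristic basis width
W = train_rbf(src_means, tgt_means, src_means, width)

# 3) Convert every source frame with the trained network.
Y_hat = convert(X, src_means, width, W)
```

In this sketch the RBF network sees only 16 mean pairs rather than 2000 frames, which mirrors the data reduction the abstract attributes to the GMM step; the actual feature dimensions, number of mixture components, and training procedure used in the paper are not stated in the abstract.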
Source: The Journal of China Universities of Posts and Telecommunications (中国邮电高校学报(英文版)), indexed in EI and CSCD, 2014, Issue 5, pp. 68-75, 93 (9 pages)
Funding: Supported by the Key Natural Science Foundation of the Jiangsu Higher Education Institutions of China (13KJA510003)
Keywords: VC system, STRAIGHT, vocal tract spectrum, GMM, RBF

