
Vocal Tract Spectrum Conversion Using a Two-factor Gaussian Process Dynamic Model

Cited by: 3
Abstract: In previous work, we developed a two-factor Gaussian process latent variable model (TF-GPLVM) that performs spectral conversion by replacing speaker characteristics. Although it outperforms traditional mapping-based methods, TF-GPLVM has two drawbacks when applied to voice conversion: 1) it does not capture the dynamic characteristics of speech, and 2) training requires estimating a large number of parameters. To overcome both, this paper combines TF-GPLVM with a hidden Markov model (HMM). The HMM state transition probabilities model the speech dynamics, and the HMM hidden states provide a probabilistic soft classification of each speech frame by its phonetic content, so that frames are grouped into a limited number of content classes. The resulting two-factor Gaussian process dynamic model (TF-GPDM) separates the speaker and content factors more accurately at a lower computational cost. Based on TF-GPDM, we design a new vocal tract spectrum conversion scheme built on speaker characteristics replacement. Both subjective and objective evaluations show that, compared with traditional statistical mapping (Gaussian mixture model, GMM) and frequency warping (FW) methods as well as the TF-GPLVM-based method, the proposed TF-GPDM improves both speech quality and identity similarity, and achieves a better balance between the two.
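The HMM-based soft classification of frames mentioned in the abstract can be illustrated with the standard forward-backward recursion, which assigns each frame a posterior probability over hidden states. This is only a minimal sketch and not the authors' TF-GPDM: it assumes diagonal-Gaussian emissions, hand-picked toy parameters, and hypothetical function names (`state_posteriors`, `log_gaussian_diag`), and it leaves out the Gaussian process part of the model entirely.

```python
import numpy as np


def log_gaussian_diag(x, means, variances):
    """Log-density of each frame under each state's diagonal Gaussian.

    x: (T, D) frames; means, variances: (K, D). Returns a (T, K) array.
    (Assumed emission model, chosen only for illustration.)
    """
    diff = x[:, None, :] - means[None, :, :]
    return -0.5 * np.sum(np.log(2 * np.pi * variances)[None, :, :]
                         + diff ** 2 / variances[None, :, :], axis=2)


def logsumexp(a, axis):
    """Numerically stable log(sum(exp(a))) along one axis."""
    m = np.max(a, axis=axis, keepdims=True)
    return np.squeeze(m, axis=axis) + np.log(np.sum(np.exp(a - m), axis=axis))


def state_posteriors(x, log_pi, log_A, means, variances):
    """Log-domain forward-backward; returns gamma of shape (T, K) with
    gamma[t, k] = P(state at frame t is k | all frames)."""
    log_b = log_gaussian_diag(x, means, variances)
    T, K = log_b.shape
    log_alpha = np.empty((T, K))
    log_beta = np.zeros((T, K))  # beta at the last frame is 1, i.e. log 0
    log_alpha[0] = log_pi + log_b[0]
    for t in range(1, T):
        # alpha_t(j) = b_j(x_t) * sum_i alpha_{t-1}(i) * A[i, j]
        log_alpha[t] = log_b[t] + logsumexp(log_alpha[t - 1][:, None] + log_A, axis=0)
    for t in range(T - 2, -1, -1):
        # beta_t(i) = sum_j A[i, j] * b_j(x_{t+1}) * beta_{t+1}(j)
        log_beta[t] = logsumexp(log_A + (log_b[t + 1] + log_beta[t + 1])[None, :], axis=1)
    log_gamma = log_alpha + log_beta
    log_gamma -= logsumexp(log_gamma, axis=1)[:, None]  # normalize each frame
    return np.exp(log_gamma)


# Toy example: 10 one-dimensional "frames", the first five near 0 and
# the last five near 5, with a 2-state left-to-right-ish HMM.
rng = np.random.default_rng(0)
frames = np.concatenate([rng.normal(0.0, 0.3, (5, 1)),
                         rng.normal(5.0, 0.3, (5, 1))])
log_pi = np.log(np.array([0.5, 0.5]))
log_A = np.log(np.array([[0.9, 0.1], [0.1, 0.9]]))
means = np.array([[0.0], [5.0]])
variances = np.array([[0.25], [0.25]])

gamma = state_posteriors(frames, log_pi, log_A, means, variances)
print(gamma.round(3))
```

Each row of `gamma` is a soft class assignment for one frame; with these toy parameters the first five frames are assigned almost entirely to state 0 and the last five to state 1. Working in the log domain avoids underflow on long utterances.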
Source: Acta Automatica Sinica (《自动化学报》), 2014, no. 6, pp. 1198-1207 (10 pages). Indexed in EI, CSCD, and the Peking University core journal list.
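Funding: Supported by the National Natural Science Foundation of China (61072042), the Natural Science Foundation of Jiangsu Province (BK2012510), and the Pre-research Foundation of PLA University of Science and Technology (20110205, 20110211).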
Keywords: vocal tract spectrum conversion, Gaussian process latent variable model (GPLVM), two-factor model, hidden Markov model (HMM), speech dynamical characteristics
