
Acoustic statistical modeling based new generation speech synthesis technology

Cited by: 1
Abstract: This paper introduces a new generation of speech synthesis technology based on acoustic statistical modeling. Emphasis is placed on the contributions of the USTC iFlytek speech laboratory to the development of this technology, including: integrating articulatory parameters with acoustic parameters to improve the flexibility of acoustic parameter generation; replacing the maximum likelihood criterion with a minimum generation error (MGE) criterion to improve the quality of synthesized speech; and replacing parametric synthesizer reconstruction with unit selection and waveform concatenation, which fundamentally remedies the speech-quality limitations of HMM-based parametric synthesis. These innovations further improve the performance of the new generation of speech synthesis in naturalness, expressiveness, flexibility, and multilingual realization.
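The contrast drawn in the abstract between maximum likelihood training and the minimum generation error (MGE) criterion can be summarized with a brief sketch. The paper's exact objective function is not reproduced in this record, so the formulation below follows the commonly published MGE training setup, and the symbols (HMM parameters \lambda, training observations \mathbf{O}, natural parameter trajectory \mathbf{c}_n, and the trajectory \hat{\mathbf{c}}_n(\lambda) generated from the trained models) are illustrative assumptions rather than the paper's own notation:

    \hat{\lambda}_{\mathrm{ML}} = \arg\max_{\lambda} \, p(\mathbf{O} \mid \lambda),
    \qquad
    \hat{\lambda}_{\mathrm{MGE}} = \arg\min_{\lambda} \, \sum_{n=1}^{N} D\bigl(\mathbf{c}_n, \hat{\mathbf{c}}_n(\lambda)\bigr),

where D(\cdot,\cdot) is typically a Euclidean distance between the natural acoustic parameter trajectory and the one generated from the models, so training directly minimizes the error of the trajectories the synthesizer will actually produce instead of maximizing the likelihood of the training data.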
Source: Journal of University of Science and Technology of China (JUSTC), 2008, No. 7: 725-734 (10 pages); indexed in CAS, CSCD, and the Peking University Core Journals list.
Funding: Supported by the National Natural Science Foundation of China (69975018, 60475015).
Keywords: speech synthesis; hidden Markov model; parametric synthesis; unit selection

