Abstract
This paper introduces a new generation of speech synthesis technology based on statistical acoustic modeling. It focuses on the contributions of the University of Science and Technology of China (USTC) iFlytek Speech Laboratory to this technology, which include: integrating articulatory features with acoustic features to make acoustic parameter generation more flexible; replacing the maximum likelihood criterion with a minimum generation error (MGE) criterion to improve the quality of synthesized speech; and replacing parametric synthesizer reconstruction with unit selection and waveform concatenation, which fundamentally addresses the speech quality limitations of HMM-based parametric synthesis. These innovations further improve the performance of new-generation speech synthesis in naturalness, expressiveness, flexibility, and multilingual implementation.
Funding
Supported by the National Natural Science Foundation of China (69975018, 60475015)
Keywords
speech synthesis
hidden Markov model
parametric synthesis
unit selection