
Trainable unit selection speech synthesis under statistical framework (Cited by: 1)

Abstract: This paper proposes a trainable unit selection speech synthesis method based on a statistical modeling framework. In the training stage, acoustic features are extracted from the training database and a statistical model is estimated for each feature. During synthesis, the optimal candidate unit sequence is searched for in the database according to the maximum likelihood criterion derived from the trained models. Finally, the waveforms of the optimal candidate units are concatenated to produce synthetic speech. Experimental results show that, compared with the conventional unit selection synthesis method, this method effectively improves both the automation of system construction and the naturalness of the synthetic speech. Furthermore, this paper presents a minimum unit selection error criterion for model training, designed around the characteristics of unit selection speech synthesis, and adopts discriminative training for model parameter estimation. This criterion makes system construction fully automatic and further improves the naturalness of the synthetic speech.
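The synthesis-stage search described in the abstract can be illustrated with a minimal sketch. The abstract does not give the paper's actual features or model types, so everything concrete here is an assumption made for illustration: each unit carries a single scalar feature, the target model at each position is a one-dimensional Gaussian `(mean, var)`, the concatenation model is a zero-mean Gaussian on the mismatch between adjacent unit boundaries, and the search is plain Viterbi dynamic programming. The names `Unit`, `gaussian_logpdf`, `select_units` and `join_var` are hypothetical.

```python
import math
from typing import List, NamedTuple, Tuple


class Unit(NamedTuple):
    unit_id: str
    feature: float     # hypothetical scalar acoustic feature of the unit
    left_edge: float   # feature value at the unit's left boundary
    right_edge: float  # feature value at the unit's right boundary


def gaussian_logpdf(x: float, mean: float, var: float) -> float:
    """Log density of a 1-D Gaussian (assumed model form, not the paper's)."""
    return -0.5 * (math.log(2.0 * math.pi * var) + (x - mean) ** 2 / var)


def select_units(
    targets: List[Tuple[float, float]],
    candidates: List[List[Unit]],
    join_var: float = 1.0,
) -> Tuple[List[Unit], float]:
    """Return the candidate sequence with the highest total log-likelihood.

    targets[t] is the (mean, var) of the assumed target model at position t;
    candidates[t] lists the database units available for that position.
    """
    # Scores of the best partial path ending in each candidate at position 0.
    scores = [gaussian_logpdf(c.feature, *targets[0]) for c in candidates[0]]
    backpointers: List[List[int]] = []

    for t in range(1, len(targets)):
        mean, var = targets[t]
        new_scores, pointers = [], []
        for cand in candidates[t]:
            # Previous score plus concatenation log-likelihood for each predecessor.
            joined = [
                scores[i]
                + gaussian_logpdf(cand.left_edge - prev.right_edge, 0.0, join_var)
                for i, prev in enumerate(candidates[t - 1])
            ]
            best_i = max(range(len(joined)), key=joined.__getitem__)
            new_scores.append(joined[best_i] + gaussian_logpdf(cand.feature, mean, var))
            pointers.append(best_i)
        scores = new_scores
        backpointers.append(pointers)

    # Backtrack from the best final candidate to recover the full sequence.
    idx = max(range(len(scores)), key=scores.__getitem__)
    best_score = scores[idx]
    path = [idx]
    for pointers in reversed(backpointers):
        idx = pointers[idx]
        path.append(idx)
    path.reverse()
    return [candidates[t][i] for t, i in enumerate(path)], best_score
```

As a usage example under the same assumptions, take targets `[(1.0, 1.0), (2.0, 1.0)]`, two first-position candidates `Unit("a0", 1.0, 0.0, 1.0)` and `Unit("a1", 1.2, 0.0, 5.0)`, and one second-position candidate `Unit("b0", 2.0, 5.0, 2.0)`. The search picks `a1` then `b0`: `a0` matches its target slightly better, but its right boundary joins poorly with `b0`, which is the trade-off a joint likelihood search is meant to capture.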
Affiliation: iFLYTEK Speech Lab
Source: Chinese Science Bulletin (SCIE, EI, CAS), 2009, No. 11, pp. 1963-1969 (7 pages)
Funding: Supported by the National Natural Science Foundation of China (Grant Nos. 60475015, 60610298) and the National Hi-Tech Research and Development Program of China (Grant Nos. 2006AA01Z137 and 2006AA010104)
Keywords: speech synthesis, unit selection and waveform concatenation, statistical modeling, maximum likelihood criterion, training mode, modeling framework, unit, statistics, automation system, model parameter estimation, synthetic speech