
An automatic speech unit labeling method for Tibetan speech synthesis (Cited by: 6)

Speech unit segmentation for Tibetan speech synthesis
Abstract  The Deterministic Annealing Expectation Maximization (DAEM) algorithm is introduced into statistical parametric Tibetan speech synthesis based on hidden Markov models (HMMs) to automatically produce time labels for Tibetan training speech that has no manual time annotation. Initials and finals are used as the synthesis units. During the training of their acoustic models, the DAEM algorithm is used to determine the optimal parameters of the embedded re-estimation of the HMMs; once the acoustic models are trained, the time boundaries of the initials and finals are obtained automatically by forced alignment. Experimental results show that the time labels produced by this method are close to manually labeled boundaries, and a subjective evaluation shows that the quality of the Tibetan speech synthesized with this method is close to that of speech synthesized with manually labeled initial and final boundaries. The method can therefore be used to build acoustic models of the synthesis units without manual time labeling of initials and finals.
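For reference, the DAEM algorithm named in the abstract replaces the standard E-step posterior of Baum-Welch embedded re-estimation with a tempered posterior governed by an inverse temperature β that is gradually raised to 1. The sketch below follows the standard DAEM formulation; the annealing schedule shown is illustrative and is not the schedule reported in the paper.

\[
P_{\beta}(q \mid O, \lambda) \;=\; \frac{P(O, q \mid \lambda)^{\beta}}{\sum_{q'} P(O, q' \mid \lambda)^{\beta}}, \qquad 0 < \beta \le 1,
\]
\[
\beta_{k} \;=\; \left(\frac{k}{K}\right)^{\gamma}, \qquad k = 1, \dots, K \quad \text{(illustrative schedule, } \gamma \ge 1\text{)}.
\]

With a small β the posterior over state sequences q is nearly uniform, which smooths the likelihood surface and makes the embedded re-estimation less sensitive to the poor initial alignments of an unlabeled corpus; as β approaches 1 the update reduces to ordinary Baum-Welch. A final forced alignment against the trained models then yields the initial and final time boundaries.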
Source  Computer Engineering and Applications (《计算机工程与应用》, CSCD, Peking University Core Journal), 2015, Issue 6, pp. 199-203 (5 pages)
Funding  National Natural Science Foundation of China (No.61263036); Gansu Provincial Fund for Distinguished Young Scholars (No.1210RJDA007); Natural Science Foundation of Gansu Province (No.1107RJZA112, No.1208RJYA078)
Keywords  Tibetan speech synthesis; Deterministic Annealing Expectation Maximization (DAEM) algorithm; automatic labeling; time labeling

