Abstract
This paper introduces the Deterministic Annealing Expectation Maximization (DAEM) algorithm into Hidden Markov Model (HMM) based statistical parametric Tibetan speech synthesis, in order to automatically time-label Tibetan training speech that has no time annotations. Initials and finals are used as the synthesis units. During training of the initial and final acoustic models, the DAEM algorithm is used to determine the optimal parameters of the embedded re-estimation of the HMMs. Once the acoustic models are trained, forced alignment is used to obtain the time labels of the initials and finals automatically. Experiments show that the unit boundaries obtained by the proposed method are close to manually labeled boundaries. Subjective evaluation shows that the quality of Tibetan speech synthesized with the proposed method is close to that of speech synthesized from a corpus with manually labeled initial and final boundaries. The proposed method can therefore be used to build acoustic models of the synthesis units without time labels for the initials and finals.
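To make the annealing idea concrete, the following is a minimal sketch of DAEM on a toy problem, not the paper's HMM setup. It assumes a two-component 1-D Gaussian mixture with a fixed unit variance, and it raises each joint likelihood to a power beta (the inverse temperature) in the E-step before normalizing, with beta stepped up to 1, where the update becomes ordinary EM. The function name `daem_fit_means`, the data, and the temperature schedule are all illustrative assumptions.

```python
import math

# Toy data (assumed for illustration): two well-separated 1-D clusters.
DATA = [-0.5, -0.2, 0.0, 0.2, 0.5, 9.5, 9.8, 10.0, 10.2, 10.5]


def log_joint(x, mean, weight, var=1.0):
    """log(weight * N(x | mean, var)) for one mixture component."""
    return math.log(weight) - 0.5 * (
        math.log(2.0 * math.pi * var) + (x - mean) ** 2 / var
    )


def daem_fit_means(data, init_means, betas=(0.1, 0.3, 0.6, 1.0), iters=20):
    """Estimate mixture means and weights with DAEM (variance fixed at 1)."""
    means = list(init_means)
    k = len(means)
    weights = [1.0 / k] * k
    for beta in betas:  # anneal: low beta smooths the posteriors
        for _ in range(iters):
            # E-step: tempered posteriors, proportional to joint ** beta.
            resp = []
            for x in data:
                scaled = [beta * log_joint(x, means[j], weights[j])
                          for j in range(k)]
                top = max(scaled)  # subtract the max for numerical stability
                exps = [math.exp(s - top) for s in scaled]
                total = sum(exps)
                resp.append([e / total for e in exps])
            # M-step: same closed-form updates as ordinary EM.
            for j in range(k):
                nj = sum(r[j] for r in resp)
                means[j] = sum(r[j] * x for r, x in zip(resp, data)) / nj
                weights[j] = nj / len(data)
    return means, weights


if __name__ == "__main__":
    est_means, est_weights = daem_fit_means(DATA, [4.0, 6.0])
    print(est_means, est_weights)
```

Starting from the deliberately poor initial means 4.0 and 6.0, the estimates should end up close to the two cluster centers at 0 and 10. In the HMM case described in the abstract, the same tempering would be applied to the state posteriors computed during embedded re-estimation.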
Source
Computer Engineering and Applications (《计算机工程与应用》)
CSCD
PKU Core Journal (北大核心)
2015, No. 6, pp. 199-203 (5 pages)
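Funding
National Natural Science Foundation of China (No. 61263036)
Gansu Province Fund for Distinguished Young Scholars (No. 1210RJDA007)
Natural Science Foundation of Gansu Province (No. 1107RJZA112, No. 1208RJYA078)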