Funding: The multi-disciplinary research program entitled "Utilisation of Neuro-Cognitive Science for Improvement of Language Education Project", funded by the Hong Kong SAR Government’s Quality Education Fund Foundation (Principal Investigators: Cho-Yee To and T.F. Kwan)
Funding: Sponsored by the National Natural Science Foundation of China (Grant No. 60503071), the 973 National Basic Research Program of China (Grant No. 2004CB318102), and the Postdoctoral Science Foundation of China (Grant No. 20070420275)
Abstract: The performance of the prosody model directly affects the naturalness of synthesized speech. To address the difficulty of generating the pitch contour in the prosody model, this paper studies two pitch models in depth: a corpus-based pitch model and a pitch pattern model. The key problems in the corpus-based model are calculating the distance and searching for the optimal path with a dynamic programming algorithm. In the pitch pattern model, parameters such as pitch pattern, pitch average, and pitch range describe the pitch contour, and six pitch patterns are presented. For pitch contour generation, the pitch pattern model is more flexible than the corpus-based model. Both models were integrated into a real TTS system, and the MOS results for synthesized Mandarin speech show that the pitch pattern model performs better than the corpus-based pitch model.
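The optimal-path search mentioned for the corpus-based model can be illustrated with a minimal sketch. The abstract does not give the paper's distance measure, so this example assumes that each syllable has a few candidate units described only by their mean pitch in Hz, that the target distance is the absolute difference from a desired pitch, and that the join distance is the absolute pitch difference between neighbouring units. The function name `best_path` and the parameter `join_weight` are hypothetical and are not taken from the paper.

```python
from typing import List, Sequence, Tuple


def best_path(
    candidates: Sequence[Sequence[float]],
    targets: Sequence[float],
    join_weight: float = 1.0,
) -> Tuple[List[int], float]:
    """Choose one candidate per syllable with minimal total distance.

    candidates[i] holds the mean pitch (Hz) of each corpus unit available
    for syllable i; targets[i] is the desired pitch for that syllable.
    Both distances below are illustrative assumptions, not the paper's.
    """
    if not candidates or len(candidates) != len(targets):
        raise ValueError("need one non-empty candidate list per target")

    # cost[j]: cheapest total distance of any path ending at candidate j
    # of the syllable processed so far.
    cost = [abs(c - targets[0]) for c in candidates[0]]
    back: List[List[int]] = []  # back[i-1][j]: best predecessor of j at syllable i

    for i in range(1, len(candidates)):
        prev = candidates[i - 1]
        new_cost: List[float] = []
        pointers: List[int] = []
        for c in candidates[i]:
            # Cost of reaching this candidate from each predecessor k.
            reach = [cost[k] + join_weight * abs(c - prev[k]) for k in range(len(prev))]
            k_best = min(range(len(reach)), key=reach.__getitem__)
            new_cost.append(reach[k_best] + abs(c - targets[i]))
            pointers.append(k_best)
        cost = new_cost
        back.append(pointers)

    # Trace the cheapest path back from the last syllable.
    j = min(range(len(cost)), key=cost.__getitem__)
    total = cost[j]
    path = [j]
    for pointers in reversed(back):
        j = pointers[j]
        path.append(j)
    path.reverse()
    return path, total


if __name__ == "__main__":
    units = [[200.0, 120.0], [210.0, 130.0], [125.0, 220.0]]
    print(best_path(units, targets=[125.0, 125.0, 125.0]))
```

Because each syllable keeps only the cheapest cost per candidate, the search is linear in the number of syllables and quadratic in the number of candidates per syllable, instead of enumerating every combination.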