Disguised voice is a common form of evidence material in forensic voice examination, and it poses considerable difficulties for speaker identification. Focusing on electroacoustically disguised voice, we derive the rule governing fundamental frequency variation before and after voice change across multiple corpora by analyzing plots and data. The results show that the fundamental frequencies before and after voice change are linearly related, so speaker identification for electroacoustically disguised speech can be realized by comparing Chinese pitch patterns.
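The linear relationship reported above can be illustrated with an ordinary least-squares fit. This is only a minimal sketch: the function names `fit_line` and `undo_disguise` and the F0 values are hypothetical, and the abstract does not state how the authors estimated the relationship.

```python
def fit_line(x, y):
    """Ordinary least-squares fit y ~ a * x + b; returns (a, b)."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two equal-length sequences with at least 2 points")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    if sxx == 0:
        raise ValueError("x values must not all be equal")
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    a = sxy / sxx
    b = mean_y - a * mean_x
    return a, b


def undo_disguise(f0_disguised, a, b):
    """Map disguised F0 values back to the original scale with the fitted line."""
    return [(f - b) / a for f in f0_disguised]


# Hypothetical F0 values in Hz (illustrative only, not data from the paper).
f0_original = [110.0, 130.0, 150.0, 170.0]
f0_disguised = [1.5 * f + 10.0 for f in f0_original]

a, b = fit_line(f0_original, f0_disguised)
restored = undo_disguise(f0_disguised, a, b)
```

Once the slope and intercept are known for a given voice changer setting, the restored contour can be compared with a suspect's undisguised pitch pattern, which is the comparison the abstract refers to.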
The word control key is one application of talker recognition. Chinese differs from English and Japanese in that its four tones are important characteristics. In this paper, the four Chinese tones are analysed and nine parameters are introduced: (1) the pitch period, (2) two parameters for the duration of the talker's pronunciation, (3) the pitch pattern gradient (in five sections), and (4) the sum of spectral distances. Many Chinese words consist of a single syllable, whose combination rules involve single vowels, diphthongs, or triphthongs. The vowel spectrum is computed with an FFT on the middle frame. The method takes a group of three input signals and makes a logical decision of "pass" or "out". The experimental recognition ratio is over 90% for the reference talker, 7.2% for another talker using the reference input word, and 0% for another talker using random input words.
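The "pass" or "out" decision over a group of three inputs might be sketched as follows. The abstract does not give the decision rule, so everything here is an assumption: the Euclidean distance, the threshold, the two-of-three vote, and the names `distance` and `decide` are hypothetical, and the nine-element vectors are assumed to be already normalized.

```python
import math


def distance(features, reference):
    """Euclidean distance between two equal-length feature vectors."""
    if len(features) != len(reference):
        raise ValueError("feature vectors must have the same length")
    return math.sqrt(sum((f - r) ** 2 for f, r in zip(features, reference)))


def decide(utterances, reference, threshold, min_matches=2):
    """Return "pass" if at least min_matches utterances lie within
    threshold of the reference vector, otherwise "out".
    The voting rule is an assumption, not the paper's stated method."""
    matches = sum(1 for u in utterances if distance(u, reference) <= threshold)
    return "pass" if matches >= min_matches else "out"


# Hypothetical nine-parameter vectors: 1 pitch period, 2 duration values,
# 5 pitch pattern gradients, 1 spectral distance sum (values are made up).
reference = [0.5] * 9
close = [0.52] * 9
far = [0.9] * 9

accepted = decide([close, close, far], reference, threshold=0.1)
rejected = decide([far, far, close], reference, threshold=0.1)
```

Requiring agreement across several inputs rather than a single one is one plausible reason for using a group of three signals, since it reduces the effect of one badly pronounced word.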
The quality of the prosody model directly affects the naturalness of synthesized speech. To address the difficulty of generating the pitch contour in the prosody model, this paper studies two pitch models in depth: a corpus-based pitch model and a pitch pattern model. The key problems in the corpus-based model are calculating the distance and searching for the optimal path with a dynamic programming algorithm. In the pitch pattern model, parameters such as pitch pattern, pitch average, and pitch range describe the pitch contour, and six pitch patterns are presented. For pitch contour generation, the pitch pattern model is more flexible than the corpus-based model. Both models are integrated into a real TTS system, and the MOS results for synthesized Mandarin speech show that the pitch pattern model performs better than the corpus-based pitch model.
Funding: Sponsored by the National Natural Science Foundation of China (Grant No. 60503071), the 973 National Basic Research Program of China (Grant No. 2004CB318102), and the Postdoctor Science Foundation of China (Grant No. 20070420275).
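The pitch pattern model described in the prosody abstract, where a contour is specified by a pattern, a pitch average, and a pitch range, can be sketched as below. The three pattern shapes and the function name `pitch_contour` are hypothetical: the abstract mentions six patterns but does not define them, and the scaling rule (average plus range times a normalized shape) is an assumption.

```python
# Hypothetical normalized pattern shapes over t in [0, 1], with values in
# [-0.5, 0.5]. The paper's six patterns are not given in the abstract.
PATTERNS = {
    "level": lambda t: 0.0,
    "rising": lambda t: t - 0.5,
    "falling": lambda t: 0.5 - t,
}


def pitch_contour(pattern, pitch_average, pitch_range, n_points=5):
    """Generate n_points pitch values (Hz) as
    pitch_average + pitch_range * shape(t), with t evenly spaced in [0, 1]."""
    if n_points < 2:
        raise ValueError("n_points must be at least 2")
    shape = PATTERNS[pattern]
    return [
        pitch_average + pitch_range * shape(i / (n_points - 1))
        for i in range(n_points)
    ]


# Illustrative values only: a rising contour centred on 200 Hz, 40 Hz wide.
contour = pitch_contour("rising", pitch_average=200.0, pitch_range=40.0)
```

Because the average and range are separate from the shape, the same pattern can be reused at any register or excursion size, which is one way to read the abstract's claim that this model is more flexible than selecting contours from a corpus.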