Abstract: Two discriminative methods for addressing tone problems in Mandarin speech recognition are presented. First, discriminative training of hidden Markov model (HMM) based tone models is proposed. Then a technique for integrating the tone models into a large vocabulary continuous speech recognition system is presented. Discriminative model weight training based on the minimum phone error criterion is adopted to integrate the tone models optimally. The extended Baum-Welch algorithm is applied to find the model-dependent weights that scale the acoustic scores and tone scores. Experimental results show that the discriminatively trained tone models improve both tone recognition rates and continuous speech recognition accuracy. The performance of a large vocabulary continuous Mandarin speech recognition system is further enhanced by the discriminatively trained weight combinations, owing to better interpolation of the given models.
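The score scaling mentioned in this abstract can be illustrated with a minimal sketch. It combines per-segment acoustic and tone log-scores using model-dependent weights and picks the best hypothesis from a small N-best list. The names `combined_score` and `rescore`, the weight table, and the toy scores are all hypothetical. The weights are set by hand here, whereas the paper trains them with the extended Baum-Welch algorithm under the minimum phone error criterion.

```python
from typing import Dict, List, Tuple

# One segment of a hypothesis: (model name, acoustic log-score, tone log-score).
Segment = Tuple[str, float, float]


def combined_score(
    segments: List[Segment],
    weights: Dict[str, Tuple[float, float]],
    default: Tuple[float, float] = (1.0, 1.0),
) -> float:
    """Sum weighted acoustic and tone log-scores over all segments."""
    total = 0.0
    for model, acoustic, tone in segments:
        # Each model has its own (acoustic weight, tone weight) pair.
        w_acoustic, w_tone = weights.get(model, default)
        total += w_acoustic * acoustic + w_tone * tone
    return total


def rescore(
    nbest: Dict[str, List[Segment]],
    weights: Dict[str, Tuple[float, float]],
) -> str:
    """Return the hypothesis label with the highest combined score."""
    return max(nbest, key=lambda label: combined_score(nbest[label], weights))


# Toy N-best list with made-up scores (assumption, not data from the paper).
nbest = {
    "ma1": [("a1", -10.0, -2.0)],
    "ma3": [("a3", -9.5, -4.0)],
}
with_tone = {"a1": (1.0, 0.5), "a3": (1.0, 0.5)}
without_tone = {"a1": (1.0, 0.0), "a3": (1.0, 0.0)}
```

In this toy case the acoustic scores alone favor `ma3`, while adding the scaled tone scores switches the decision to `ma1`.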
Funding: Supported by the National Natural Science Foundation of China (60965002), the College Research Project of Xinjiang (XJEDU2008S15), and the Start-up Fund Research for Ph.D. in Xinjiang University (BS090143).
Abstract: Tone model (TM) integration is an important task for Mandarin speech recognition. Discriminatively trained scaling factors have proved effective for integrating TM scores into multi-pass speech recognition. Moreover, context-dependent (CD) scaling can be applied for better interpolation between the models. One limitation of this approach is that it introduces a large number of parameters, which makes the technique prone to overtraining. In this paper, we propose to induce context-dependent model weights by using automatically derived phonetic decision trees. The question at each tree node is chosen to minimize the expected recognition error on the training data. A first-order approximation of the minimum phone error (MPE) objective function is used for question pruning to make tree building efficient. Experimental results on continuous Mandarin speech recognition show that the method is capable of inducing the most crucial phonetic contexts and obtains a significant error reduction with far fewer parameters than manually designed context-dependent scaling parameters.
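To make the idea of tree-induced context-dependent weights concrete, the sketch below looks up a tone-score scaling factor by walking a small binary decision tree of context questions. Everything in it is hypothetical: the `Node` class, the two questions, and the leaf weights are hand-written for illustration. In the paper the questions are selected automatically to minimize expected recognition error, which this sketch does not attempt.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class Node:
    """Internal nodes hold a yes/no question; leaves hold a weight."""
    question: Optional[Callable[[dict], bool]] = None
    yes: Optional["Node"] = None
    no: Optional["Node"] = None
    weight: Optional[float] = None  # set only at leaves


def lookup_weight(node: Node, context: dict) -> float:
    """Follow the answers to the questions until a leaf is reached."""
    while node.weight is None:
        node = node.yes if node.question(context) else node.no
    return node.weight


# Hand-built toy tree (assumption): three leaves replace a full table
# with one weight per (tone, next tone) pair.
tree = Node(
    question=lambda ctx: ctx["tone"] == 3,
    yes=Node(
        question=lambda ctx: ctx["next_tone"] == 3,
        yes=Node(weight=1.2),
        no=Node(weight=0.8),
    ),
    no=Node(weight=0.5),
)
```

Contexts that answer every question the same way share one weight, which is how a tree can cover many contexts with few parameters.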
Abstract: To utilize the supra-segmental nature of Mandarin tones, this article proposes a feature extraction method for hidden Markov model (HMM) based tone modeling. The method uses linear transforms to project the F0 (fundamental frequency) features of neighboring syllables as compensations, and adds them to the original F0 features of the current syllable. The transforms are discriminatively trained using an objective function termed "minimum tone error", which is a smooth approximation of tone recognition accuracy. Experiments show that the new tonal features achieve a 3.82% improvement in tone recognition rate over the baseline, which uses maximum likelihood trained HMMs on the normal F0 features. Further experiments show that discriminative HMM training on the new features is 8.78% better than the baseline.
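The feature compensation described in this abstract can be sketched as adding linearly transformed neighbor features to the current syllable's features. The sketch assumes each syllable is summarized by a fixed-length F0 vector, which the abstract does not state. The function names and the transform matrices `a_prev` and `a_next` are hypothetical and set by hand, whereas the paper trains the transforms with the minimum tone error objective.

```python
from typing import List

Vector = List[float]
Matrix = List[List[float]]


def matvec(matrix: Matrix, vector: Vector) -> Vector:
    """Multiply a matrix by a vector."""
    return [sum(m * v for m, v in zip(row, vector)) for row in matrix]


def compensate(
    current: Vector,
    previous: Vector,
    following: Vector,
    a_prev: Matrix,
    a_next: Matrix,
) -> Vector:
    """Return current + a_prev * previous + a_next * following."""
    left = matvec(a_prev, previous)
    right = matvec(a_next, following)
    return [c + l + r for c, l, r in zip(current, left, right)]


# Toy two-dimensional F0 vectors and transforms (assumption).
current = [1.0, 2.0]
previous = [2.0, 0.0]
following = [0.0, 4.0]
a_prev = [[0.5, 0.0], [0.0, 0.5]]
a_next = [[0.25, 0.0], [0.0, 0.25]]
zero = [[0.0, 0.0], [0.0, 0.0]]

new_features = compensate(current, previous, following, a_prev, a_next)
```

With both transforms set to zero the function returns the original features, so the baseline is a special case of the compensated features.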
Funding: Supported by the National High-Tech Research and Development (863) Program of China (No. 200/AA/14).
Abstract: This work describes an improved feature extraction algorithm that extracts the peripheral features of a point x(ti,fj) by using a nonlinear algorithm to compute the nonlinear time spectrum (NL-TS) pattern. The algorithm observes n×n neighborhoods of the point in all directions, and then incorporates the peripheral features into the Mel frequency cepstral coefficients (MFCCs)-based feature extractor of the Tsinghua electronic engineering speech processing (THEESP) Mandarin automatic speech recognition (MASR) system, replacing the dynamic features with different feature combinations. In this algorithm, the orthogonal bases are extracted directly from the speech data by applying the discrete cosine transform (DCT) to 3×3 blocks of an NL-TS pattern, and serve as the peripheral features. The new primal bases are then selected and simplified in the form of the ?dp_t operator in the time direction and the ?dp_f operator in the frequency direction. The algorithm achieves a 23.29% relative error rate reduction compared with the standard MFCC feature set and dynamic features, in tests using THEESP with the duration distribution-based hidden Markov model (DDBHMM) based MASR system.
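As a rough illustration of the 3×3 block transform mentioned in this abstract, the sketch below applies an orthonormal two-dimensional DCT-II to a small time-frequency block. It assumes rows index time and columns index frequency, and it operates on plain numbers. The nonlinear step that produces the NL-TS pattern is not specified in the abstract and is not modeled. The function names are hypothetical.

```python
import math
from typing import List

Matrix = List[List[float]]


def dct_matrix(n: int) -> Matrix:
    """Build the n x n orthonormal DCT-II basis matrix."""
    rows = []
    for k in range(n):
        scale = math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n)
        rows.append(
            [scale * math.cos(math.pi * (2 * i + 1) * k / (2 * n)) for i in range(n)]
        )
    return rows


def dct2(block: Matrix) -> Matrix:
    """Two-dimensional DCT of a square block, computed as C * X * C^T."""
    n = len(block)
    c = dct_matrix(n)
    # Transform along the row index (time, by assumption).
    tmp = [
        [sum(c[k][i] * block[i][j] for i in range(n)) for j in range(n)]
        for k in range(n)
    ]
    # Transform along the column index (frequency, by assumption).
    return [
        [sum(tmp[k][j] * c[l][j] for j in range(n)) for l in range(n)]
        for k in range(n)
    ]


# A flat block and a block that changes only along time (toy values).
flat = [[2.0] * 3 for _ in range(3)]
time_ramp = [[0.0] * 3, [1.0] * 3, [2.0] * 3]
flat_coeffs = dct2(flat)
ramp_coeffs = dct2(time_ramp)
```

A flat block puts all its energy in the first coefficient, while a block that varies only along time produces a first-order coefficient in the time direction and none in the frequency direction. This is the kind of directional behavior a simplified time or frequency operator would capture.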
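Abstract: Objective: To obtain the performance-intensity (PI) function of normal-hearing listeners for the tone identification in noise test (TINT) materials. Methods: Using the established TINT materials, tone identification was tested in 16 normal-hearing listeners aged 21 to 28 years who use Mandarin as their everyday language, and the results were analyzed with SPSS 17.0. Results: The PI slopes for normal-hearing adults were 8.6%/dB (male talker) and 7.3%/dB (female talker) (P=0.11). The SNR50 thresholds of the PI function were (-12.9±0.8) dB for the male talker and (-13.6±1) dB for the female talker (t=2.7, P=0.016). Tone type and talker gender affected the PI threshold both through an interaction (F=11.7, P<0.001) and through main effects (tone type: F=83.7, P<0.001; talker gender: F=31.0, P<0.05); the identification thresholds for Tone 1 and Tone 4 were significantly lower than those for Tone 2 and Tone 3. Conclusion: This study provides a preliminary PI function for tone identification in noise in normal-hearing listeners based on the TINT materials, intended as an optional measurement tool for clinical work and scientific research.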
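To show how a slope and a threshold together describe a PI function, the sketch below uses a logistic curve. This is an assumption: the abstract does not state which function was fitted, how the slope was measured, or whether chance-level performance was accounted for. Here the threshold is taken as the signal-to-noise ratio giving 50% correct, and the slope as the steepness at that point. The function name is hypothetical.

```python
import math


def pi_function(snr_db: float, snr50_db: float, slope_per_db: float) -> float:
    """Logistic PI curve: proportion correct as a function of SNR in dB.

    slope_per_db is the slope at the 50% point in proportion per dB,
    so 8.6%/dB is passed as 0.086. A logistic with rate k has midpoint
    slope k / 4, hence k = 4 * slope_per_db.
    """
    k = 4.0 * slope_per_db
    return 1.0 / (1.0 + math.exp(-k * (snr_db - snr50_db)))


# Reported male-talker values from the abstract, used as curve parameters.
male_snr50 = -12.9
male_slope = 0.086

at_threshold = pi_function(male_snr50, male_snr50, male_slope)
# Numerical slope at the threshold, as a check on the parameterization.
step = 1e-4
numeric_slope = (
    pi_function(male_snr50 + step, male_snr50, male_slope)
    - pi_function(male_snr50 - step, male_snr50, male_slope)
) / (2 * step)
```

Under these assumptions the curve passes through 50% at the threshold and rises by about 8.6 percentage points per dB there.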