
Automatic context induction for tone model integration in Mandarin speech recognition (cited by: 1)

Abstract: Tone model (TM) integration is an important task in Mandarin speech recognition. Discriminatively trained scaling factors have been shown to be effective for integrating TM scores into multi-pass speech recognition, and context-dependent (CD) scaling can further improve the interpolation between the models. A limitation of this approach is that it introduces a large number of parameters, which makes the technique prone to overtraining. In this paper, we propose inducing context-dependent model weights with automatically derived phonetic decision trees. The question at each tree node is chosen to minimize the expected recognition error on the training data. A first-order approximation of the minimum phone error (MPE) objective function is used to prune questions so that tree building is efficient. Experimental results on continuous Mandarin speech recognition show that the method induces the most crucial phonetic contexts and achieves a significant error reduction with far fewer parameters than manually designed context-dependent scaling parameters.
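The idea of tying context-dependent TM scaling factors through a phonetic decision tree can be illustrated with a minimal sketch. It is not the authors' implementation: the context layout (left phone, tone label, right phone), the questions, the leaf weights, and the names `Leaf`, `Node`, `tm_weight` and `combined_score` are all hypothetical, and the tree is written by hand rather than grown with the MPE-based question selection described in the abstract.

```python
from dataclasses import dataclass
from typing import Callable, Tuple, Union

# Hypothetical context layout: (left phone, tone label, right phone).
Context = Tuple[str, int, str]


@dataclass
class Leaf:
    # Tone-model scaling factor shared by every context that reaches this leaf.
    weight: float


@dataclass
class Node:
    # Yes/no phonetic question asked about the context.
    question: Callable[[Context], bool]
    yes: Union["Node", Leaf]
    no: Union["Node", Leaf]


def tm_weight(tree: Union[Node, Leaf], ctx: Context) -> float:
    """Walk the tree from the root to a leaf and return that leaf's weight."""
    while isinstance(tree, Node):
        tree = tree.yes if tree.question(ctx) else tree.no
    return tree.weight


def combined_score(am: float, lm: float, tm: float, ctx: Context,
                   tree: Union[Node, Leaf], lm_scale: float = 1.0) -> float:
    """Log-linear combination of acoustic, language and tone model log scores.

    Only the tone-model weight depends on the phonetic context here.
    """
    return am + lm_scale * lm + tm_weight(tree, ctx) * tm


# Hand-written toy tree (illustrative values only): three leaves stand in
# for what would otherwise be one weight per distinct context.
NASALS = {"n", "m", "ng"}
TREE = Node(
    question=lambda c: c[1] == 5,            # neutral tone?
    yes=Leaf(0.5),
    no=Node(
        question=lambda c: c[0] in NASALS,   # left phone is a nasal?
        yes=Leaf(2.0),
        no=Leaf(1.0),
    ),
)

if __name__ == "__main__":
    print(combined_score(-10.0, -2.0, -4.0, ("n", 3, "a"), TREE))
```

In this toy setup the number of free scaling parameters equals the number of leaves, which is the sense in which tree-based tying uses far fewer parameters than a separate weight per context.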
Source: The Journal of China Universities of Posts and Telecommunications (EI, CSCD), 2012, No. 1, pp. 94-100 (7 pages). Chinese title: 中国邮电高校学报(英文版)
Funding: Supported by the National Natural Science Foundation of China (60965002), the College Research Project of Xinjiang (XJEDU2008S15), and the Start-up Fund Research for Ph.D. in Xinjiang University (BS090143).
Keywords: TM integration, MPE, decision tree, Mandarin speech recognition, context-dependent

References (14)

  • 1Huang C H,Seide F. Pitch tracking and tone features for mandarin speech recognition. Proceedings of the 25th International Conference on Acoustics,Speech,and Signal Processing (ICASSP'00):Vol 3,Jun 5-9,2000,Istanbul,Turkey. Piscataway,NJ,USA:IEEE,2000:1523-1526.
  • 2Lei X,Siu M H,Hwang M,et al. Improved tone modeling for mandarin broadcast news speech recognition. Proceedings of the 7th International Conference on Spoken Language Processing (InterSpeech/ICSLP'06),Sep 17-21,2006,Pittsburgh,PA,USA. 2006:1277-1280.
  • 3Wang H L,Qian Y,Soong F K,et al. Improved mandarin speech recognition by lattice rescoring with enhanced tone models. Proceedings of the 5th International Symposium on Chinese Spoken Language Processing (ISCSLP'06),Dec 13-16,2006,Singapore. LNAI 4274. Berlin,Germany:Springer-Verlag,2006:445-453.
  • 4Beyerlein P. Discriminative model combination. Proceedings of the IEEE Workshop on Automatic Speech Recognition and Understanding (ASRU'97),Dec 17,1997,Santa Barbara,CA,USA. Piscataway,NJ,USA:IEEE,1997:238-245.
  • 5Huang H,Zhu J. Discriminative incorporation of explicitly trained tone models into lattice based rescoring for mandarin speech recognition. Proceedings of the 33rd International Conference on Acoustics,Speech,and Signal Processing (ICASSP'08),Mar 31-Apr 4,2008,Las Vegas,NV,USA,Piscataway,NJ,USA:IEEE,2008:1541-1544.
  • 6Hoffmeister B,Liang R,Schlüter R,et al. Log-linear model combination with word-dependent scaling factors. Proceedings of the 10th International Conference on Spoken Language Processing (InterSpeech/ICSLP'09),Sep 26-30,2009:Brighton,UK. 2009:248-251.
  • 7Liu X,Gales M,Woodland P. Use of contexts in language model interpolation and adaptation. Proceedings of the 10th International Conference on Spoken Language Processing (InterSpeech/ICSLP'09),Sep 26-30,2009:Brighton,UK. 2009:360-363.
  • 8Povey D,Woodland P C. Minimum phone error and I-smoothing for improved discriminative training. Proceedings of the 27th International Conference on Acoustics,Speech,and Signal Processing (ICASSP'02):Vol 1,May 13-17,2002,Orlando,FL,USA. Piscataway,NJ,USA:IEEE,2002:105-108.
  • 9Gibson P,Hain T. Error approximation and minimum phone error acoustic model estimation. IEEE Transactions on Audio,Speech and Language Processing,2010,18(6):1269-1279.
  • 10Young S J,Odell J P,Woodland P C. Tree-based state tying for high accuracy acoustic modeling. Proceedings of the Workshop on Human Language Technology (HLT'94),Mar 8-11,1994,Plainsboro,NJ,USA. 1994:307-312.

Co-cited literature (14)

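  • 1Tang Yun,Liu Wenju,Xu Bo.Mandarin digit string recognition based on a posterior-probability decoding segment model[J].Chinese Journal of Computers,2006,29(4):635-641. Cited by: 12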
  • 2Ostendorf M,Roukos S.A stochastic segment model for phoneme-based continuous speech recognition[J].IEEE Trans on Speech and Audio Processing,1989,37(12):1857-1869.
  • 3Tang Yun,Liu Wenju,Zhang Hua.One-pass coarse-to-fine segmental speech decoding algorithm[C]//Proceedings of ICASSP,2006:441-444.
  • 4Tian Ye,Jia Jia,Wang Yongxin,et al.A real-time tone enhancement method for continuous Mandarin speeches[C]//International Symposium on Chinese Spoken Language Processing,2012:405-408.
  • 5Wu Jiang,Zahorian S A,Hu Hongbing.Tone recognition in continuous Mandarin Chinese[J].The Journal of the Acoustical Society of America,2012,132(3).
  • 6Wu Jiang,Zahorian S A,Hu Hongbing.Tone recognition for continuous accented Mandarin Chinese[C]//Proceedings of ICASSP,2013:7180-7183.
  • 7Yang W J,Lee J C,Chang Y C,et al.Hidden Markov model for Mandarin lexical tone recognition[J].IEEE Transactions on Acoustic Speech and Signal Processing,1988,36(7):988-992.
  • 8Thubthong N,Kijsirikul B.Tone recognition of continuous Thai speech under tonal assimilation and declination effects using half-tone model[J].International Journal of Uncertainty,Fuzziness and Knowledge-Based Systems,2001,9(6):815-825.
  • 9Peng G,Wang W S.Tone recognition of continuous Cantonese speech based on support vector machines[J].Speech Communication,2005,45(1):49-62.
  • 10Wang Xinhao.Maximum entropy based tone modeling for Mandarin speech recognition[C]//Proceedings of ICASSP,2010:4850-4853.
