Automatic context induction for tone model integration in Mandarin speech recognition (Cited by: 1)

Abstract: Tone model (TM) integration is an important task in Mandarin speech recognition. Discriminatively trained scaling factors have been shown to be effective for integrating TM scores into multi-pass speech recognition. Moreover, context-dependent (CD) scaling can be applied for better interpolation between the models. One limitation of this approach is that it introduces a large number of parameters, which makes the technique prone to overtraining. In this paper, we propose to induce context-dependent model weights using automatically derived phonetic decision trees. The question at each tree node is chosen to minimize the expected recognition error on the training data. A first-order approximation of the minimum phone error (MPE) objective function is used for question pruning to make tree building efficient. Experimental results on continuous Mandarin speech recognition show that the method is capable of inducing the most crucial phonetic contexts and obtains a significant error reduction with far fewer parameters than manually designed context-dependent scaling parameters.
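The idea of context-dependent scaling described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the `Node` structure, the `lookup_weight` and `combined_score` helpers, the `"tone"` context key, and the weight values are all hypothetical, chosen only to show how a decision tree can map a phonetic context to a tone model scaling factor in a log-linear score combination. How the tree is grown (question selection by expected error, MPE-based pruning) is not shown.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class Node:
    # Internal nodes carry a yes/no question about the context;
    # leaves carry the scaling factor shared by all contexts that reach them.
    question: Optional[Callable[[dict], bool]] = None
    yes: Optional["Node"] = None
    no: Optional["Node"] = None
    weight: float = 0.0


def lookup_weight(node: Node, context: dict) -> float:
    """Walk the tree until a leaf is reached and return its scaling factor."""
    while node.question is not None:
        node = node.yes if node.question(context) else node.no
    return node.weight


def combined_score(base_log_score: float, tm_log_score: float,
                   context: dict, tree: Node) -> float:
    """Log-linear combination: base score plus context-dependent weight times TM score."""
    return base_log_score + lookup_weight(tree, context) * tm_log_score


# Hypothetical one-question tree: tone 3 gets its own weight,
# all other contexts share a second weight (values are illustrative only).
tree = Node(
    question=lambda c: c["tone"] == 3,
    yes=Node(weight=0.5),
    no=Node(weight=0.25),
)

print(combined_score(-10.0, -2.0, {"tone": 3}, tree))
print(combined_score(-10.0, -2.0, {"tone": 1}, tree))
```

Because contexts that reach the same leaf share one weight, the number of trainable parameters equals the number of leaves rather than the number of distinct contexts, which is the parameter saving the abstract refers to.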

Authors: HUANG Hao, LI Bing-hu

Affiliation: Department of Information Science and Engineering; Laboratory of Multi-Lingual Information Technology

Source: The Journal of China Universities of Posts and Telecommunications (中国邮电高校学报, English edition), EI, CSCD, 2012, Issue 1, pp. 94-100 (7 pages)

Funding: Supported by the National Natural Science Foundation of China (60965002), the College Research Project of Xinjiang (XJEDU2008S15), and the Start-up Fund Research for Ph.D. in Xinjiang University (BS090143).

Keywords: TM integration, MPE, decision tree, Mandarin speech recognition, context-dependent

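Classification: TP311 [Automation and Computer Technology: Computer Software and Theory]; TN912.34 [Electronics and Telecommunications: Communication and Information Systems]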
