An Articulatory-Feature-Based Tone Integration Algorithm for Speech Recognition (cited by: 2)

Integrating tone models into speech recognition system based on articulatory feature
Abstract: This paper proposes an improved tone modeling method based on articulatory features and applies it in the one-pass decoding of a stochastic segment model (SSM). Based on the pronunciation characteristics of Mandarin, seven articulatory features that distinguish Chinese vowel and consonant information are defined. With these features as targets, hierarchical multilayer perceptrons compute the posterior probabilities that the speech signal belongs to each of the 35 articulatory-feature categories, and these posteriors are used together with conventional prosodic features for tone modeling. Following the decoding characteristics of the SSM, tone model scores are computed for the paths retained after two levels of pruning, then weighted and added to each path's total score. Experiments on the "863-test" set show that the new articulatory feature set improves tone recognition accuracy by about 3.11% absolute, and that integrating tone information reduces the character error rate of the SSM system from 13.67% to 12.74%. These results demonstrate the feasibility of applying tone information to stochastic segment models.
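The score fusion step described in the abstract (weighting the tone model score and adding it to a surviving path's total score) can be sketched roughly as follows. This is a minimal illustration, not the paper's implementation: the `Path` structure, the `add_tone_scores` function, the weight value, and the toy probabilities are all assumptions, since the abstract does not give the exact formula or the weight used.

```python
import math
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Path:
    """A decoding hypothesis that survived pruning (hypothetical structure)."""
    syllables: List[str]
    score: float                 # accumulated log score before tone information
    tone_logprobs: List[float]   # assumed per-syllable tone-model log-probabilities


def add_tone_scores(paths: List[Path], tone_weight: float) -> List[Tuple[Path, float]]:
    """Add the weighted tone log-probability to each path score, best first."""
    rescored = [(p, p.score + tone_weight * sum(p.tone_logprobs)) for p in paths]
    return sorted(rescored, key=lambda item: item[1], reverse=True)


# Two toy hypotheses that differ only in the tone of the first syllable.
survivors = [
    Path(["ma1", "ma5"], -10.0, [math.log(0.2), math.log(0.5)]),
    Path(["ma3", "ma5"], -10.5, [math.log(0.9), math.log(0.5)]),
]

best_without_tone = max(survivors, key=lambda p: p.score)
best_with_tone = add_tone_scores(survivors, tone_weight=1.0)[0][0]
print(best_without_tone.syllables, best_with_tone.syllables)
```

In this toy case the tone score is large enough to change which hypothesis ranks first. Applying the tone model only to paths that survive pruning keeps the extra computation small, which is the design choice the abstract attributes to the decoding characteristics of the SSM.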
Source: Computer Engineering and Applications (《计算机工程与应用》), CSCD, 2014, No. 23, pp. 21-25 (5 pages)
Funding: National Natural Science Foundation of China (No. 61300124); Basic and Frontier Technology Research Program of Henan Province (No. 132300410332)
Keywords: speech recognition; stochastic segment modeling; tone modeling; articulatory feature; hierarchical multilayer perceptron classifiers