Vocal effort related robust speech recognition based on adaptation method (cited by: 1)
Abstract: To address the changes in the acoustic properties of speech caused by variation in vocal effort, this paper proposes a method based on acoustic model adaptation. It first analyzes how acoustic models trained on normal-mode speech perform when recognizing speech produced in other vocal effort modes. Then, exploiting the properties of the stochastic segment model, the maximum likelihood linear regression (MLLR) method is introduced into the stochastic segment model system, and the adapted acoustic models are used to recognize speech in the corresponding vocal effort mode. Mandarin continuous speech recognition experiments on the "863-test" set show that recognition accuracy drops substantially when models trained on normal speech are applied to each of the other four vocal effort modes, and that accuracy improves markedly once the system has been adapted to the mode being recognized. These results indicate that acoustic model adaptation is effective in handling vocal effort variability in speech recognition.
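As a rough illustration of the adaptation technique named in the abstract, the sketch below estimates a generic MLLR mean transform for a set of Gaussian means and applies it. It is not the paper's stochastic segment model formulation, which the abstract does not spell out. The function names `estimate_mllr_mean_transform` and `apply_mllr` are hypothetical, and the sketch assumes a single regression class and one covariance shared by all Gaussians, so the maximum likelihood solution reduces to weighted least squares.

```python
import numpy as np


def estimate_mllr_mean_transform(means, obs, gammas):
    """Estimate W of shape (d, d+1) such that adapted_mean = W @ [1, mean].

    Assumption (ours, for brevity): every Gaussian shares one covariance,
    so it cancels out and W = Z @ inv(G) with
        G = sum_m occ_m * xi_m xi_m^T,   Z = sum_{t,m} gamma[t, m] * o_t xi_m^T.

    means:  (M, d) Gaussian means of the unadapted model
    obs:    (T, d) adaptation feature vectors
    gammas: (T, M) occupation probabilities of each Gaussian per frame
    """
    num_gauss = means.shape[0]
    # Extended mean vectors xi_m = [1, mu_m]; the leading 1 carries the bias.
    xi = np.hstack([np.ones((num_gauss, 1)), means])
    occ = gammas.sum(axis=0)                 # total occupancy per Gaussian
    G = (xi * occ[:, None]).T @ xi           # (d+1, d+1)
    Z = (gammas.T @ obs).T @ xi              # (d, d+1)
    # Solve W @ G = Z without forming an explicit inverse.
    return np.linalg.solve(G.T, Z.T).T


def apply_mllr(means, W):
    """Return the adapted means W @ [1, mu] for every row of `means`."""
    xi = np.hstack([np.ones((means.shape[0], 1)), means])
    return xi @ W.T


# Toy usage: adaptation data generated by a known affine shift of the means.
means = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
A_true = np.array([[1.2, 0.1], [-0.3, 0.9]])
b_true = np.array([0.5, -1.0])
obs = means @ A_true.T + b_true              # one noise-free frame per Gaussian
gammas = np.eye(4)                           # hard frame-to-Gaussian alignment
W = estimate_mllr_mean_transform(means, obs, gammas)
adapted_means = apply_mllr(means, W)
```

In a real recognizer the occupation probabilities would come from aligning the adaptation speech of one vocal effort mode against the normal-mode models, and one transform would typically be estimated per mode.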
Source: Computer Engineering and Applications (《计算机工程与应用》), indexed in CSCD and the Peking University Core Journals list, 2016, No. 2, pp. 156-160, 204 (6 pages)
Funding: National Natural Science Foundation of China (No. 61175066, No. 61300124); Henan Province Basic and Frontier Technology Research Program (No. 132300410332)
Keywords: speech recognition; vocal effort; adaptation; maximum likelihood linear regression