
Transfer learning for acoustic modeling of noise robust speech recognition (cited by 5)

Abstract: To improve the robustness of speech recognition systems in noisy environments, a transfer-learning-based acoustic modeling method is proposed. An acoustic model trained on clean speech (the teacher model) guides the training of an acoustic model trained on noisy speech (the student model): during training, the student model is pushed to approximate the teacher model's posterior probability distribution by minimizing the relative entropy, i.e. the Kullback-Leibler (KL) divergence, between the two models' posterior distributions. Experiments on the CHiME-2 dataset show that the method reduces the average word error rate (WER) by 7.29% absolute compared with the baseline and by 3.92% absolute compared with the best system in the CHiME-2 challenge.
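The abstract does not spell out the loss, so the following is only a minimal sketch of the common teacher-student formulation, assuming frame-aligned clean and noisy versions of each utterance and the KL direction KL(p_T || p_S) = sum_s p_T(s|x) log[p_T(s|x) / p_S(s|x)], averaged over frames, where p_T is the teacher's posterior on the clean frame and p_S the student's posterior on the noisy frame. Because the teacher's entropy does not depend on the student, minimizing this KL is equivalent to training the student with cross-entropy against the teacher's soft labels, and the gradient with respect to the student's output logits reduces to p_S - p_T. The toy below drops the networks entirely and updates one frame's student logits directly; all function names and example logits are hypothetical, and the KL direction is an assumption, not a detail confirmed by the abstract.

```python
import math
from typing import List, Sequence, Tuple


def softmax(logits: Sequence[float]) -> List[float]:
    """Numerically stable softmax: turns output-layer logits into posteriors."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def kl_divergence(p_teacher: Sequence[float], p_student: Sequence[float],
                  eps: float = 1e-12) -> float:
    """KL(p_teacher || p_student) for one frame (assumed direction).

    Terms with p_teacher == 0 contribute nothing; eps guards log(0) on the student side.
    """
    return sum(p * math.log(p / max(q, eps))
               for p, q in zip(p_teacher, p_student) if p > 0.0)


def ts_loss(teacher_logits: Sequence[Sequence[float]],
            student_logits: Sequence[Sequence[float]]) -> float:
    """Frame-averaged KL between teacher (clean input) and student (noisy input) posteriors."""
    total = sum(kl_divergence(softmax(t), softmax(s))
                for t, s in zip(teacher_logits, student_logits))
    return total / len(teacher_logits)


def ts_grad(teacher_logits: Sequence[float],
            student_logits: Sequence[float]) -> List[float]:
    """Gradient of KL(p_T || p_S) w.r.t. the student's logits for one frame: p_S - p_T."""
    p_t = softmax(teacher_logits)
    p_s = softmax(student_logits)
    return [qs - pt for qs, pt in zip(p_s, p_t)]


def fit_student_logits(teacher_logits: Sequence[float],
                       student_logits: Sequence[float],
                       lr: float = 1.0,
                       steps: int = 1000) -> Tuple[List[float], List[float]]:
    """Hypothetical toy loop: gradient descent on the student's logits (no network).

    Returns the final logits and the loss recorded before each update.
    """
    z = list(student_logits)
    p_t = softmax(teacher_logits)  # teacher is frozen, so its posterior is fixed
    history: List[float] = []
    for _ in range(steps):
        history.append(kl_divergence(p_t, softmax(z)))
        g = ts_grad(teacher_logits, z)
        z = [zi - lr * gi for zi, gi in zip(z, g)]
    return z, history


if __name__ == "__main__":
    teacher = [2.0, 0.5, -1.0]  # hypothetical teacher logits on a clean frame
    student = [0.0, 0.0, 0.0]   # hypothetical student logits on the parallel noisy frame
    z, history = fit_student_logits(teacher, student)
    print("KL before:", history[0], "KL after:", history[-1])
    print("teacher posterior:", softmax(teacher))
    print("student posterior:", softmax(z))
```

In a real system the gradient p_S - p_T would be backpropagated through the student network while the teacher stays fixed; a learning rate of 1.0 is safe here only because the softmax cross-entropy in logit space has curvature at most 1.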
Source: Journal of Tsinghua University (Science and Technology), 2018, no. 1, pp. 55-60 (6 pages). Indexed in: EI, CAS, CSCD, Peking University Core Journals.
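Funding: National High Technology Research and Development Program of China ("863" Program) (2015AA016305); National Natural Science Foundation of China, General Program (61425017, 61403386); Strategic Priority Research Program of the Chinese Academy of Sciences (Grant XDB02080006).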
Keywords: robust speech recognition; acoustic model; neural network; transfer learning
