Abstract
To improve the robustness of speech recognition in noisy environments, an acoustic modeling method based on transfer learning is proposed. An acoustic model trained on clean speech (the teacher model) guides the training of an acoustic model for noisy speech (the student model). During training, the student model's posterior probability distribution is pushed toward the teacher model's distribution by minimizing the Kullback-Leibler (KL) divergence between the two. Tests on the CHiME-2 dataset show that this method reduces the average word error rate (WER) by 7.29% absolute relative to the baseline model and by 3.92% absolute relative to the first-place system in the CHiME-2 challenge.
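The teacher-student objective described above can be sketched as follows. This is a minimal illustration, not the authors' implementation: it assumes that the teacher and student each produce per-frame logits over the same set of output states, and that the clean input to the teacher and the noisy input to the student are frame-aligned. The function names are hypothetical. Because the teacher's entropy does not depend on the student, minimizing KL(p_T || p_S) = sum_i p_T(i) log(p_T(i) / p_S(i)) is equivalent to cross-entropy training against the teacher's soft posteriors. The gradient with respect to the student's logits is therefore simply p_S - p_T.

```python
import math
from typing import List, Sequence


def softmax(logits: Sequence[float]) -> List[float]:
    """Numerically stable softmax over one frame of logits."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    z = sum(exps)
    return [e / z for e in exps]


def kl_divergence(p: Sequence[float], q: Sequence[float], eps: float = 1e-12) -> float:
    """KL(p || q) = sum_i p_i * log(p_i / q_i); terms with p_i == 0 contribute 0."""
    return sum(pi * math.log(pi / max(qi, eps)) for pi, qi in zip(p, q) if pi > 0.0)


def teacher_student_loss(teacher_logits: Sequence[Sequence[float]],
                         student_logits: Sequence[Sequence[float]]) -> float:
    """Frame-averaged KL(teacher || student).

    Assumption (hypothetical setup): teacher_logits come from the teacher model
    on clean features, and student_logits come from the student model on the
    frame-aligned noisy features.
    """
    if len(teacher_logits) != len(student_logits):
        raise ValueError("teacher and student must cover the same frames")
    total = 0.0
    for t_frame, s_frame in zip(teacher_logits, student_logits):
        p_teacher = softmax(t_frame)  # soft target posteriors (held fixed)
        p_student = softmax(s_frame)  # posteriors being trained
        total += kl_divergence(p_teacher, p_student)
    return total / len(teacher_logits)


def student_logit_gradient(teacher_frame: Sequence[float],
                           student_frame: Sequence[float]) -> List[float]:
    """Gradient of one frame's KL(p_T || p_S) w.r.t. the student logits: p_S - p_T."""
    p_t = softmax(teacher_frame)
    p_s = softmax(student_frame)
    return [ps - pt for ps, pt in zip(p_s, p_t)]


if __name__ == "__main__":
    teacher = [[2.0, 0.5, -1.0], [0.1, 0.2, 3.0]]
    student = [[0.0, 0.0, 0.0], [1.0, 1.0, 1.0]]
    print(teacher_student_loss(teacher, teacher))  # identical posteriors give zero loss
    print(teacher_student_loss(teacher, student))  # positive when the student disagrees
    print(student_logit_gradient(teacher[0], student[0]))
```

In a real system, the student network would be updated by backpropagating this per-frame gradient, while the teacher's parameters stay frozen.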
Source
Journal of Tsinghua University (Science and Technology) (《清华大学学报(自然科学版)》)
Indexed in: EI, CAS, CSCD, Peking University Core Journals (北大核心)
2018, No. 1, pp. 55-60 (6 pages)
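Funding
National High Technology Research and Development Program of China ("863" Program) (2015AA016305)
National Natural Science Foundation of China, General Program (61425017, 61403386)
Strategic Priority Research Program of the Chinese Academy of Sciences (Grant XDB02080006)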