
Speech recognition method based on multi-task loss with additional language model (cited by: 2)
Abstract To address the poor adaptability of Attention's overly flexible alignment in complex environments and the insufficient use of language features by simple end-to-end models, a speech recognition method based on multi-task loss with an additional language model was investigated. The characteristics of speech signals were analyzed, and features containing more information were selected for training. Starting from the Attention-based Conformer end-to-end model, the model was trained with a multi-task loss in which a CTC loss assists the pure Conformer (Attention) loss, yielding the Conformer-CTC speech recognition model. On top of the Conformer-CTC model, the characteristics and effects of several language models were analyzed and compared, and a Transformer language model was added to the training of the above model through a rescoring mechanism, yielding the Conformer-CTC-Transformer speech recognition model. The models were evaluated on the AISHELL-1 data set. The results show that, compared with the pure Conformer (Attention) model, the Conformer-CTC model reduces the character error rate (CER) on the test set by 0.49%, and the Conformer-CTC-Transformer model reduces the test-set CER by a further 0.79% compared with the Conformer-CTC model. The CTC loss improves the adaptability of Attention alignment in complex environments, and rescoring the Conformer-CTC model with the Transformer language model raises recognition accuracy by a further 0.30%. Compared with some existing end-to-end models, the Conformer-CTC-Transformer model achieves better recognition results, which indicates that the model is effective to a certain degree.
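The three ingredients named in the abstract (a CTC loss combined with the Attention loss, language-model rescoring, and CER as the metric) can be illustrated with a minimal Python sketch. It is not the authors' implementation. The interpolation form, the weights `ctc_weight` and `lm_weight`, and the function names are all hypothetical, and the sketch assumes rescoring is applied to an n-best list of hypotheses with log-probability scores, which the abstract does not spell out.

```python
from typing import List, Tuple


def multitask_loss(ctc_loss: float, att_loss: float, ctc_weight: float = 0.3) -> float:
    """Interpolate a CTC loss with an attention loss.

    The weight 0.3 is an assumed value, not one reported in the abstract.
    """
    if not 0.0 <= ctc_weight <= 1.0:
        raise ValueError("ctc_weight must lie in [0, 1]")
    return ctc_weight * ctc_loss + (1.0 - ctc_weight) * att_loss


def rescore_nbest(nbest: List[Tuple[str, float, float]], lm_weight: float = 0.5) -> str:
    """Return the hypothesis text with the highest combined score.

    Each entry is (text, asr_log_prob, lm_log_prob); the combined score is
    asr_log_prob + lm_weight * lm_log_prob (lm_weight is an assumed value).
    """
    if not nbest:
        raise ValueError("nbest must not be empty")
    best = max(nbest, key=lambda h: h[1] + lm_weight * h[2])
    return best[0]


def cer(reference: str, hypothesis: str) -> float:
    """Character error rate: Levenshtein distance divided by reference length."""
    if not reference:
        raise ValueError("reference must not be empty")
    # prev[j] holds the edit distance between reference[:i-1] and hypothesis[:j]
    prev = list(range(len(hypothesis) + 1))
    for i, ref_char in enumerate(reference, start=1):
        curr = [i]
        for j, hyp_char in enumerate(hypothesis, start=1):
            substitution = prev[j - 1] + (0 if ref_char == hyp_char else 1)
            curr.append(min(prev[j] + 1, curr[j - 1] + 1, substitution))
        prev = curr
    return prev[-1] / len(reference)
```

In this sketch, a hypothesis that the acoustic model slightly prefers can still lose to a competitor once the language-model score is added, which is the intuition behind attaching a language model through rescoring.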
Authors LIU Yongli; ZHANG Shaoyang; WANG Yuheng; XIE Yi (School of Information Engineering, Chang'an University, Xi'an, Shaanxi 710064, China; Operation Management Branch of Shaanxi Transportation Holding Group Co., Ltd., Xi'an, Shaanxi 710065, China)
Source Journal of Jiangsu University (Natural Science Edition), CAS, Peking University Core Journal, 2023, No. 5, pp. 564-569 (6 pages)
Funding Shaanxi Provincial Key Industry Innovation Chain (Group) Project (2021ZDLGY07-06).
Keywords speech recognition; deep learning; language model; multi-task loss; Conformer; Transformer; CTC