
End-to-end Chinese speech recognition system using bidirectional long short-term memory networks and weighted finite-state transducers (cited by: 16)
Abstract: To address the unreasonable conditional assumptions of the Hidden Markov Model (HMM) in speech recognition, the sequence modeling ability of recurrent neural networks was studied further, and an acoustic model based on Bidirectional Long Short-Term Memory (BLSTM) neural networks was proposed. The Connectionist Temporal Classification (CTC) training criterion was applied to train this acoustic model, yielding an end-to-end Chinese speech recognition system that does not rely on HMM. In addition, a speech decoding method based on Weighted Finite-State Transducers (WFST) was designed to solve the problem that the lexicon and language model are difficult to integrate into the decoding process. Compared with a traditional GMM-HMM system and a hybrid DNN-HMM system, the experimental results show that the end-to-end system not only markedly reduces the recognition error rate but also greatly increases decoding speed, indicating that the proposed acoustic model can effectively enhance model discrimination and optimize the system structure.
Authors: YAO Yu; RYAD Chellali (College of Electrical Engineering and Control Science, Nanjing Tech University, Nanjing, Jiangsu 211816, China)
Source: Journal of Computer Applications (CSCD; Peking University Core Journal), 2018, No. 9, pp. 2495-2499 (5 pages)
Keywords: speech recognition; Long Short-Term Memory (LSTM) neural network; Connectionist Temporal Classification (CTC); Weighted Finite-State Transducer (WFST); end-to-end system
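
The abstract refers to a CTC-trained acoustic model whose frame-level outputs must be turned into a label sequence. As a rough illustration of the CTC output convention only, the sketch below performs greedy (best-path) decoding: it takes the highest-scoring label in each frame, merges consecutive repeats, and drops the blank symbol. This is not the WFST decoder described in the paper, which additionally integrates the lexicon and language model; the function name `ctc_greedy_decode`, the blank index `0`, and the toy scores are all assumptions made for the example.

```python
from itertools import groupby

BLANK = 0  # assumed index of the CTC blank label (hypothetical choice)


def ctc_greedy_decode(frame_scores, blank=BLANK):
    """Best-path CTC decoding over a list of per-frame score lists."""
    # Step 1: pick the highest-scoring label index in every frame.
    best_path = [
        max(range(len(frame)), key=frame.__getitem__) for frame in frame_scores
    ]
    # Step 2: merge runs of identical consecutive labels.
    collapsed = [label for label, _ in groupby(best_path)]
    # Step 3: remove blanks; a blank between two equal labels keeps both.
    return [label for label in collapsed if label != blank]


# Toy scores for 5 frames over 3 labels (0 = blank); values are made up.
scores = [
    [0.10, 0.80, 0.10],  # label 1
    [0.10, 0.70, 0.20],  # label 1 (repeat, merged with the previous frame)
    [0.90, 0.05, 0.05],  # blank
    [0.20, 0.70, 0.10],  # label 1 (kept: separated by a blank)
    [0.10, 0.10, 0.80],  # label 2
]
decoded = ctc_greedy_decode(scores)
print(decoded)
```

Greedy decoding ignores word-level constraints entirely, which is one way to see why a decoder that composes the lexicon and language model, as the abstract describes, is needed for a full recognition system.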