A Speech Recognition Model Based on CNN-DFSMN-CTC
Abstract To address the low recognition accuracy of existing speech recognition modules in complex environments and the complexity of training them, this paper improves the acoustic model by combining Deep Feedforward Sequential Memory Networks (DFSMN) with the end-to-end Connectionist Temporal Classification (CTC) method. In addition, because existing acoustic feature representations have limited representational power in deep neural networks, the paper builds on log Mel filter-bank (Fbank) feature extraction and applies a Convolutional Neural Network (CNN) as a second stage of feature extraction over the acoustic features. On the THCHS-30 dataset, the improved CNN-DFSMN-CTC model achieves relative reductions in test-set character error rate (CER) of 6.83% and 7.96% compared with the CNN model and the LSTM model, respectively.
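The abstract reports its results as relative reductions in CER, so a short sketch of how such figures are usually computed may help. It is a minimal illustration, assuming CER is the character-level edit distance divided by the reference length and that a relative reduction is (baseline - improved) / baseline. The function names and sample strings are hypothetical and are not taken from the paper, whose abstract does not give absolute CER values.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two character sequences."""
    prev = list(range(len(hyp) + 1))  # distances from the empty reference prefix
    for i, r in enumerate(ref, start=1):
        curr = [i]  # turning i reference characters into an empty hypothesis
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(
                prev[j] + 1,         # deletion
                curr[j - 1] + 1,     # insertion
                prev[j - 1] + cost,  # substitution or match
            ))
        prev = curr
    return prev[-1]


def cer(ref, hyp):
    """Character error rate: edit distance normalised by reference length."""
    if not ref:
        raise ValueError("reference must not be empty")
    return edit_distance(ref, hyp) / len(ref)


def relative_reduction(baseline, improved):
    """Relative (not absolute) reduction of an error rate."""
    if baseline <= 0:
        raise ValueError("baseline error rate must be positive")
    return (baseline - improved) / baseline


if __name__ == "__main__":
    # Hypothetical transcripts, for illustration only.
    reference = "今天天气很好"
    hypothesis = "今天天汽很好"
    print(f"CER: {cer(reference, hypothesis):.4f}")
    # Hypothetical error rates, not results from the paper.
    print(f"relative reduction: {relative_reduction(0.25, 0.23):.2%}")
```

Under this definition, a relative reduction of 6.83% means the improved model's CER is about 93.17% of the baseline's CER, not that the CER fell by 6.83 percentage points.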
Authors LIANG Hongtao (梁宏涛); LIU Jiaxu (刘家旭) (College of Information Science and Technology, Qingdao University of Science and Technology, Qingdao 266061)
Source Computer & Digital Engineering (《计算机与数字工程》), 2024, No. 10, pp. 2984-2990 (7 pages)
Keywords speech recognition; DFSMN; CTC; CNN