Abstract
To address the poor performance of convolutional neural networks in continuous speech recognition, this paper proposes a speech recognition algorithm based on a multi-scale residual deep convolutional neural network and combines it with connectionist temporal classification (CTC) to build an end-to-end Chinese speech recognition system. Multi-scale learning, a residual mechanism, and dilated convolution are introduced into the network, which removes the dependence of sequence modeling on long short-term memory (LSTM) networks, speeds up model training, and improves the robustness of recognition to noise. Experiments show that, compared with the bidirectional LSTM (BLSTM), deep convolutional neural network (DCNN), and CNN-LSTM models, the proposed model reduces the word error rate (WER) by about 9%, 5%, and 3%, respectively, and that its recognition rate in noisy environments is also better than that of traditional speech recognition systems.
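The error rate reported in the abstract is conventionally the edit distance between the reference and the recognized sequence, divided by the reference length. The sketch below illustrates that standard definition; it is not the authors' evaluation code. The function names `edit_distance` and `error_rate` are hypothetical, and scoring Chinese text per character is an assumption made here for illustration.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two token sequences."""
    # prev[j] holds the distance between the processed prefix of ref and hyp[:j].
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(
                prev[j] + 1,         # deletion of a reference token
                curr[j - 1] + 1,     # insertion of a hypothesis token
                prev[j - 1] + cost,  # substitution, or match when cost is 0
            ))
        prev = curr
    return prev[-1]


def error_rate(ref, hyp):
    """Edit distance normalized by the reference length."""
    if not ref:
        raise ValueError("reference must be non-empty")
    return edit_distance(ref, hyp) / len(ref)


# Assumption: Chinese transcripts are scored per character.
reference = list("今天天气很好")
hypothesis = list("今天天汽很好")
rate = error_rate(reference, hypothesis)  # one substitution over six characters
```

Because insertions are counted, this ratio can exceed 1 when the hypothesis is much longer than the reference.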
Authors
Liu Hong (刘虹)
Yuan Sannan (袁三男)
School of Electronics and Information Engineering, Shanghai University of Electric Power, Shanghai 200090, China
Source
Computer Applications and Software (《计算机应用与软件》)
Peking University Core Journal (北大核心)
2020, Issue 11, pp. 275-279 (5 pages)
Keywords
Speech recognition
Multi-scale
Convolutional neural network
End-to-end