Abstract
Feed-forward neural networks have difficulty modeling time-series data. To address this, a bidirectional recurrent neural network (BiRNN) is applied to acoustic modeling for automatic speech recognition. First, Mel-frequency cepstral coefficients (MFCCs) are extracted as features. Second, a BiRNN is used as the acoustic model. Finally, the effect of different parameters on system performance is tested. Experimental results on the TIMIT dataset show that, compared with acoustic models based on convolutional neural networks and deep neural networks, the recognition rate improves by 1.3% and 4.0%, respectively, indicating that the BiRNN-based acoustic model performs better.
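To illustrate what "bidirectional" means for an acoustic model, the following is a minimal NumPy sketch of a BiRNN forward pass over a sequence of feature frames. It is not the authors' implementation: the function names (`rnn_pass`, `birnn_posteriors`, `init_params`), the plain tanh recurrence, the random weights, and the sizes (39-dimensional MFCC frames, 64 hidden units, 61 phone classes) are all assumptions chosen for illustration, and no training is shown.

```python
import numpy as np

def rnn_pass(x, W_in, W_rec, b, reverse=False):
    """Run a simple tanh RNN over x of shape (T, D); return states of shape (T, H)."""
    T = x.shape[0]
    H = W_rec.shape[0]
    h = np.zeros(H)
    out = np.zeros((T, H))
    # The backward direction visits frames from last to first.
    steps = range(T - 1, -1, -1) if reverse else range(T)
    for t in steps:
        h = np.tanh(x[t] @ W_in + h @ W_rec + b)
        out[t] = h
    return out

def birnn_posteriors(x, params):
    """Per-frame class posteriors from concatenated forward and backward states."""
    h_fwd = rnn_pass(x, *params["fwd"])
    h_bwd = rnn_pass(x, *params["bwd"], reverse=True)
    h = np.concatenate([h_fwd, h_bwd], axis=1)        # shape (T, 2H)
    logits = h @ params["W_out"] + params["b_out"]    # shape (T, C)
    logits = logits - logits.max(axis=1, keepdims=True)  # numerically stable softmax
    probs = np.exp(logits)
    return probs / probs.sum(axis=1, keepdims=True)

def init_params(n_feat, n_hidden, n_classes, seed=0):
    """Random, untrained weights (illustrative only)."""
    rng = np.random.default_rng(seed)

    def layer():
        return (rng.normal(0.0, 0.1, (n_feat, n_hidden)),
                rng.normal(0.0, 0.1, (n_hidden, n_hidden)),
                np.zeros(n_hidden))

    return {"fwd": layer(), "bwd": layer(),
            "W_out": rng.normal(0.0, 0.1, (2 * n_hidden, n_classes)),
            "b_out": np.zeros(n_classes)}

# Assumed sizes: 100 frames of 39-dim features, 61 output classes.
params = init_params(n_feat=39, n_hidden=64, n_classes=61)
frames = np.random.default_rng(1).normal(size=(100, 39))
post = birnn_posteriors(frames, params)
print(post.shape)  # (100, 61)
```

Because the backward pass starts at the final frame, the posterior for any frame depends on both earlier and later frames. A frame-wise feed-forward network cannot provide this context without explicit frame stacking.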
Authors
Gengzang-Cuomao (更藏措毛); HUANG He-ming (黄鹤鸣)
School of Computer Science, Qinghai Normal University, Xining 810008, China; Key Laboratory of Tibetan Information Processing, Ministry of Education, Xining 810008, China
Source
Computer and Modernization (《计算机与现代化》)
2019, No. 10, pp. 1-6 (6 pages)
Funding
Natural Science Foundation of Qinghai Province (2016-ZJ-904)
National Natural Science Foundation of China (61662062, 61462072)
Keywords
bidirectional recurrent neural network
speech recognition
Mel-frequency cepstral coefficients
deep neural network