
Application of Bidirectional Recurrent Neural Network in Speech Recognition (cited by: 6)
Abstract: Feed-forward neural networks handle time-series data poorly, so this paper applies a bidirectional recurrent neural network (BiRNN) to acoustic modeling for automatic speech recognition. First, Mel-frequency cepstral coefficients (MFCCs) are extracted as features. Second, a BiRNN is used as the acoustic model. Finally, the effect of different parameters on system performance is tested. Experiments on the TIMIT dataset show that the recognition rate improves by 1.3% and 4.0% over acoustic models based on convolutional neural networks and deep neural networks, respectively, indicating that the BiRNN-based acoustic model performs better.
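The abstract gives no architectural details, so the following is only a minimal sketch of what "bidirectional" means for an acoustic model, not the authors' implementation. It assumes a plain tanh recurrent cell, random untrained weights, and 13-dimensional frames standing in for MFCC vectors; the names `rnn_pass`, `birnn_forward`, and `random_params` are hypothetical. One pass reads the frames left to right, a second reads them right to left, and the two hidden states for each frame are concatenated so that every frame sees both past and future context.

```python
import math
import random


def rnn_pass(frames, w_in, w_rec, bias):
    """Run a simple tanh RNN over the frames; return one hidden state per frame."""
    hidden_size = len(bias)
    h = [0.0] * hidden_size  # initial hidden state
    states = []
    for x in frames:
        # h on the right-hand side is still the previous step's state.
        h = [
            math.tanh(
                sum(w_in[j][i] * x[i] for i in range(len(x)))
                + sum(w_rec[j][k] * h[k] for k in range(hidden_size))
                + bias[j]
            )
            for j in range(hidden_size)
        ]
        states.append(h)
    return states


def birnn_forward(frames, fwd_params, bwd_params):
    """Concatenate forward-in-time and backward-in-time hidden states per frame."""
    fwd = rnn_pass(frames, *fwd_params)
    # Run over the reversed sequence, then flip back so index t matches frame t.
    bwd = rnn_pass(frames[::-1], *bwd_params)[::-1]
    return [f + b for f, b in zip(fwd, bwd)]


def random_params(input_size, hidden_size, rng):
    """Untrained illustrative weights: (input weights, recurrent weights, bias)."""
    w_in = [[rng.uniform(-0.5, 0.5) for _ in range(input_size)]
            for _ in range(hidden_size)]
    w_rec = [[rng.uniform(-0.5, 0.5) for _ in range(hidden_size)]
             for _ in range(hidden_size)]
    bias = [0.0] * hidden_size
    return w_in, w_rec, bias


# Assumed toy setup: 5 frames of 13 values each, 13 hidden units per direction.
rng = random.Random(0)
frames = [[rng.uniform(-1.0, 1.0) for _ in range(13)] for _ in range(5)]
outputs = birnn_forward(frames,
                        random_params(13, 13, rng),
                        random_params(13, 13, rng))
print(len(outputs), len(outputs[0]))  # 5 26
```

In a full recognizer, each concatenated vector would typically feed an output layer that scores phone classes per frame; that layer, the training procedure, and the real MFCC front end are omitted here.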
Authors: Gengzang-Cuomao (更藏措毛); HUANG He-ming (黄鹤鸣) (School of Computer Science, Qinghai Normal University, Xining 810008, China; Key Laboratory of Tibetan Information Processing, Ministry of Education, Xining 810008, China)
Source: Computer and Modernization (《计算机与现代化》), 2019, No. 10, pp. 1-6 (6 pages)
Funding: Natural Science Foundation of Qinghai Province (2016-ZJ-904); National Natural Science Foundation of China (61662062, 61462072)
Keywords: bidirectional recurrent neural network; speech recognition; Mel-frequency cepstral coefficients; deep neural network