
Revaluation based on LSTM-DNN language model in telephone conversation speech recognition
(original title: 电话交谈语音识别中基于LSTM-DNN语言模型的重评估方法研究)

Cited by: 8
Abstract: Neural network language models have attracted growing attention in recent years, and deep neural network language models built on the long short-term memory structure (LSTM-DNN) have become an active research topic. In telephone conversation speech recognition, the corpus exhibits contextual correlation, but traditional language models have limited memory of history and cannot fully learn that correlation. To address this problem, an LSTM-DNN language model is trained to capture the correlation in a telephone conversation corpus and is then applied to the revaluation (rescoring) stage of the speech recognition system. The method is compared with revaluation based on high-order language models, feed-forward neural network (FFNN) language models, and recurrent neural network (RNN) language models. Experimental results show that the LSTM-DNN language model performs best among the revaluation methods: compared with the first-pass decoding result, the character error rate on the Chinese test sets falls by a relative 4.1% on average.
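Source: Journal of Chongqing University of Posts and Telecommunications (Natural Science Edition) (《重庆邮电大学学报(自然科学版)》), indexed in CSCD and the Peking University Core Journals list, 2016, No. 2, pp. 180-186, 193 (8 pages)
Funding: National Natural Science Foundation of China (10925419, 90920302, 61072124, 11074275, 11161140319, 91120001, 61271426); Strategic Priority Research Program of the Chinese Academy of Sciences (XDA06030100, XDA06030500); National 863 Program (2012AA012503); Key Deployment Project of the Chinese Academy of Sciences (KGZD-EW-103-2)
Keywords: long short-term memory; neural network language model; speech recognition; revaluation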
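
The revaluation step named in the abstract can be illustrated with a minimal n-best rescoring sketch. This is not the authors' implementation: the `Hypothesis` fields, the `rescore_nbest` function, the `lm_weight` and `interpolation` parameters, and the sentence-level interpolation of log-probabilities are all assumptions made for illustration, and `toy_lm` is a hypothetical stand-in for a trained LSTM-DNN language model.

```python
from dataclasses import dataclass
from typing import Callable, List, Sequence


@dataclass
class Hypothesis:
    words: List[str]        # candidate transcript from the first decoding pass
    acoustic_score: float   # acoustic log-likelihood from the first pass
    ngram_logprob: float    # first-pass n-gram language model log-probability


def rescore_nbest(
    nbest: Sequence[Hypothesis],
    neural_lm_logprob: Callable[[List[str]], float],
    lm_weight: float = 10.0,      # assumed language model scale factor
    interpolation: float = 0.5,   # assumed weight of the neural LM score
) -> List[Hypothesis]:
    """Return the hypotheses sorted best-first under the combined score."""

    def combined(h: Hypothesis) -> float:
        # Mix the neural LM score with the first-pass n-gram score
        # (an assumed sentence-level interpolation of log-probabilities).
        lm = (interpolation * neural_lm_logprob(h.words)
              + (1.0 - interpolation) * h.ngram_logprob)
        return h.acoustic_score + lm_weight * lm

    return sorted(nbest, key=combined, reverse=True)


def toy_lm(words: List[str]) -> float:
    # Hypothetical scorer standing in for a trained LSTM-DNN language model.
    return -5.0 if "see" in words else -9.0


nbest = [
    Hypothesis(["i", "sea", "you"], acoustic_score=-100.0, ngram_logprob=-12.0),
    Hypothesis(["i", "see", "you"], acoustic_score=-101.0, ngram_logprob=-12.5),
]
rescored = rescore_nbest(nbest, toy_lm)
best = rescored[0]
```

In this toy list the first hypothesis has the better first-pass score, while the second ranks first once the stand-in neural score is mixed in. A real system would tune `lm_weight` and `interpolation` on held-out data.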


