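Research on a Revaluation (Rescoring) Method Based on the LSTM-DNN Language Model in Telephone Conversation Speech Recognition (cited by: 8)

Published English title: Revaluation based on LSTM-DNN language model in telephone conversation speech recognition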

Abstract: In recent years, neural network language models have attracted growing attention in the research community, and deep neural network language models built on the long short-term memory (LSTM) structure (LSTM-deep neural network, LSTM-DNN) have become a focus of current work. In telephone conversation speech recognition, the corpus itself is contextually correlated, but traditional language models retain only limited history and cannot fully learn this correlation. To address this, an LSTM-DNN language model is trained to capture the correlation in a telephone conversation corpus and is then applied in the revaluation (rescoring) stage of the speech recognition system. The method is compared with revaluation based on a high-order n-gram language model, a feed-forward neural network (FFNN) language model, and a recurrent neural network (RNN) language model. Experiments show that the LSTM-DNN language model performs best among these revaluation methods: compared with first-pass decoding, the character error rate on the Chinese test sets falls by 4.1% on average (the published English abstract states this as a relative reduction in average word error rate).
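The second-pass idea in the abstract can be illustrated with a minimal N-best rescoring sketch. This is not the authors' implementation: the abstract does not say whether N-best lists or lattices are rescored, so N-best rescoring is assumed here, and the function names, score weights, interpolation scheme, and the toy scorer standing in for an LSTM-DNN language model are all hypothetical.

```python
import math
from typing import Callable, List, Sequence, Tuple

# One first-pass hypothesis: (tokens, acoustic log-score, first-pass LM log-score).
Hypothesis = Tuple[List[str], float, float]


def rescore_nbest(
    nbest: Sequence[Hypothesis],
    new_lm_logprob: Callable[[List[str]], float],
    lm_weight: float = 1.0,
    interp: float = 0.5,
) -> List[Tuple[float, List[str]]]:
    """Re-rank first-pass hypotheses with a second language model.

    The LM score is a log-domain linear interpolation of the first-pass LM
    score and the new LM score. Both weights are illustrative assumptions.
    """
    rescored = []
    for tokens, am_score, old_lm_score in nbest:
        lm_score = interp * new_lm_logprob(tokens) + (1.0 - interp) * old_lm_score
        rescored.append((am_score + lm_weight * lm_score, tokens))
    # Highest combined score first.
    rescored.sort(key=lambda item: item[0], reverse=True)
    return rescored


def toy_lm_logprob(tokens: List[str]) -> float:
    """Hypothetical stand-in for an LSTM-DNN LM: favours one known bigram."""
    preferred = {("ni", "hao")}
    score = 0.0
    for prev, cur in zip(tokens, tokens[1:]):
        score += math.log(0.5) if (prev, cur) in preferred else math.log(0.05)
    return score


# Made-up two-entry N-best list; the first entry is the first-pass winner.
nbest = [
    (["ni", "hou"], -10.0, -3.0),
    (["ni", "hao"], -10.5, -3.2),
]

best_score, best_tokens = rescore_nbest(nbest, toy_lm_logprob)[0]
print(best_tokens, round(best_score, 3))
```

With `interp=0.0` the new model is ignored and the first-pass ranking is reproduced, which gives a simple sanity check before tuning the interpolation weight on held-out data.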

Authors: ZUO Lingyun (左玲云), ZHANG Qingqing (张晴晴), LI Ta (黎塔), LIANG Hong (梁宏), YAN Yonghong (颜永红)

Affiliation: Key Laboratory of Speech Acoustics and Content Understanding, Institute of Acoustics, Chinese Academy of Sciences

Source: Journal of Chongqing University of Posts and Telecommunications (Natural Science Edition) (重庆邮电大学学报（自然科学版）), indexed in CSCD and the Peking University Core Journals list, 2016, Issue 2, pp. 180-186, 193 (8 pages)

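Funding: National Natural Science Foundation of China (10925419, 90920302, 61072124, 11074275, 11161140319, 91120001, 61271426); Strategic Priority Research Program of the Chinese Academy of Sciences (XDA06030100, XDA06030500); National 863 Program (2012AA012503); Chinese Academy of Sciences Key Deployment Project (KGZD-EW-103-2)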

Keywords: long short-term memory neural network language model speech recognition revaluation

Classification: TN911 [Electronics and Telecommunications: Communication and Information Systems]
