A Novel Large Vocabulary Continuous Speech Recognition Algorithm Combined with Language Recognition

Cited by: 10

Abstract: This paper proposes a novel large vocabulary continuous speech recognition (LVCSR) algorithm combined with language recognition, and builds a real-time processing system around it. The algorithm makes full use of the phone recognition hypotheses collected during speech decoding, so it identifies the language while it recognizes the speech content. In a multilingual environment, the system can replace a standalone language recognition module at a lower overall computational cost, and it can also cope with utterances in which non-target languages are mixed in with the target language. This greatly reduces the meaningless recognition errors introduced by non-target languages and prevents accumulated errors from misleading the subsequent recognition process. To integrate content recognition and language recognition tightly into a single decoding procedure, three different algorithms are proposed for adjusting (reconstructing) the phone lattice produced by decoding: they remove the target-language bias introduced by the pronunciation dictionary and language model of the LVCSR decoder, and they include richer phone recognition hypotheses in the lattice. Experiments show that the phone lattice reconstruction algorithms effectively improve language recognition accuracy in joint recognition. On a Mandarin/English mixed conversational telephone speech corpus with Mandarin as the target language, the proposed joint recognition algorithm reduced the meaningless recognition errors caused by out-of-language speech by 91.76%, with a Chinese character error rate of 54.98%.
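The abstract does not spell out how the phone hypotheses are scored for language identity, so the following is only a minimal sketch of one common phonotactic approach, not the authors' algorithm. It assumes the lattice has already been reduced to a confusion network (a list of slots, each mapping a phone to its posterior), treats adjacent slots as independent, and scores posterior-weighted phone bigram counts against per-language bigram models. The function names, the toy phones, the probabilities, and the `floor` value are all hypothetical.

```python
import math
from collections import defaultdict


def expected_bigram_counts(slots):
    """Posterior-weighted phone bigram counts from a confusion network.

    `slots` is a list of dicts mapping phone -> posterior. Adjacent slots
    are treated as independent, which is a simplifying assumption.
    """
    counts = defaultdict(float)
    for left, right in zip(slots, slots[1:]):
        for p1, w1 in left.items():
            for p2, w2 in right.items():
                counts[(p1, p2)] += w1 * w2
    return dict(counts)


def language_score(counts, bigram_probs, floor=1e-4):
    """Expected log-likelihood of the counts under one language's bigram model.

    Bigrams missing from the model fall back to `floor` (hypothetical value).
    """
    return sum(c * math.log(bigram_probs.get(bg, floor))
               for bg, c in counts.items())


def classify(slots, models):
    """Return the best-scoring language and the per-language scores."""
    counts = expected_bigram_counts(slots)
    scores = {lang: language_score(counts, probs)
              for lang, probs in models.items()}
    return max(scores, key=scores.get), scores


# Toy confusion network: four slots with competing phone hypotheses.
toy_slots = [
    {"n": 0.9, "l": 0.1},
    {"i": 1.0},
    {"h": 0.8, "x": 0.2},
    {"ao": 1.0},
]

# Toy bigram models (made-up probabilities, for illustration only).
toy_models = {
    "mandarin": {("n", "i"): 0.3, ("i", "h"): 0.2, ("h", "ao"): 0.3},
    "english": {("l", "i"): 0.2, ("i", "x"): 0.05},
}

best_language, all_scores = classify(toy_slots, toy_models)
```

Because the counts keep every competing hypothesis, weighted by its posterior, a lattice that carries richer phone alternatives gives the language scorer more evidence than the single best path would. This is the general motivation the abstract gives for reconstructing the lattice before language recognition.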

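Authors: Shan Yuxiang (单煜翔), Deng Yan (邓妍), Liu Jia (刘加)

Affiliation: Department of Electronic Engineering, Tsinghua University; Tsinghua National Laboratory for Information Science and Technology

Source: Acta Automatica Sinica (《自动化学报》), indexed in EI, CSCD, and the Peking University Core Journal list; 2012, Issue 3, pp. 366-374 (9 pages)

Funding: National High Technology Research and Development Program of China (863 Program) (2008AA02Z414, 2008AA040201); National Natural Science Foundation of China (60776800, 61005019); Joint Research Fund of the National Natural Science Foundation of China and the Research Grants Council of Hong Kong (60931160443)

Keywords: speech recognition, language recognition, out-of-language problem, phone lattice reconstruction

Classification code: TN912.34 [Electronics and Telecommunications: Communication and Information Systems]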




