
A Dynamic-Matching Word Lattice Generation Algorithm Based on Weighted Finite State Transducers (cited by: 4)

Exact Word Lattice Generation in Weighted Finite State Transducer Framework
Abstract: Existing Weighted Finite State Transducer (WFST) decoding networks carry no exact word-end markers, so current lattice generation algorithms either lack precise word-end times or produce only state- or phone-level lattices, neither of which can be used in keyword spotting. This paper proposes an algorithm that generates standard speech recognition word lattices within a static WFST decoder. The convertibility between WFST phone lattices and standard word lattices is first analyzed theoretically; a dynamic lexicon phone-matching method is then proposed to recover exact word-end times in the WFST network; finally, a token-passing traversal converts the phone lattice into a word lattice. To reduce computation, a pruning strategy is introduced into the token-passing process, bringing the phone-to-word conversion time to less than 3% of the one-pass decoding time. Because the resulting lattices contain exact word-end times, they can be used not only for language model rescoring but also directly in keyword spotting systems. Experimental results show that the proposed algorithm is computationally efficient; compared with the lattices produced by an existing dynamic decoder, its lattices retain more decoding information and achieve better performance in both large-vocabulary continuous speech recognition rescoring and keyword spotting.
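The conversion the abstract describes (a lexicon trie for dynamic phone matching, plus token passing with beam pruning over the phone lattice) can be sketched roughly as follows. This is an illustrative toy, not the paper's implementation: the lexicon, lattice layout, tuple formats, and beam value are all assumptions.

```python
from collections import defaultdict

# Hypothetical toy lexicon (word -> phone sequence); entries are illustrative only.
LEXICON = {"cat": ["k", "ae", "t"], "at": ["ae", "t"]}

def build_trie(lexicon):
    """Prefix trie over phone sequences; each node maps phone -> child
    and stores the words that end exactly at that node."""
    root = {"children": {}, "words": []}
    for word, phones in lexicon.items():
        node = root
        for p in phones:
            node = node["children"].setdefault(p, {"children": {}, "words": []})
        node["words"].append(word)
    return root

def phone_lattice_to_word_lattice(arcs, start, trie, beam=10):
    """Token passing over a phone-lattice DAG.

    arcs: dict src_state -> list of (dst_state, phone, end_time, weight).
    A token is (trie_node, word_start_time, cost). When a token reaches a
    trie node that completes a word, a word arc carrying the exact start
    and end times is emitted and a fresh token restarts at the trie root.
    Assumes states are reachable in topological order from `start`.
    """
    tokens = defaultdict(list)            # lattice state -> live tokens
    tokens[start] = [(trie, 0.0, 0.0)]
    word_arcs = []
    frontier, seen = [start], {start}
    while frontier:
        state = frontier.pop(0)
        # Pruning: keep only the `beam` lowest-cost tokens at this state.
        live = sorted(tokens[state], key=lambda t: t[2])[:beam]
        for dst, phone, t_end, w in arcs.get(state, []):
            for node, t_start, cost in live:
                child = node["children"].get(phone)
                if child is None:
                    continue              # phone sequence not in lexicon: drop token
                for word in child["words"]:
                    word_arcs.append((word, t_start, t_end, cost + w))
                    tokens[dst].append((trie, t_end, cost + w))
                if child["children"]:     # the word may continue past this phone
                    tokens[dst].append((child, t_start, cost + w))
            if dst not in seen:
                seen.add(dst)
                frontier.append(dst)
    return word_arcs

trie = build_trie(LEXICON)
# Linear three-arc lattice: k (ends t=1), ae (t=2), t (t=3), each weight 0.5.
arcs = {0: [(1, "k", 1, 0.5)], 1: [(2, "ae", 2, 0.5)], 2: [(3, "t", 3, 0.5)]}
print(phone_lattice_to_word_lattice(arcs, 0, trie))
# -> [('cat', 0.0, 3, 1.5)]
```

The key point mirrored from the paper's idea is that the word-end time is read off the phone arc that completes a lexicon entry, so the emitted word arc carries exact timing even though the original WFST network has no word-end nodes.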
Source: Journal of Electronics & Information Technology (EI, CSCD, PKU Core Journal), 2014, No. 1: 140-146 (7 pages).
Funding: National Natural Science Foundation of China (10925419, 90920302, 61072124, 11074275, 11161140319, 91120001, 61271426); Strategic Priority Research Program of the Chinese Academy of Sciences (XDA06030100, XDA06030500); National 863 Program (2012AA012503); Key Deployment Project of the Chinese Academy of Sciences (KGZD-EW-103-2).
Keywords: Automatic speech recognition; Weighted Finite State Transducer (WFST); Lattice generation; Keyword spotting

References (16)

  • 1 Shore T, Faubel F, Hehnke H, et al. Knowledge-based word lattice rescoring in a dynamic context[C]. Proceedings of Interspeech, Portland, 2012: 1337-1340.
  • 2 Zhang Hao and Gildea D. Efficient multipass decoding for synchronous context free grammars[C]. Proceedings of the Association for Computational Linguistics, Columbus, 2008: 209-217.
  • 3 Mangu L, Brill E, and Stolcke A. Finding consensus in speech recognition: word error minimization and other applications of confusion networks[J]. Computer Speech & Language, 2000, 14(4): 373-400.
  • 4 Ortmanns S, Ney H, and Aubert X. A word graph algorithm for large vocabulary continuous speech recognition[J]. Computer Speech & Language, 1997, 11(1): 43-72.
  • 5 Demuynck K, Duchateau J, Compernolle D V, et al. An efficient search space representation for large vocabulary continuous speech recognition[J]. Speech Communication, 2000, 30(1): 37-53.
  • 6 Rybach D, Schluter R, and Ney H. A comparative analysis of dynamic network decoding[C]. Proceedings of IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), Prague, 2011: 5184-5187.
  • 7 Mohri M, Pereira F C N, and Riley M. Speech recognition with weighted finite-state transducers[M]. Handbook of Speech Processing. Berlin Heidelberg: Springer-Verlag, 2008: 559-582.
  • 8 Ljolje A, Pereira F, and Riley M. Efficient general lattice generation and rescoring[C]. Proceedings of 6th European Conference on Speech Communication and Technology, Budapest, 1999: 1251-1254.
  • 9 Povey D, Hannemann M, Boulianne G, et al. Generating exact lattices in the WFST framework[C]. Proceedings of IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), Kyoto, 2012: 4213-4216.
  • 10 Povey D, Ghoshal A, Boulianne G, et al. The Kaldi speech recognition toolkit[C]. Proceedings of Automatic Speech Recognition and Understanding Workshop, Hawaii, 2011. doi: 10.1109/ASRU.2011.6163923.



