
A Dynamic-Matching Word Lattice Generation Algorithm Based on Weighted Finite State Transducers (cited by: 4)

Exact Word Lattice Generation in Weighted Finite State Transducer Framework
Abstract: Existing Weighted Finite State Transducer (WFST) decoding networks carry no exact word-end markers, so current lattice generation algorithms either lack precise word-end times or produce only state- or phone-level lattices, neither of which can be used in keyword spotting. This paper proposes an algorithm that generates standard speech recognition word lattices within a static WFST decoder. The convertibility between WFST phone lattices and standard word lattices is first analyzed theoretically; a dynamic lexicon phone-matching method is then proposed to recover exact word-end times in the WFST network; finally, a token-passing traversal converts the phone lattice into a word lattice. To reduce computation, a pruning strategy is introduced into the token-passing process, bringing the phone-to-word conversion time to less than 3% of the one-pass decoding time. Because the resulting lattices contain exact word-end times, they can be used not only for language model rescoring but also directly in keyword spotting systems. Experimental results show that the proposed algorithm is computationally efficient; compared with the lattices produced by an existing dynamic decoder, its lattices retain more decoding information and achieve better performance in both large-vocabulary continuous speech recognition rescoring and keyword spotting.
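The conversion the abstract describes (a lexicon trie for dynamic phone matching, plus token passing with beam pruning over the phone lattice) can be sketched roughly as follows. This is an illustrative toy, not the paper's implementation: the lexicon, lattice layout, tuple formats, and beam value are all assumptions.

```python
from collections import defaultdict

# Hypothetical toy lexicon (word -> phone sequence); entries are illustrative only.
LEXICON = {"cat": ["k", "ae", "t"], "at": ["ae", "t"]}

def build_trie(lexicon):
    """Prefix trie over phone sequences; each node maps phone -> child
    and stores the words that end exactly at that node."""
    root = {"children": {}, "words": []}
    for word, phones in lexicon.items():
        node = root
        for p in phones:
            node = node["children"].setdefault(p, {"children": {}, "words": []})
        node["words"].append(word)
    return root

def phone_lattice_to_word_lattice(arcs, start, trie, beam=10):
    """Token passing over a phone-lattice DAG.

    arcs: dict src_state -> list of (dst_state, phone, end_time, weight).
    A token is (trie_node, word_start_time, cost). When a token reaches a
    trie node that completes a word, a word arc carrying the exact start
    and end times is emitted and a fresh token restarts at the trie root.
    Assumes states are reachable in topological order from `start`.
    """
    tokens = defaultdict(list)            # lattice state -> live tokens
    tokens[start] = [(trie, 0.0, 0.0)]
    word_arcs = []
    frontier, seen = [start], {start}
    while frontier:
        state = frontier.pop(0)
        # Pruning: keep only the `beam` lowest-cost tokens at this state.
        live = sorted(tokens[state], key=lambda t: t[2])[:beam]
        for dst, phone, t_end, w in arcs.get(state, []):
            for node, t_start, cost in live:
                child = node["children"].get(phone)
                if child is None:
                    continue              # phone sequence not in lexicon: drop token
                for word in child["words"]:
                    word_arcs.append((word, t_start, t_end, cost + w))
                    tokens[dst].append((trie, t_end, cost + w))
                if child["children"]:     # the word may continue past this phone
                    tokens[dst].append((child, t_start, cost + w))
            if dst not in seen:
                seen.add(dst)
                frontier.append(dst)
    return word_arcs

trie = build_trie(LEXICON)
# Linear three-arc lattice: k (ends t=1), ae (t=2), t (t=3), each weight 0.5.
arcs = {0: [(1, "k", 1, 0.5)], 1: [(2, "ae", 2, 0.5)], 2: [(3, "t", 3, 0.5)]}
print(phone_lattice_to_word_lattice(arcs, 0, trie))
# -> [('cat', 0.0, 3, 1.5)]
```

The key point mirrored from the paper's idea is that the word-end time is read off the phone arc that completes a lexicon entry, so the emitted word arc carries exact timing even though the original WFST network has no word-end nodes.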
Source: Journal of Electronics & Information Technology (EI, CSCD, PKU Core Journal), 2014, No. 1: 140-146 (7 pages).
Funding: National Natural Science Foundation of China (10925419, 90920302, 61072124, 11074275, 11161140319, 91120001, 61271426); Strategic Priority Research Program of the Chinese Academy of Sciences (XDA06030100, XDA06030500); National 863 Program (2012AA012503); Key Deployment Project of the Chinese Academy of Sciences (KGZD-EW-103-2).
Keywords: Automatic speech recognition; Weighted Finite State Transducer (WFST); Lattice generation; Keyword spotting

References (16)

  • 1 Shore T, Faubel F, Hehnke H, et al. Knowledge-based word lattice rescoring in a dynamic context[C]. Proceedings of Interspeech, Portland, 2012: 1337-1340.
  • 2 Zhang Hao and Gildea D. Efficient multipass decoding for synchronous context free grammars[C]. Proceedings of the Association for Computational Linguistics, Columbus, 2008: 209-217.
  • 3 Mangu L, Brill E, and Stolcke A. Finding consensus in speech recognition: word error minimization and other applications of confusion networks[J]. Computer Speech & Language, 2000, 14(4): 373-400.
  • 4 Ortmanns S, Ney H, and Aubert X. A word graph algorithm for large vocabulary continuous speech recognition[J]. Computer Speech & Language, 1997, 11(1): 43-72.
  • 5 Demuynck K, Duchateau J, Compernolle D V, et al. An efficient search space representation for large vocabulary continuous speech recognition[J]. Speech Communication, 2000, 30(1): 37-53.
  • 6 Rybach D, Schluter R, and Ney H. A comparative analysis of dynamic network decoding[C]. Proceedings of IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), Prague, 2011: 5184-5187.
  • 7 Mohri M, Pereira F C N, and Riley M. Speech recognition with weighted finite-state transducers[M]. Handbook of Speech Processing. Berlin Heidelberg: Springer-Verlag, 2008: 559-582.
  • 8 Ljolje A, Pereira F, and Riley M. Efficient general lattice generation and rescoring[C]. Proceedings of 6th European Conference on Speech Communication and Technology, Budapest, 1999: 1251-1254.
  • 9 Povey D, Hannemann M, Boulianne G, et al. Generating exact lattices in the WFST framework[C]. Proceedings of IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), Kyoto, 2012: 4213-4216.
  • 10 Povey D, Ghoshal A, Boulianne G, et al. The Kaldi speech recognition toolkit[C]. Proceedings of Automatic Speech Recognition and Understanding Workshop, Hawaii, 2011. doi: 10.1109/ASRU.2011.6163923.



