
大规模词表连续语音识别引擎紧致动态网络的构建 (cited by: 1)

Construction of a compact dynamic decoder network for large vocabulary continuous speech recognition
Abstract: Large vocabulary continuous speech recognition (LVCSR) systems must integrate several knowledge sources, such as the acoustic model, the language model, and the pronunciation dictionary. The decoder network is the foundation of the recognition engine and has a critical influence on decoder performance. Integrating these knowledge sources effectively into a compact decoder network reduces the search space and the repeated computation during recognition, and significantly speeds up decoding. This paper studies dynamic decoder networks for speech recognition and proposes a word end (WE) node pushing algorithm which, combined with the traditional forward and backward merging algorithms, yields a compact dynamic decoder network whose nodes are hidden Markov model (HMM) states. The optimized network has 1/4 as many nodes and edges as a linear lexicon decoder network and 1/2 as many as the open source toolkit HDecode; the number of nodes at which the language model look-ahead score must be computed is 1/2 that of HDecode. Because the acoustic model is based on triphones, the approach can easily be ported to other languages.
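The abstract does not give the details of the word end pushing algorithm, so the following is only a minimal sketch of the two background ideas it builds on: forward (prefix) merging of a linear lexicon into a tree, and language model look-ahead computed at tree nodes. Everything here is an illustrative assumption rather than the paper's method: the toy lexicon, the unigram probabilities, the names `Node`, `build_prefix_tree`, `count_nodes`, and `lookahead`, and the use of one node per phone instead of one node per triphone HMM state.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """One node of a toy lexical prefix tree (hypothetical structure)."""
    children: dict = field(default_factory=dict)  # phone -> Node
    words: list = field(default_factory=list)     # words that end at this node


def build_prefix_tree(lexicon):
    """Forward merging: words sharing a phone prefix share the same nodes."""
    root = Node()
    for word, phones in lexicon.items():
        node = root
        for phone in phones:
            node = node.children.setdefault(phone, Node())
        node.words.append(word)
    return root


def count_nodes(node):
    """Count this node plus all of its descendants."""
    return 1 + sum(count_nodes(child) for child in node.children.values())


def lookahead(node, unigram):
    """Unigram look-ahead: best probability of any word reachable from node."""
    best = max((unigram[w] for w in node.words), default=0.0)
    for child in node.children.values():
        best = max(best, lookahead(child, unigram))
    return best


# Toy data, assumed for illustration only.
LEXICON = {
    "speech": ["s", "p", "iy", "ch"],
    "speed": ["s", "p", "iy", "d"],
    "spin": ["s", "p", "ih", "n"],
    "tin": ["t", "ih", "n"],
}
UNIGRAM = {"speech": 0.4, "speed": 0.3, "spin": 0.2, "tin": 0.1}

# A linear lexicon needs one node per phone of every word.
linear_nodes = sum(len(phones) for phones in LEXICON.values())

# The prefix tree shares common prefixes; the root is not a phone node.
tree = build_prefix_tree(LEXICON)
tree_nodes = count_nodes(tree) - 1

print(linear_nodes, tree_nodes)
print(lookahead(tree.children["s"], UNIGRAM))
```

In this sketch, forward merging alone shrinks the network because "speech", "speed", and "spin" share the prefix "s p". Backward merging would additionally share common suffixes such as "ih n", and the look-ahead value only has to be recomputed at nodes where the set of reachable words changes, which is why a more compact network also reduces look-ahead work.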
Source: Journal of Tsinghua University (Science and Technology) (《清华大学学报(自然科学版)》), 2012, No. 11, pp. 1530-1534 (5 pages). Indexed in: EI, CAS, CSCD, Peking University Core Journals.
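Funding: Joint Research Fund of the National Natural Science Foundation of China and the Research Grants Council of Hong Kong (60931160443); National Natural Science Foundation of China (90920302, 61005019); National High-Tech Research and Development ("863") Program of China (2008AA040201); National Key Technology R&D Program of China (2009BAH41B01).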
Keywords: speech recognition; decoder network; acoustic model; language model look-ahead
References (14; first 10 listed)

  • 1 Gales M, Young S. The application of hidden Markov models in speech recognition [J]. Foundations and Trends in Signal Processing, 2008, 1(3): 195-304.
  • 2 Young S. A review of large-vocabulary continuous speech recognition [J]. IEEE Signal Processing Magazine, 1996, 13(5): 45-57.
  • 3 Rybach D, Schlüter R, Ney H. A comparative analysis of dynamic network decoding [C]// Proc ICASSP. Prague, Czech Republic: IEEE Press, 2011: 5184-5187.
  • 4 Soltau H, Saon G. Dynamic network decoding revisited [C]// Proc ASRU. Merano, Italy: IEEE Press, 2009: 276-281.
  • 5 Mohri M, Pereira F, Riley M. Weighted finite state transducers in speech recognition [C]// Proc the Automatic Speech Recognition Workshop. Paris, France: IEEE Press, 2000: 97-106.
  • 6 Woodland C, Odell J, Valtchev V, et al. Large vocabulary continuous speech recognition using HTK [C]// Proc ICASSP. Adelaide, Australia: IEEE Press, 1994: 125-128.
  • 7 Young S, Russell N, Thornton J. Token Passing: A Simple Conceptual Model for Connected Speech Recognition Systems [S]. Cambridge, UK: Cambridge University, 1989.
  • 8 Young S, Evermann G, Gales M. The HTK Book, Version 3.4 [M]. Cambridge, UK: Cambridge University, 2006.
  • 9 Ortmanns S, Ney H, Coenen N. Language model lookahead for large vocabulary speech recognition [C]// Proc ICSLP. Philadelphia, USA: IEEE Press, 1996: 2095-2098.
  • 10 Shao J, Li T, Zhang Q, et al. A one-pass real-time decoder using memory efficient state network [J]. IEICE Trans on Information and Systems, 2008, 91(3): 529-537.
