Construction of a compact dynamic decoder network for large vocabulary continuous speech recognition

Cited by: 1


Abstract: Large vocabulary continuous speech recognition (LVCSR) systems combine several knowledge sources, such as an acoustic model, a language model, and a pronunciation dictionary. The decoder network is the foundation of the recognition engine and has a critical influence on decoder performance. Integrating these knowledge sources effectively into a compact decoder network reduces the search space and avoids repeated computation during recognition, which significantly speeds up decoding. This paper studies dynamic decoder networks for speech recognition and proposes a word end (WE) node pushing algorithm. Combined with the traditional forward and backward merging algorithms, it yields a compact dynamic decoder network whose nodes are hidden Markov model (HMM) states. The optimized network has 1/4 as many nodes and edges as a linear lexicon decoder network and 1/2 as many as the open source tool HDecode; the number of nodes that require a language model look-ahead score is 1/2 that of HDecode. Because the acoustic model is based on triphones, the approach can be ported readily to other languages.
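To make the node-count comparison in the abstract concrete, the following is a minimal sketch of forward merging only, assuming phone-level nodes rather than HMM states. The toy lexicon, the phone symbols, and both function names are hypothetical and are not taken from the paper; the WE node pushing and backward merging steps are not reproduced here.

```python
from typing import Dict, List


def linear_node_count(lexicon: Dict[str, List[str]]) -> int:
    """Nodes in a linear lexicon network: one node per phone of every word."""
    return sum(len(phones) for phones in lexicon.values())


def prefix_tree_node_count(lexicon: Dict[str, List[str]]) -> int:
    """Nodes after forward merging: words share nodes for common phone prefixes."""
    root: dict = {}
    count = 0
    for phones in lexicon.values():
        node = root
        for phone in phones:
            if phone not in node:
                # A new node is created only where the prefix diverges.
                node[phone] = {}
                count += 1
            node = node[phone]
    return count


# Hypothetical toy pronunciation dictionary (illustrative only).
LEXICON = {
    "speech": ["s", "p", "iy", "ch"],
    "speed": ["s", "p", "iy", "d"],
    "spin": ["s", "p", "ih", "n"],
    "sat": ["s", "ae", "t"],
}

if __name__ == "__main__":
    print("linear lexicon nodes:", linear_node_count(LEXICON))
    print("forward-merged nodes:", prefix_tree_node_count(LEXICON))
```

Forward merging alone leaves word-final phones unshared; merging common suffixes as well requires the word identity to be attached before the shared tail, which is the kind of constraint a word end node placement has to address.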

Authors: LIU Jia (刘加), CHEN Xie (陈谐), SHAN Yuxiang (单煜翔), SHI Yongzhe (史永哲)

Affiliation: Department of Electronic Engineering, Tsinghua University

Source: Journal of Tsinghua University (Science and Technology), 2012, No. 11, pp. 1530-1534 (5 pages). Indexed in: EI, CAS, CSCD, Peking University Core Journals.

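Funding: Joint Research Fund of the National Natural Science Foundation of China and the Research Grants Council of Hong Kong (60931160443); National Natural Science Foundation of China (90920302, 61005019); National "863" High-Tech Program of China (2008AA040201); National Key Technology R&D Program of China (2009BAH41B01)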

Keywords: speech recognition; decoder network; acoustic model; language model look-ahead

Classification: TN912.34 [Electronics and Telecommunications: Communication and Information Systems]




