
N-best Rescoring Algorithm Based on Recurrent Neural Network Language Model (cited by: 3)
Abstract: The recurrent neural network language model (RNNLM) is an important language-modeling method: it overcomes the data-sparseness problem of statistical language models and captures longer-distance constraints. During speech decoding, however, the RNNLM forces the word lattice to be expanded too many times, exploding the search space and making the model impractical to use directly. This paper proposes an N-best rescoring algorithm based on the RNNLM: the N-best list is used to introduce RNNLM probability scores, the recognition hypotheses are re-ranked, and a cache model is introduced to optimize the decoding process, yielding the best recognition result. Experimental results show that the proposed method effectively reduces the word error rate of the speech recognition system.
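The abstract does not give implementation details, but the general N-best rescoring idea it describes can be sketched as follows. Each first-pass hypothesis carries an acoustic score and an n-gram LM score; an RNNLM score is linearly interpolated with the n-gram score and the list is re-ranked. All function names, the interpolation weight `lam`, and the LM scale `lm_weight` are hypothetical illustrations, not the paper's actual values.

```python
import math

def rescore_nbest(hypotheses, rnnlm_logprob, lam=0.5, lm_weight=10.0):
    """Re-rank an N-best list (illustrative sketch).

    hypotheses    : list of (words, acoustic_logprob, ngram_logprob)
    rnnlm_logprob : assumed function mapping a word tuple to its RNNLM
                    log-probability
    lam           : interpolation weight given to the RNNLM
    lm_weight     : language-model scale factor
    """
    rescored = []
    for words, am_score, ngram_lp in hypotheses:
        rnn_lp = rnnlm_logprob(words)
        # Linear interpolation of the two LM probabilities, done via exp/log
        # for clarity (a real system would stay in log space throughout).
        lm_lp = math.log(lam * math.exp(rnn_lp) + (1.0 - lam) * math.exp(ngram_lp))
        rescored.append((am_score + lm_weight * lm_lp, words))
    # Highest combined score first; return hypotheses in the new order.
    rescored.sort(key=lambda t: t[0], reverse=True)
    return [words for _, words in rescored]
```

In this sketch the second-pass RNNLM only scores the N candidate sentences, so the lattice never has to be expanded with the full RNNLM history, which is the practicality argument the abstract makes.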
Source: Journal of Data Acquisition and Processing (《数据采集与处理》), CSCD, Peking University Core Journal, 2016, No. 2, pp. 347-354 (8 pages)
Funding: National Natural Science Foundation of China (61175017); National High-Tech R&D Program of China (863 Program) (2012AA011603); PLA Military Science Graduate Research Project (2010JY0258-144)
Keywords: speech recognition; language model; recurrent neural network; N-best rescoring; cache language model
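The cache language model named in the keywords and abstract is likewise only described at a high level. A minimal sketch of the standard cache-LM idea, under the assumption that it is a unigram cache interpolated with a base LM probability (the class, capacity, and weight below are hypothetical, not taken from the paper):

```python
from collections import Counter, deque

class CacheLM:
    """Unigram cache language model (illustrative sketch): words decoded
    recently receive boosted probability, interpolated with a base LM."""

    def __init__(self, capacity=200, lam=0.2):
        self.lam = lam                      # weight of the cache component
        self.history = deque(maxlen=capacity)
        self.counts = Counter()

    def update(self, word):
        """Record a newly decoded word, evicting the oldest if full."""
        if len(self.history) == self.history.maxlen:
            self.counts[self.history[0]] -= 1
        self.history.append(word)
        self.counts[word] += 1

    def prob(self, word, base_prob):
        """Interpolate the cache frequency with a base LM probability."""
        cache_prob = self.counts[word] / max(len(self.history), 1)
        return self.lam * cache_prob + (1.0 - self.lam) * base_prob
```

The motivation is that words already recognized in a session tend to recur, so boosting their probability during decoding can improve later hypotheses at negligible cost.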

