
Comparative Study on Smoothing Algorithms for Domain-Specific Chinese Language Models (cited by 5)
Abstract For speech recognition in a specific domain, building a high-performance language model from a limited corpus is key to improving system performance. This paper studies domain-specific language models and proposes using high-frequency new words to strengthen the model's domain characteristics. Two schemes are considered: one adds the high-frequency new words directly to the existing dictionary and increases their weight during training, so that the model better expresses domain-related features; the other builds a small domain-specific vocabulary from the high-frequency new words. The two schemes are compared, and smoothing strategies suited to Chinese are investigated experimentally. The results show that, for domain-specific tasks, the choice of smoothing algorithm has a considerable effect on model performance. With Witten-Bell interpolated smoothing, which suits Chinese, the recognition rate reaches 88.4%, a relative improvement of 18.18% over the general-purpose model.
Source Computer Engineering and Applications (《计算机工程与应用》), indexed in CSCD and the Peking University Core Journals list, 2006, No. 32, pp. 14-16 (3 pages)
Funding National Natural Science Foundation of China (Grant No. 60535030).
Keywords language model; specific domain; speech recognition; smoothing algorithm; dictionary
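
The abstract names Witten-Bell interpolated smoothing as the best-performing strategy but this record does not reproduce its formulation. The following is a minimal sketch of the standard textbook form for a bigram model, not the authors' implementation: the probability of a word given its history mixes the observed bigram count with a lower-order unigram estimate, weighted by the number of distinct word types seen after that history. The class name `WittenBellBigram`, the `extra_vocab` parameter, and the toy word-segmented corpus are all hypothetical and chosen only for illustration.

```python
from collections import Counter, defaultdict


class WittenBellBigram:
    """Interpolated Witten-Bell bigram model (illustrative sketch only)."""

    def __init__(self, sentences, extra_vocab=()):
        self.unigrams = Counter()            # c(w) for every predicted word
        self.bigrams = defaultdict(Counter)  # c(h, w), indexed by history h
        for sent in sentences:
            tokens = ["<s>"] + list(sent) + ["</s>"]
            for prev, word in zip(tokens, tokens[1:]):
                self.unigrams[word] += 1
                self.bigrams[prev][word] += 1
        self.total = sum(self.unigrams.values())
        # Vocabulary = observed words plus any words never seen in training
        # (for example, domain words added to the dictionary).
        self.vocab = set(self.unigrams) | set(extra_vocab)

    def unigram_prob(self, word):
        # Witten-Bell at the unigram level, interpolated with a uniform
        # distribution over the vocabulary.
        types = len(self.unigrams)
        uniform = 1.0 / len(self.vocab)
        return (self.unigrams[word] + types * uniform) / (self.total + types)

    def prob(self, word, prev):
        followers = self.bigrams.get(prev)
        if not followers:
            # History never observed: fall back to the unigram estimate.
            return self.unigram_prob(word)
        count_h = sum(followers.values())  # c(h)
        types_h = len(followers)           # T(h): distinct words seen after h
        # P(w|h) = (c(h,w) + T(h) * P(w)) / (c(h) + T(h))
        return (followers[word] + types_h * self.unigram_prob(word)) / (count_h + types_h)


# Hypothetical toy corpus of word-segmented commands, not data from the paper.
corpus = [["打开", "空调"], ["关闭", "空调"], ["打开", "车窗"]]
model = WittenBellBigram(corpus, extra_vocab=["导航"])
print(model.prob("空调", "打开"))  # seen bigram
print(model.prob("车窗", "关闭"))  # unseen bigram, still non-zero
```

In this form, a history followed by many different word types gives more weight to the lower-order estimate, so sparse domain corpora still assign non-zero probability to unseen word pairs while the distribution over the vocabulary sums to one.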


