Linear Interpolation Smoothing Techniques in Statistical Natural Language Processing (cited by: 4)

Original English title: Linear Interpolated Methods in Statistical Natural Language Processing
Abstract: Data sparseness is one of the most difficult problems in statistical natural language processing. Two main families of smoothing methods address it: back-off and linear interpolation. This article analyzes and compares several typical linear interpolation methods, focusing on the tendency of their smoothing parameters to cluster by part of speech. On this basis, it proposes two improved smoothing methods. Experimental results show that both improved methods smooth better than the original ones.
Source: Computer Science (《计算机科学》; indexed in CSCD and the Peking University Core Journals list), 2007, No. 6, pp. 223-225, 244 (4 pages)
Keywords: statistical language model; data sparseness problem; smoothing techniques; back-off methods; linear interpolation methods; N-gram
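
For readers unfamiliar with the linear interpolation family named in the abstract, the following is a minimal sketch of the generic textbook form for a bigram model, in which the maximum-likelihood bigram estimate is mixed with the unigram estimate using a single fixed weight. It is not the article's improved methods, which this record does not describe. The function names `train_counts` and `interpolated_prob`, the weight `lam`, and the toy corpus are assumptions made purely for illustration.

```python
from collections import Counter


def train_counts(tokens):
    """Collect unigram, bigram, and history counts from a list of tokens."""
    unigrams = Counter(tokens)
    bigrams = Counter(zip(tokens, tokens[1:]))
    # A token counts as a history only when another token follows it.
    contexts = Counter(tokens[:-1])
    return unigrams, bigrams, contexts


def interpolated_prob(word, prev, unigrams, bigrams, contexts, lam=0.7):
    """Estimate P(word | prev) as lam * P_ML(word | prev) + (1 - lam) * P_ML(word).

    lam is a fixed, hypothetical interpolation weight; in practice such
    weights are usually tuned on held-out data.
    """
    total = sum(unigrams.values())
    p_unigram = unigrams[word] / total
    if contexts[prev] == 0:
        # Unseen history: rely on the unigram estimate alone.
        return p_unigram
    p_bigram = bigrams[(prev, word)] / contexts[prev]
    return lam * p_bigram + (1 - lam) * p_unigram


# Toy corpus (illustrative only).
tokens = "the cat sat on the mat the cat ran".split()
unigrams, bigrams, contexts = train_counts(tokens)

# The bigram ("the", "ran") never occurs in the corpus, yet its
# interpolated probability is nonzero because of the unigram term.
print(interpolated_prob("ran", "the", unigrams, bigrams, contexts))
```

Because both component estimates sum to one over the vocabulary, the interpolated estimate does as well, which is what lets it assign nonzero probability to unseen bigrams without breaking normalization.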