Improved Katz smoothing algorithms with POS information
Abstract This paper systematically analyzes existing N-gram smoothing algorithms and implements Absolute, Witten-Bell (W-B), and Katz smoothing. The traditional Katz algorithm cannot discount probabilities when it smooths certain fixed Chinese collocations. To solve this problem, a new discount coefficient is constructed from part-of-speech (POS) information. Under the new coefficient, the discount shrinks as word frequency grows and grows as the number of distinct following words increases, which is the behavior a smoothing algorithm expects of its discount coefficient. Experiments show that the improved Katz smoothing algorithm lowers the cross entropy of the N-gram model and, when applied to Chinese word segmentation, raises the F-measure of the segmentation results.
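For orientation, the baseline discount that the abstract refers to can be sketched as follows. This is a minimal illustration of the standard Good-Turing discount ratio used in Katz back-off, not the POS-based coefficient proposed in the paper, whose formula the abstract does not give. The function names `katz_discounts` and `reserved_mass`, the threshold `k`, and the toy counts are assumptions made for this example. Reading the collocation problem as "a history whose only successor has a count above `k` keeps all of its probability mass" is also an interpretation, not a statement from the paper.

```python
from collections import Counter


def katz_discounts(ngram_counts, k=5):
    """Good-Turing discount ratios d_r for counts 1..k, as used in Katz back-off.

    ngram_counts: mapping from an n-gram to its raw training count.
    k: counts above this threshold are treated as reliable (d_r = 1).
    """
    # n[r] = number of distinct n-grams observed exactly r times.
    n = Counter(ngram_counts.values())
    if any(n[r] == 0 for r in range(1, k + 2)):
        raise ValueError("need at least one n-gram for every count 1..k+1")
    tail = (k + 1) * n[k + 1] / n[1]
    if tail >= 1:
        raise ValueError("count-of-counts are too flat for this k")
    discounts = {}
    for r in range(1, k + 1):
        r_star = (r + 1) * n[r + 1] / n[r]  # Good-Turing adjusted count
        discounts[r] = (r_star / r - tail) / (1 - tail)
    return discounts


def reserved_mass(successor_counts, discounts):
    """Probability mass one history leaves for unseen successors."""
    total = sum(successor_counts.values())
    # Counts above k are not in `discounts`, so they are kept in full.
    kept = sum(discounts.get(r, 1.0) * r for r in successor_counts.values())
    return 1.0 - kept / total


if __name__ == "__main__":
    # Hypothetical toy corpus: 100 bigrams seen once, 40 twice, 24 three times.
    counts = {("a", i): 1 for i in range(100)}
    counts.update({("b", i): 2 for i in range(40)})
    counts.update({("c", i): 3 for i in range(24)})
    d = katz_discounts(counts, k=2)
    # A history with one frequent successor is not discounted at all.
    print(reserved_mass({"fixed-collocation": 50}, d))  # prints 0.0
```

In this sketch a history followed by a single high-count word reserves no mass for unseen successors, while a history with several low-count successors does. That asymmetry is the kind of behavior a discount coefficient depending on both word frequency and the number of following words is meant to correct.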
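Source Journal of Harbin Institute of Technology (indexed in EI, CAS, CSCD, Peking University Core), 2007, No. 9, pp. 1445-1448 (4 pages)
Funding Key Program of the National Natural Science Foundation of China (60435020); National High Technology Research and Development Program of China (2002AA117010-09)
Keywords N-gram model; data sparseness; POS information; Katz smoothing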
References (7)

  • 1 GOOD I J. The population frequencies of species and the estimation of population parameters[J]. Biometrika, 1953, 40(3/4): 237-264.
  • 2 JELINEK F, MERCER R L. Interpolated estimation of Markov source parameters from sparse data[C]//Proceedings of the Workshop on Pattern Recognition in Practice. Amsterdam, The Netherlands: [s.n.], 1980: 381-397.
  • 3 KATZ S M. Estimation of probabilities from sparse data for the language model component of a speech recognizer[J]. IEEE Transactions on Acoustics, Speech, and Signal Processing, 1987, ASSP-35(3): 400-401.
  • 4 WITTEN I H, BELL T C. The zero-frequency problem: Estimating the probabilities of novel events in adaptive text compression[J]. IEEE Transactions on Information Theory, 1991, 37(4): 1085-1094.
  • 5 NEY H, ESSEN U, KNESER R. On structuring probabilistic dependences in stochastic language modeling[J]. Computer Speech and Language, 1994, 8: 1-38.
  • 6 CHEN S F, GOODMAN J. An empirical study of smoothing techniques for language modeling[C]//Proceedings of the 34th Annual Meeting of the ACL. California: [s.n.], 1996: 310-318.
  • 7 SPROAT R, EMERSON T. The first international Chinese word segmentation bakeoff[C]//First SIGHAN Workshop attached with the ACL2003. Sapporo, Japan: [s.n.], 2003: 133-143.
