Abstract
Data sparseness is one of the hardest problems in statistical natural language processing. Two main families of smoothing techniques address it: back-off models and linear interpolation models. This article analyzes and compares several typical linear interpolation methods, focusing on the relationship between the smoothing parameters and parts of speech, in particular the part-of-speech clustering tendency these methods induce. On this basis, two improved smoothing methods are proposed. Experimental results show that both improved methods outperform the original ones.
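For readers unfamiliar with linear interpolation, the basic idea can be sketched as mixing a higher-order estimate with a lower-order one, for example P(w | prev) = λ · P_ML(w | prev) + (1 − λ) · P_ML(w). The Python sketch below is only an illustration under stated assumptions: it uses a bigram/unigram mixture with a single fixed weight `lam`, and the function names `train_counts` and `interpolated_prob` are hypothetical. It is not the method proposed in the article, whose details the abstract does not give.

```python
from collections import Counter


def train_counts(tokens):
    """Collect unigram, context, and bigram counts from a token list."""
    unigrams = Counter(tokens)
    # Contexts are tokens that have a successor (all but the last token).
    contexts = Counter(tokens[:-1])
    bigrams = Counter(zip(tokens, tokens[1:]))
    return unigrams, contexts, bigrams


def interpolated_prob(prev, word, unigrams, contexts, bigrams, lam=0.7):
    """Linearly interpolate bigram and unigram maximum-likelihood estimates.

    `lam` is a fixed, hand-picked weight (an assumption for this sketch);
    in practice such weights are usually tuned on held-out data.
    """
    total = sum(unigrams.values())
    p_uni = unigrams[word] / total if total else 0.0
    p_bi = bigrams[(prev, word)] / contexts[prev] if contexts[prev] else 0.0
    return lam * p_bi + (1.0 - lam) * p_uni


if __name__ == "__main__":
    tokens = "the cat sat on the mat".split()
    uni, ctx, bi = train_counts(tokens)
    # A seen bigram gets most of its mass from the bigram estimate.
    print(interpolated_prob("the", "cat", uni, ctx, bi))
    # An unseen bigram still receives non-zero probability via the unigram term.
    print(interpolated_prob("cat", "mat", uni, ctx, bi))
```

The point of the mixture is that an unseen word pair no longer receives zero probability, which is the data-sparseness issue the abstract refers to. Back-off methods differ in that they consult the lower-order estimate only when the higher-order count is missing or unreliable.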
Source
Computer Science (《计算机科学》)
CSCD
Peking University Core Journals (北大核心)
2007, No. 6, pp. 223-225, 244 (4 pages)
Keywords
Statistical language model
Data sparseness problem
Smoothing techniques
Back-off methods
Linear interpolation methods
N-gram