Abstract
Ambiguity resolution is an important factor in the segmentation accuracy of a word segmentation system and a core problem in the design of automatic word segmentation systems. This paper presents a new segmentation algorithm that uses two statistics, the mutual information and the t-information difference between adjacent characters within a Chinese sentence, to segment ambiguous strings in automatic Chinese word segmentation. Experimental results show that the method can effectively improve the accuracy of ambiguity resolution.
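The abstract names the mutual information (互信息) between adjacent characters as one of its two statistics but does not give a formula. As a rough illustration only, the sketch below uses the standard pointwise definition, MI(x, y) = log2(p(x, y) / (p(x) p(y))), estimated from raw counts. The function name `adjacent_mutual_information`, the toy corpus, and the unsmoothed estimates are all assumptions made for this example, not details taken from the paper, and the t-information difference is not covered.

```python
import math
from collections import Counter


def adjacent_mutual_information(corpus):
    """Estimate log2(p(x,y) / (p(x) * p(y))) for every adjacent character pair.

    `corpus` is an iterable of strings (sentences). Probabilities are plain
    relative frequencies with no smoothing (an assumption of this sketch).
    """
    char_counts = Counter()
    pair_counts = Counter()
    for sentence in corpus:
        char_counts.update(sentence)                     # single characters
        pair_counts.update(zip(sentence, sentence[1:]))  # adjacent pairs

    total_chars = sum(char_counts.values())
    total_pairs = sum(pair_counts.values())

    mi = {}
    for (x, y), n_xy in pair_counts.items():
        p_xy = n_xy / total_pairs
        p_x = char_counts[x] / total_chars
        p_y = char_counts[y] / total_chars
        mi[(x, y)] = math.log2(p_xy / (p_x * p_y))
    return mi


if __name__ == "__main__":
    # Hypothetical toy corpus; a real system would use a large text collection.
    toy_corpus = ["abab", "cd"]
    for pair, score in sorted(adjacent_mutual_information(toy_corpus).items()):
        print(pair, round(score, 3))
```

In approaches of this general kind, a high score between two adjacent characters is usually read as evidence that they belong to the same word, and a low score as evidence of a boundary between them. How the paper combines this with the t-information difference is not stated in the abstract.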
Source
Microcomputer Applications (《微计算机应用》)
2006, No. 6, pp. 685-688 (4 pages)
Keywords
word segmentation (分词), mutual information (互信息), t-information difference (t-信息差), ambiguous strings (歧义字段)