Abstract
This paper analyzes existing dictionary-based word segmentation algorithms. After comparing their strengths and weaknesses, it proposes superimposing the result sets produced by the forward matching and reverse matching algorithms to form a rough segmentation result set. A directed graph with non-negative edge weights is then constructed from this set, and a shortest-path algorithm is applied to the graph to obtain the final segmentation. Experiments with Nutch show that, compared with Nutch's original search system, the algorithm improves both the accuracy and the speed of Chinese word segmentation, and that it partially resolves the intersection-type segmentation ambiguity problem.
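The pipeline described in the abstract can be illustrated with a minimal Python sketch. The abstract does not give the matching variant, the edge weights, or the dictionary, so the following are assumptions made for illustration only: both matchers are greedy maximum-matching, every edge has unit weight (so the shortest path is the segmentation with the fewest words), and all function names and the toy dictionary are hypothetical. It is not the authors' Nutch implementation.

```python
import heapq


def forward_max_match(text, dictionary, max_len):
    """Greedy left-to-right longest match; returns (start, end) spans."""
    spans, i = [], 0
    while i < len(text):
        for j in range(min(len(text), i + max_len), i, -1):
            # Fall back to a single character when no dictionary word fits.
            if text[i:j] in dictionary or j == i + 1:
                spans.append((i, j))
                i = j
                break
    return spans


def backward_max_match(text, dictionary, max_len):
    """Greedy right-to-left longest match; returns (start, end) spans."""
    spans, j = [], len(text)
    while j > 0:
        for i in range(max(0, j - max_len), j):
            if text[i:j] in dictionary or i == j - 1:
                spans.append((i, j))
                j = i
                break
    return spans[::-1]


def segment(text, dictionary, weight=lambda word: 1):
    """Superimpose both span sets, then take the shortest path 0 -> len(text).

    `weight` must return a non-negative number (assumed: 1 per word).
    """
    if not text:
        return []
    n = len(text)
    max_len = max(map(len, dictionary), default=1)

    # Rough segmentation set: union of forward and backward spans.
    edges = set(forward_max_match(text, dictionary, max_len))
    edges |= set(backward_max_match(text, dictionary, max_len))
    graph = {}
    for start, end in edges:
        graph.setdefault(start, []).append(end)

    # Dijkstra over character positions; edges are candidate words.
    dist, prev, heap = {0: 0}, {}, [(0, 0)]
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist.get(u, float("inf")):
            continue
        if u == n:
            break
        for v in sorted(graph.get(u, [])):
            nd = d + weight(text[u:v])
            if nd < dist.get(v, float("inf")):
                dist[v], prev[v] = nd, u
                heapq.heappush(heap, (nd, v))

    # Walk the predecessor links back from the end of the text.
    words, v = [], n
    while v > 0:
        u = prev[v]
        words.append(text[u:v])
        v = u
    return words[::-1]
```

With the toy dictionary {独立自主, 和平, 平等, 互利, 平等互利}, forward matching splits 独立自主和平等互利 into four words (独立自主 / 和平 / 等 / 互利), while backward matching yields three (独立自主 / 和 / 平等互利). Under the unit-weight assumption, `segment` returns the three-word path. A frequency-based `weight` function could be passed instead if unit weights are too coarse.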
Source
Computer Engineering & Science (《计算机工程与科学》)
CSCD
2008, No. 5, pp. 126-128 (3 pages)
Keywords
Chinese word segmentation
shortest path
superposition