Abstract
To segment Chinese text quickly and accurately, an improved dictionary-based Chinese word segmentation method is proposed, building on traditional segmentation dictionary structures and their corresponding algorithms. The method combines a double-character hash structure with an improved forward maximum matching algorithm, which both increases segmentation speed and resolves ambiguities that arise in the traditional maximum matching algorithm. Experimental results show that the method improves the accuracy of Chinese word segmentation to some extent while greatly shortening segmentation time.
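The abstract gives no implementation details, so the following is only a minimal sketch of the two ingredients it names: a dictionary indexed by the first two characters of each word, and plain forward maximum matching over that index. It is not the paper's improved algorithm and does no ambiguity handling. The names `build_index`, `forward_max_match`, the `max_len` parameter, and the sample word list are all hypothetical and chosen for illustration.

```python
from collections import defaultdict


def build_index(words):
    """Group dictionary words by their first two characters (assumed layout)."""
    index = defaultdict(set)
    for word in words:
        if len(word) >= 2:
            index[word[:2]].add(word)
    return index


def forward_max_match(text, index, max_len=4):
    """Baseline forward maximum matching using the two-character index."""
    tokens = []
    i = 0
    while i < len(text):
        # One hash lookup narrows the search to words sharing this prefix.
        candidates = index.get(text[i:i + 2], ())
        match = text[i]  # fall back to a single character
        # Try the longest window first and shrink until a dictionary word fits.
        for size in range(min(max_len, len(text) - i), 1, -1):
            piece = text[i:i + size]
            if piece in candidates:
                match = piece
                break
        tokens.append(match)
        i += len(match)
    return tokens


# Hypothetical toy dictionary, for illustration only.
words = ["中文", "分词", "词典", "中文分词", "方法"]
index = build_index(words)
print(forward_max_match("中文分词方法", index))  # → ['中文分词', '方法']
```

In this sketch the two-character key means each position costs one hash lookup plus a few set membership checks, rather than a scan of the whole dictionary. Characters that start no dictionary word are emitted as single-character tokens.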
Source
Computer Engineering and Design (《计算机工程与设计》)
CSCD
PKU Core Journal
2013, Issue 5, pp. 1802-1807 (6 pages)
Keywords
Chinese word segmentation
dictionary
hash structure
forward maximum matching algorithm
ambiguity