Abstract
This paper studies Chinese word segmentation. It improves on the traditional whole-word binary-search segmentation mechanism by designing a new dictionary structure in which words are grouped by their length in characters, which makes the dictionary easier to update and extend. A corresponding fast segmentation algorithm is proposed for this dictionary structure. Comparative experiments show that the method segments text faster than the traditional whole-word binary-search, character-by-character binary-search, and TRIE index tree segmentation methods.
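The abstract does not give the details of the dictionary or the algorithm, so the following is only a minimal sketch of the general idea, not the paper's implementation. It assumes one sorted word list per word length, binary search within a list, and forward maximum matching as the segmentation strategy. The names `LengthBucketDictionary` and `segment` are hypothetical.

```python
from bisect import bisect_left
from collections import defaultdict


class LengthBucketDictionary:
    """Hypothetical dictionary: one sorted word list per word length."""

    def __init__(self, words=()):
        # Maps word length (in characters) to a sorted list of words.
        self.buckets = defaultdict(list)
        for word in words:
            self.add(word)

    def add(self, word):
        # Adding a word only touches the list for its own length.
        bucket = self.buckets[len(word)]
        i = bisect_left(bucket, word)
        if i == len(bucket) or bucket[i] != word:
            bucket.insert(i, word)

    def contains(self, word):
        # Binary search inside the list for this word length.
        bucket = self.buckets.get(len(word))
        if not bucket:
            return False
        i = bisect_left(bucket, word)
        return i < len(bucket) and bucket[i] == word

    def max_length(self):
        return max((n for n, b in self.buckets.items() if b), default=1)


def segment(text, dictionary):
    """Forward maximum matching (an assumed strategy): longest candidate first."""
    result = []
    pos = 0
    longest = dictionary.max_length()
    while pos < len(text):
        # Try candidates from the longest possible length down to 2 characters.
        for n in range(min(longest, len(text) - pos), 1, -1):
            candidate = text[pos:pos + n]
            if dictionary.contains(candidate):
                result.append(candidate)
                pos += n
                break
        else:
            # No multi-character word matched: emit a single character.
            result.append(text[pos])
            pos += 1
    return result
```

In this sketch, a lookup for an n-character candidate searches only the list of n-character words, and adding a word changes only one list. That is one plausible reading of why such a structure could be easy to update; the paper's own design may differ.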
Author
郭屹
GUO Yi (School of Software Engineering, Tongji University, Shanghai 201804, China)
Source
《电脑知识与技术》 (Computer Knowledge and Technology)
2008, Issue 3, pp. 1240-1245, 1255 (7 pages in total)
Keywords
natural language processing
Chinese word segmentation
dictionary-based Chinese word segmentation