Abstract
This paper analyses the shortcomings of existing word segmentation algorithms and, on that basis, proposes a new segmentation dictionary for Chinese. The dictionary is given a two-level index (a hash table keyed on the first character of each word, plus a word index table), so that it supports a maximum matching segmentation algorithm that uses full binary search. Automatic segmentation with this algorithm achieves a substantial improvement in time complexity.
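The abstract does not spell out the exact layout of the word index table or the details of the "full binary" search. The following is therefore only a minimal sketch, under stated assumptions, of how a two-level dictionary can back forward maximum matching. Level 1 is assumed to be a hash table keyed on the first character. Level 2 is assumed to be a sorted word list per character, searched with binary search. The names `TwoLevelDict`, `contains`, and `max_match` are hypothetical and do not come from the paper.

```python
from bisect import bisect_left
from collections import defaultdict


class TwoLevelDict:
    """Hypothetical two-level dictionary (an assumption, not the paper's layout):
    level 1 is a hash table keyed on the first character, level 2 is a sorted
    list of the words that start with that character."""

    def __init__(self, words):
        buckets = defaultdict(set)
        for w in words:
            if w:
                buckets[w[0]].add(w)
        # Level 1: a Python dict is a hash table keyed by first character.
        # Level 2: each bucket is sorted so lookups can use binary search.
        self.index = {ch: sorted(ws) for ch, ws in buckets.items()}
        # The longest word for each first character bounds the matching window.
        self.max_len = {ch: max(len(w) for w in ws) for ch, ws in self.index.items()}

    def contains(self, word):
        """Hash on the first character, then binary-search the bucket."""
        bucket = self.index.get(word[0])
        if bucket is None:
            return False
        i = bisect_left(bucket, word)
        return i < len(bucket) and bucket[i] == word


def max_match(text, d):
    """Forward maximum matching: at each position take the longest dictionary
    word, falling back to a single character when nothing longer matches."""
    tokens = []
    i = 0
    while i < len(text):
        # Characters that start no word get a window of 1.
        limit = min(d.max_len.get(text[i], 1), len(text) - i)
        for n in range(limit, 0, -1):
            cand = text[i:i + n]
            if n == 1 or d.contains(cand):
                tokens.append(cand)
                i += n
                break
    return tokens


if __name__ == "__main__":
    d = TwoLevelDict(["汉语", "分词", "算法", "中国", "中国人", "人民"])
    print(max_match("汉语分词算法", d))  # ['汉语', '分词', '算法']
```

Per-character `max_len` keeps the window short for characters that only begin short words. Like any greedy forward maximum matching, this sketch can still mis-segment. With the dictionary above, "中国人民" yields `['中国人', '民']` rather than `['中国', '人民']`.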
Source
New Technology of Library and Information Service (《现代图书情报技术》)
CSSCI
Peking University Core Journal (北大核心)
2007, No. 4, pp. 52-55 (4 pages)
Keywords
Segmentation algorithm
Chinese word segmentation