期刊文献+

基于双数组Trie树的中文分词词典算法优化研究 被引量:8

Research of an improved algorithm for Chinese word segmentation dictionary based on double-array Trie-tree
下载PDF
导出
摘要 基于双数组Trie树的中文分词词典具有较高的查找效率,但其插入时间复杂度较高。为此提出了一种基于双数组Trie树结构的改进算法iDAT,在原始词典初始化时优先处理分支多的节点,并在初始化之后对base数组中的空序列的下标值做Hash,Hash表中存放空序列之前的所有空序列个数之和,而后运用iDAT算法进行插入。本算法借鉴了单模式匹配的Sunday算法中的跳跃思想,在适当增加空间开销的基础上,降低了Trie树在动态插入过程中的平均时间复杂度,在实际操作过程中有着良好的性能。 Abstract:Chinese word segmentation dictionary based on the double-array Trie-tree has higher search effi- ciency, but the dynamic insertion consumes a lot of time. Therefore, an improved algorithm (iDAT) based on double-array Trie-tree for Chinese word segmentation dictionary is proposed. The nodes with more branches are handled while the original dictionary is being initialized. After the initialization, a Hash process is performed on the index values of empty sequence in base array. The final Hash table stores the sum of the empty sequences before the current empty sequence. After that, the iDAT is used to carry out the dynamic insertion process. This algorithm adopts Sunday jumps algorithm of single pattern matching. With the reasonable increasement of space, it reduces the the average time complexity of the dynamic insertion process in Trie-tree. Practical results show it has good operation performance.
出处 《计算机工程与科学》 CSCD 北大核心 2013年第9期127-131,共5页 Computer Engineering & Science
基金 北大方正集团有限公司数字出版技术国家重点实验室开放课题资助项目(2012072011)
关键词 双数组 TRIE树 时间复杂度 分词词典 double-array ~ Trie-tree ~ time complexity ~ word segmentation dictionary
  • 相关文献

参考文献4

二级参考文献26

共引文献80

同被引文献84

引证文献8

二级引证文献41

相关作者

内容加载中请稍等...

相关机构

内容加载中请稍等...

相关主题

内容加载中请稍等...

浏览历史

内容加载中请稍等...
;
使用帮助 返回顶部