Abstract
In automatic Chinese word segmentation and part-of-speech tagging systems, the digital dictionary is an essential component and one of the major factors affecting system performance. This paper describes the lookup functions a digital dictionary should provide and the organizational structures in common use, and proposes an extensible dictionary mechanism composed of a system dictionary plus a user dictionary. The system dictionary uses a character-by-character binary-search structure indexed by a hash on the first character of each word; the user dictionary uses a linked-list structure indexed by the same kind of first-character hash. Experiments show that the mechanism is extensible and practical.
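The two-part structure described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the class and method names (`Node`, `ExtensibleDictionary`, `add_user_word`, `lookup`) and the part-of-speech tags are assumptions made for illustration, a Python `dict` stands in for the first-character hash table, and a Python list stands in for the linked chain of user entries.

```python
from bisect import bisect_left


class Node:
    """One character position in the (hypothetical) system dictionary."""

    def __init__(self):
        self.keys = []       # next characters, kept sorted
        self.children = []   # child nodes, parallel to self.keys
        self.tags = None     # part-of-speech tags if a word ends here

    def child(self, ch, create=False):
        # Binary search over the sorted next characters.
        i = bisect_left(self.keys, ch)
        if i < len(self.keys) and self.keys[i] == ch:
            return self.children[i]
        if not create:
            return None
        self.keys.insert(i, ch)
        self.children.insert(i, Node())
        return self.children[i]


class ExtensibleDictionary:
    """System dictionary plus user dictionary, both hashed on the first character."""

    def __init__(self, system_entries):
        self.system = {}  # first character -> Node
        self.user = {}    # first character -> chain of (word, tags)
        for word, tags in system_entries:
            node = self.system.setdefault(word[0], Node())
            for ch in word[1:]:
                node = node.child(ch, create=True)
            node.tags = tags

    def add_user_word(self, word, tags):
        # User words are appended to the chain for their first character.
        self.user.setdefault(word[0], []).append((word, tags))

    def lookup(self, word):
        if not word:
            return None
        # 1) System dictionary: hash on the first character, then one
        #    binary search per remaining character.
        node = self.system.get(word[0])
        for ch in word[1:]:
            if node is None:
                break
            node = node.child(ch)
        if node is not None and node.tags is not None:
            return node.tags
        # 2) User dictionary: linear scan of the first character's chain.
        for entry, tags in self.user.get(word[0], []):
            if entry == word:
                return tags
        return None


# Illustrative entries; the tags are assumed, not taken from the paper.
d = ExtensibleDictionary([("中国", ["n"]), ("中国人", ["n"]), ("中间", ["f"])])
d.add_user_word("中文分词", ["n"])
print(d.lookup("中国人"), d.lookup("中文分词"), d.lookup("中"))
```

In this sketch the system dictionary is built once from a fixed entry list, while new words only ever touch the user chains, which is one plausible reading of how the split keeps the system part stable and the whole dictionary extensible.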
Source
《计算机工程与应用》
CSCD
Peking University Core Journal
2008, Issue 21, pp. 199-201 (3 pages)
Computer Engineering and Applications
Funding
National Natural Science Foundation of China (Grant No. 60773173)
Natural Science Foundation of Jiangsu Province of China (Grant No. 07YYB003)
Keywords
digital dictionary
dictionary structure
automatic word segmentation
hash