Abstract
Chinese word segmentation is the foundation of Chinese information processing, and ambiguity is one of its main difficulties. Crossing ambiguity accounts for more than 90% of all ambiguity cases, so resolving it is a key topic in Chinese word segmentation research. Based on repeated experiments and analysis of their results, five rules are proposed, together with an algorithm built on these rules for segmenting crossing-ambiguity strings. Experimental results show that DSfenci, the segmentation system implemented with this algorithm, reaches an accuracy of 95.22% in resolving crossing ambiguities.
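To make the notion of crossing ambiguity concrete, the sketch below flags strings of the form ABC in which both AB and BC are dictionary words, so that two segmentations (AB/C and A/BC) compete. It is only an illustration of how such strings can be detected; it is not the five-rule algorithm of the paper, whose rules are not listed in this abstract. The function name `find_crossing_ambiguities` and the toy dictionary are assumptions made for this example.

```python
def find_crossing_ambiguities(text, dictionary):
    """Return (left_word, right_word, ambiguous_string) triples.

    A triple is reported when two dictionary words overlap in `text`
    without one containing the other, i.e. the classic ABC pattern
    where both AB and BC are words.
    """
    # Longest dictionary entry bounds the inner search; 0 if empty.
    max_len = max((len(w) for w in dictionary), default=0)

    # Collect every occurrence of a multi-character dictionary word
    # as a half-open span (start, end).
    spans = []
    for i in range(len(text)):
        for j in range(i + 2, min(len(text), i + max_len) + 1):
            if text[i:j] in dictionary:
                spans.append((i, j))

    # Two spans cross when the second starts inside the first
    # and ends after it: a < c < b < d.
    hits = []
    for a, b in spans:
        for c, d in spans:
            if a < c < b < d:
                hits.append((text[a:b], text[c:d], text[a:d]))
    return hits


# Toy dictionary (hypothetical, for illustration only).
toy_dictionary = {"结合", "合成", "成分", "分子"}
for left, right, whole in find_crossing_ambiguities("结合成分子", toy_dictionary):
    print(whole, "->", left, "|", right)
```

A detector like this only locates candidate strings. Choosing the correct segmentation for each one is the part the paper addresses with its rules.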
Source
《吉林大学学报(理学版)》
CAS
CSCD
PKU Core (北大核心)
2006, No. 2, pp. 223-228 (6 pages)
Journal of Jilin University: Science Edition
Funding
National Natural Science Foundation of China project (Grant No. 60373099)
Keywords
crossing ambiguity
rules
statistics