Abstract
This paper analyzes the multi-category word phenomenon in Chinese (words that can belong to more than one part of speech) and the difficulties it creates for Chinese part-of-speech tagging. It introduces rule-based and statistics-based disambiguation strategies for Chinese part-of-speech tagging, and proposes an approach that combines the two. A software system designed along these lines achieves tagging accuracies of 96.06% on a closed corpus (closed test) and 95.82% on an open corpus (open test).
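The combination of rule-based and statistics-based disambiguation can be illustrated with a minimal sketch. This is not the system described in the paper: the lexicon, the tag set (`d`, `v`, `n`), the single pruning rule, and every probability below are hypothetical toy values. The sketch assumes one plausible arrangement, in which rules first prune the candidate tags of multi-category words and a bigram Viterbi search then chooses among the tags that remain.

```python
import math

# Hypothetical toy lexicon: word -> {tag: P(word | tag)}. Not from the paper.
LEXICON = {
    "不": {"d": 1.0},              # adverb only
    "完成": {"v": 1.0},            # verb only
    "计划": {"n": 0.5, "v": 0.5},  # multi-category word: noun or verb
}

# Hypothetical tag-bigram probabilities P(tag | previous tag); "<s>" marks sentence start.
TRANS = {
    ("<s>", "d"): 0.2, ("<s>", "v"): 0.4, ("<s>", "n"): 0.4,
    ("d", "v"): 0.9, ("d", "n"): 0.1,
    ("v", "n"): 0.7, ("v", "v"): 0.3,
}
UNSEEN = 1e-6  # fallback probability for tag pairs missing from TRANS


def apply_rules(words):
    """Rule stage: prune candidate tags with one illustrative rule."""
    cands = [list(LEXICON[w]) for w in words]
    for i in range(1, len(words)):
        # Illustrative rule: a word right after an unambiguous adverb is not a noun.
        if cands[i - 1] == ["d"] and len(cands[i]) > 1:
            kept = [t for t in cands[i] if t != "n"]
            if kept:
                cands[i] = kept
    return cands


def viterbi(words, cands):
    """Statistical stage: best tag sequence under the bigram model."""
    best = {"<s>": (0.0, [])}  # last tag -> (log score, tag path so far)
    for word, tags in zip(words, cands):
        new_best = {}
        for tag in tags:
            emit = math.log(LEXICON[word][tag])
            new_best[tag] = max(
                (
                    (score + math.log(TRANS.get((prev, tag), UNSEEN)) + emit, path + [tag])
                    for prev, (score, path) in best.items()
                ),
                key=lambda item: item[0],
            )
        best = new_best
    return max(best.values(), key=lambda item: item[0])[1]


def tag(words):
    """Rules first, statistics second."""
    return viterbi(words, apply_rules(words))
```

In this toy setup, `tag(["不", "计划"])` is resolved by the rule alone, while `tag(["完成", "计划"])` is left ambiguous by the rule stage and decided by the bigram statistics.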
Source
《中文信息学报》
CSCD
1995, No. 3, pp. 1-10 (10 pages)
Journal of Chinese Information Processing
Funding
National Natural Science Foundation of China