Abstract
Part-of-speech (POS) tagging is one of the key problems in Chinese automatic word segmentation and, more broadly, in Chinese information processing; it is both a difficulty and a focus of research in the field. The accuracy with which multi-category words (words that admit more than one part of speech) are tagged strongly affects overall tagging quality. Building on rule-based POS tagging, this paper proposes a tagging method based on rule priorities: each tagging rule is assigned a priority, and the tagging algorithm controls these priorities to complete the tagging of multi-category words. The method was tested on a large-scale corpus, and the results show that its tagging accuracy can reach 96.4%.
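The abstract does not give the rules, the tag set, or the control algorithm, so the following is only a minimal sketch of the general idea of priority-controlled rule tagging. The lexicon, the three rules, their priority values, the fallback to the first candidate tag, and the names `Rule`, `LEXICON`, `RULES`, and `tag` are all hypothetical and are not taken from the paper.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional, Tuple


@dataclass(frozen=True)
class Rule:
    tag: str        # tag assigned when the rule fires
    priority: int   # larger number wins (an assumption of this sketch)
    # condition(previous_tag, candidate_tags_of_next_word) -> bool
    condition: Callable[[Optional[str], List[str]], bool]


# Toy lexicon (hypothetical): word -> candidate tags, first one is the default.
LEXICON: Dict[str, List[str]] = {
    "他": ["r"], "正在": ["d"], "一": ["m"], "份": ["q"],
    "结果": ["n"], "报告": ["n", "v"],  # "报告" is a multi-category word
}

# Toy rules (hypothetical), each carrying a priority.
RULES: List[Rule] = [
    Rule("n", 3, lambda prev, nxt: prev == "q"),  # after a measure word
    Rule("v", 2, lambda prev, nxt: prev == "d"),  # after an adverb
    Rule("v", 1, lambda prev, nxt: "n" in nxt),   # before a possible noun
]


def tag(words: List[str]) -> List[Tuple[str, str]]:
    """Tag words left to right; resolve multi-category words by rule priority."""
    tagged: List[Tuple[str, str]] = []
    for i, word in enumerate(words):
        candidates = LEXICON.get(word, ["n"])  # unknown word -> noun (assumption)
        if len(candidates) == 1:
            tagged.append((word, candidates[0]))
            continue
        prev_tag = tagged[-1][1] if tagged else None
        next_cands = LEXICON.get(words[i + 1], []) if i + 1 < len(words) else []
        # Collect every rule that proposes a legal tag and whose condition holds.
        fired = [r for r in RULES
                 if r.tag in candidates and r.condition(prev_tag, next_cands)]
        if fired:
            best = max(fired, key=lambda r: r.priority)
            tagged.append((word, best.tag))
        else:
            tagged.append((word, candidates[0]))  # no rule fired: default tag
    return tagged
```

In this sketch, when two rules fire on the same word, as for "报告" in `tag(["份", "报告", "结果"])`, the rule with the larger priority decides the tag; how the paper actually orders and controls priorities is not stated in the abstract.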
Source
《安徽工业大学学报(自然科学版)》
CAS
2008, No. 4, pp. 426-429 (4 pages)
Journal of Anhui University of Technology (Natural Science)
Keywords
Chinese automatic word segmentation
POS tagging
multi-category words
priority of rules