Abstract
A decision-tree algorithm that combines statistics with rules is proposed for Chinese pronominal coreference resolution. Rules filter out negative examples whose attributes conflict, which partly compensates for the decision tree's neglect of relationships between attributes. The method is evaluated on the Chinese Treebank, in which the coreference relations and feature vectors are annotated by hand. Candidate antecedents are first filtered by the rules, and the C4.5 decision-tree algorithm then selects the antecedent. The overall resolution success rate is 82.59%; the success rates for personal pronouns and demonstrative pronouns are 87.60% and 75.21%, respectively.
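The two-stage pipeline described above (rule filtering, then a C4.5 decision tree that picks the antecedent) can be sketched in Python as follows. This is a minimal sketch under stated assumptions, not the authors' implementation: the pair features (a sentence-distance bucket and the candidate's grammatical role), the gender/number agreement rule, the nearest-first antecedent selection, and every function name are hypothetical. The tree uses only C4.5's gain-ratio splitting criterion on categorical attributes; it omits C4.5's pruning and its handling of continuous and missing values.

```python
import math
from collections import Counter

# Minimal sketch, not the paper's implementation. Attribute names, features
# and the nearest-first selection strategy are hypothetical.
AGREEMENT_ATTRS = ("gender", "number")

def mention(sent, role=None, gender=None, number=None):
    """A mention; None marks an unknown attribute value."""
    return {"sent": sent, "role": role, "gender": gender, "number": number}

def conflicts(pron, cand):
    """Filter rule: an attribute known on both sides with different values."""
    return any(pron[a] is not None and cand[a] is not None and pron[a] != cand[a]
               for a in AGREEMENT_ATTRS)

def pair_features(pron, cand):
    """Hypothetical pair features: sentence-distance bucket and candidate role."""
    dist = pron["sent"] - cand["sent"]
    return {"dist": "0" if dist == 0 else "1" if dist == 1 else "2+",
            "role": cand["role"]}

def make_training_set(pairs):
    """pairs: (pronoun, candidate, is_coref); rule-conflicting pairs are dropped."""
    rows, labels = [], []
    for pron, cand, is_coref in pairs:
        if conflicts(pron, cand):
            continue  # negative example removed by the filter rule
        rows.append(pair_features(pron, cand))
        labels.append("coref" if is_coref else "non")
    return rows, labels

def entropy(labels):
    n = len(labels)
    return -sum(c / n * math.log2(c / n) for c in Counter(labels).values())

def gain_ratio(rows, labels, attr):
    """C4.5 splitting criterion: information gain divided by split information."""
    groups = {}
    for row, label in zip(rows, labels):
        groups.setdefault(row[attr], []).append(label)
    n = len(rows)
    remainder = sum(len(g) / n * entropy(g) for g in groups.values())
    split_info = -sum(len(g) / n * math.log2(len(g) / n) for g in groups.values())
    return 0.0 if split_info == 0 else (entropy(labels) - remainder) / split_info

def build_tree(rows, labels, attrs):
    """Categorical attributes only, no pruning: a simplified C4.5."""
    majority = Counter(labels).most_common(1)[0][0]
    if len(set(labels)) == 1 or not attrs:
        return majority
    best = max(attrs, key=lambda a: gain_ratio(rows, labels, a))
    if gain_ratio(rows, labels, best) <= 0:
        return majority
    rest = [a for a in attrs if a != best]
    branches = {}
    for value in {row[best] for row in rows}:
        idx = [i for i, row in enumerate(rows) if row[best] == value]
        branches[value] = build_tree([rows[i] for i in idx],
                                     [labels[i] for i in idx], rest)
    return (best, branches, majority)

def classify(tree, row):
    while isinstance(tree, tuple):
        attr, branches, majority = tree
        if row[attr] not in branches:
            return majority  # unseen value: fall back to the node's majority label
        tree = branches[row[attr]]
    return tree

def resolve(pron, candidates, tree):
    """Candidates nearest-first; return the first unfiltered one classified coref."""
    for cand in candidates:
        if not conflicts(pron, cand) and classify(tree, pair_features(pron, cand)) == "coref":
            return cand
    return None

# Toy usage with made-up data.
p = mention(3, gender="m", number="sg")
pairs = [
    (p, mention(3, "subj", "m", "sg"), True),
    (p, mention(2, "subj", "m", "sg"), True),
    (p, mention(3, "obj", "m", "sg"), False),
    (p, mention(2, "obj", None, "sg"), False),
    (p, mention(1, "subj", "m", "sg"), False),
    (p, mention(3, "subj", "f", "sg"), False),  # gender conflict: filtered out
]
rows, labels = make_training_set(pairs)
tree = build_tree(rows, labels, ["dist", "role"])

he = mention(2, gender="m", number="sg")
c1 = mention(2, "subj", "f", "sg")  # nearest candidate, but disagrees in gender
c2 = mention(1, "subj", "m", "sg")
print(resolve(he, [c1, c2, mention(0, "subj", "m", "sg")], tree) is c2)  # True
```

In this toy run the filtered training pair has the same features as a positive example, so keeping it would put contradictory labels into the tree. At resolution time the tree alone would accept `c1` (same sentence, subject role), because these hypothetical features do not encode gender agreement between the pronoun and the candidate; only the filter rule removes it, and the next candidate is chosen.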
Source
Journal of Beijing University of Posts and Telecommunications
Indexed in: EI, CAS, CSCD, PKU Core Journals
2006, No. 4, pp. 1-5 (5 pages)
Funding
National High Technology Research and Development Program of China ("863 Program") (2004AA117310)
Keywords
natural language understanding
coreference resolution
Chinese pronoun
decision tree
filter rules