Abstract
Semantic relation extraction is an important research area within information extraction. Feature-vector-based approaches to semantic relation extraction can now hardly be improved simply by mining new features. This paper proposes a feature combination method: basic lexical, syntactic, and semantic features are combined, both within and across feature types, to form combined features, which are then used with a support vector machine (SVM) learner to improve the precision and recall of relation extraction. In experiments on the ACE 2004 corpus, the F-scores for the 7 major relation types and the 23 relation subtypes reach 66.6% and 59.50%, respectively. The results show that combined features built from basic linguistic features significantly improve the performance of semantic relation extraction.
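As an illustration of what a combined feature can look like, the sketch below builds conjunctions of basic features for one entity-mention pair. It is a minimal sketch under stated assumptions, not the paper's implementation: the function name `combine_features`, the feature names (`head1`, `type1`, and so on), and the choice to form every pairwise conjunction are all hypothetical, whereas the abstract only says that features are combined in a reasonable way.

```python
from itertools import combinations


def combine_features(basic, max_order=2):
    """Return binary indicators for each basic feature and for
    conjunctions of up to `max_order` basic features.

    `basic` maps a feature name to its value for one relation instance.
    Forming all conjunctions is an assumption made for illustration.
    """
    items = sorted(basic.items())  # fixed order gives stable feature keys
    combined = {f"{name}={value}": 1 for name, value in items}
    for order in range(2, max_order + 1):
        for group in combinations(items, order):
            key = "&".join(f"{name}={value}" for name, value in group)
            combined[key] = 1
    return combined


# Hypothetical basic features for one pair of entity mentions.
example = {
    "head1": "president",  # head word of the first mention
    "head2": "company",    # head word of the second mention
    "type1": "PER",        # entity type of the first mention
    "type2": "ORG",        # entity type of the second mention
}
features = combine_features(example)
```

The resulting dictionary is a sparse binary feature vector; in a setup like the one the abstract describes, such vectors would be passed to an SVM classifier.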
Source
Journal of Chinese Information Processing (《中文信息学报》)
CSCD
Peking University Core Journals (北大核心)
2008, Issue 3, pp. 44-49, 63 (7 pages in total)
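Funding
National High Technology Research and Development Program of China ("863" Program), grant 2006AA01Z147
National Natural Science Foundation of China, grant 60673041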
Keywords
computer application
Chinese information processing
semantic relation extraction
support vector machine
combined features