Abstract
Entity relation extraction is a component of information extraction (IE); its goal is to determine whether a semantic relation holds between entities. Chinese grammar is intricate, its expression is flexible, and its semantics are varied, so using a verb alone as the relation expression tends to leave the relation between entities ambiguous. To address this, the paper proposes an open Chinese entity relation extraction method based on dependency parsing. The method runs dependency parsing on each input sentence and uses the resulting dependency arcs to decide whether the sentence is a verb-predicate sentence. If it is, the relation expression is extracted using heuristic rules derived from Chinese grammar. Argument positions are then determined by distance, the candidate triples are evaluated, and the triples that meet the criteria are output. Experimental results on the SogouCA and SogouCS corpora show that the proposed method is suitable for large-scale corpora and has good performance and portability. Compared with an unsupervised hierarchical clustering method based on the convolution tree kernel, the F-measure improves by 16.68%.
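The steps named in the abstract (check for a verb predicate through the dependency arcs, then pick arguments by distance) can be illustrated with a minimal Python sketch. It is not the authors' implementation: the `Token` structure, the LTP-style labels `HED`, `SBV`, and `VOB`, the helper names, and the hand-written parse are all assumptions made for illustration, and the paper's heuristic rules and triple evaluation step are not reproduced here.

```python
from dataclasses import dataclass
from typing import List, Optional, Set, Tuple


@dataclass
class Token:
    idx: int     # 1-based position in the sentence
    word: str
    pos: str     # coarse part-of-speech tag, e.g. "v" for verb
    head: int    # index of the head token, 0 for the root
    deprel: str  # dependency label (assumed LTP-style: HED, SBV, VOB)


def find_verb_predicate(tokens: List[Token]) -> Optional[Token]:
    """Return the root token if it is a verb, i.e. a verb-predicate sentence."""
    for t in tokens:
        if t.deprel == "HED" and t.pos == "v":
            return t
    return None


def nearest_dependent(tokens: List[Token], head: Token, deprel: str,
                      entities: Set[str]) -> Optional[Token]:
    """Pick the entity attached to `head` by `deprel` that is closest to it."""
    candidates = [t for t in tokens
                  if t.head == head.idx and t.deprel == deprel
                  and t.word in entities]
    if not candidates:
        return None
    return min(candidates, key=lambda t: abs(t.idx - head.idx))


def extract_triple(tokens: List[Token],
                   entities: Set[str]) -> Optional[Tuple[str, str, str]]:
    """Return (subject, relation, object), or None if no triple qualifies."""
    pred = find_verb_predicate(tokens)
    if pred is None:
        return None  # not a verb-predicate sentence, so it is skipped
    subj = nearest_dependent(tokens, pred, "SBV", entities)
    obj = nearest_dependent(tokens, pred, "VOB", entities)
    if subj is None or obj is None:
        return None
    return (subj.word, pred.word, obj.word)


if __name__ == "__main__":
    # Hand-written parse of a toy sentence; a real system would obtain this
    # from a dependency parser.
    sentence = [
        Token(1, "李明", "nh", 2, "SBV"),
        Token(2, "访问", "v", 0, "HED"),
        Token(3, "北京", "ns", 2, "VOB"),
    ]
    print(extract_triple(sentence, {"李明", "北京"}))
```

In this sketch a sentence whose root is not a verb yields no triple, which mirrors the verb-predicate filter described in the abstract. The distance criterion is reduced to token offset from the predicate, which is only one plausible reading of "according to the distance".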
Source
Computer Engineering (《计算机工程》)
CAS
CSCD
Peking University Core Journal (北大核心)
2016, Issue 6, pp. 201-207 (7 pages)
Funding
Shanghai Municipal Science and Technology Commission funded project (14511107000)
Keywords
open information extraction (OIE)
Chinese entity relation extraction
dependency parsing
unsupervised
heuristic rule