Abstract
Entity relation extraction is the task of extracting semantic relationships between entities from unstructured natural-language text and representing them in a structured form. Traditional methods focus on a single type of data source and require a large amount of manually labeled training data to train the extraction model, which is labor-intensive and time-consuming. This paper therefore proposes a relation extraction method that integrates multiple data sources and incorporates a rule-based inference engine. Specifically, it combines structured and unstructured data: starting from a small number of seed relations provided by the structured data, the rule-based inference engine infers a larger set of entity relationships. These newly inferred relationships are then used as seeds for distantly supervised learning, which extracts entity relationships from unstructured text. The final set of entity relationships is obtained through multiple iterations. Experimental results demonstrate the effectiveness of the proposed method.
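The abstract describes a loop: rules expand a small seed set taken from structured data, the expanded set distantly labels sentences, the resulting extractor finds new triples in text, and the process repeats. The Python sketch below illustrates that loop under strong simplifying assumptions that are not from the paper: rules are hand-written Horn rules of two hypothetical shapes (chain and inverse), entity mentions are found by exact string match against a given entity list, and the "extractor" is simply the literal text between two entity mentions, standing in for whatever learned model the authors train. All function, rule, and relation names (`infer`, `learn_patterns`, `extract`, `bootstrap`, `father_of`, ...) are illustrative.

```python
from itertools import product

# Hypothetical rule sets; the abstract does not specify the paper's actual rules.
# Chain rule (r1, r2, r): r1(x, y) and r2(y, z) imply r(x, z).
CHAIN_RULES = [("father_of", "father_of", "grandfather_of")]
# Inverse rule (r1, r): r1(x, y) implies r(y, x).
INVERSE_RULES = [("father_of", "child_of")]


def infer(triples, chain_rules, inverse_rules):
    """Forward-chain the rules over (head, relation, tail) triples to a fixpoint."""
    kb = set(triples)
    while True:
        derived = set()
        for r1, r in inverse_rules:
            derived |= {(t, r, h) for h, rel, t in kb if rel == r1}
        for r1, r2, r in chain_rules:
            for (x, a, y), (y2, b, z) in product(kb, kb):
                if a == r1 and b == r2 and y == y2 and x != z:
                    derived.add((x, r, z))
        derived -= kb
        if not derived:
            return kb
        kb |= derived


def between(sentence, e1, e2):
    """Text between a mention of e1 and a later mention of e2, or None."""
    i = sentence.find(e1)
    j = sentence.find(e2, i + len(e1)) if i >= 0 else -1
    return sentence[i + len(e1):j].strip() if j >= 0 else None


def learn_patterns(kb, sentences):
    """Distant supervision (simplified): a sentence mentioning both entities of a
    known triple is assumed to express that triple's relation."""
    patterns = {}
    for h, r, t in kb:
        for s in sentences:
            p = between(s, h, t)
            if p:
                patterns.setdefault(p, set()).add(r)
    return patterns


def extract(patterns, sentences, entities):
    """Apply unambiguous patterns to ordered entity pairs in each sentence."""
    found = set()
    for s in sentences:
        present = [e for e in entities if e in s]  # exact-match stand-in for NER
        for e1, e2 in product(present, present):
            rels = patterns.get(between(s, e1, e2)) if e1 != e2 else None
            if rels and len(rels) == 1:  # skip patterns seen with several relations
                found.add((e1, next(iter(rels)), e2))
    return found


def bootstrap(seeds, sentences, entities, max_iter=5):
    """Alternate rule inference and distantly supervised extraction."""
    kb = set(seeds)
    for _ in range(max_iter):
        kb = infer(kb, CHAIN_RULES, INVERSE_RULES)
        new = extract(learn_patterns(kb, sentences), sentences, entities) - kb
        if not new:
            break
        kb |= new
    return infer(kb, CHAIN_RULES, INVERSE_RULES)


seeds = {("Adam", "father_of", "Ben"), ("Ben", "father_of", "Carl")}  # from structured data
sentences = [
    "Adam is the grandfather of Carl.",
    "Ben is the father of Carl.",
    "Dan is the father of Fred.",
    "Fred is the father of Eric.",
]
entities = ["Adam", "Ben", "Carl", "Dan", "Eric", "Fred"]  # assumed entity list
kb = bootstrap(seeds, sentences, entities)
for triple in sorted(kb):
    print(triple)
```

With this toy data, inference first expands the two seeds with `child_of` and `grandfather_of` triples; the labeled sentences then yield the pattern `is the father of`, which extracts `(Dan, father_of, Fred)` and `(Fred, father_of, Eric)` from text, and the second round of inference derives `(Dan, grandfather_of, Eric)`. Real distant supervision trains a statistical classifier on features of the labeled sentences; the literal-string pattern is used here only to keep the sketch short.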
Source
Journal of Frontiers of Computer Science and Technology (《计算机科学与探索》)
CSCD
Peking University Core Journal (北大核心)
2016, No. 9, pp. 1310-1319 (10 pages)
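Funding
Shanghai Municipal Commission of Economy and Informatization, "Special Fund for the Development of the Software and Integrated Circuit Industry" (No. 140304)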
Keywords
relation extraction
relation reasoning
distant supervision
rule-based inference engine