Abstract
Information extraction aims to transform unstructured or semi-structured text into structured information, and entity relation extraction is one of its major tasks. Traditional entity relation extraction requires relation types to be specified in advance and extraction rules to be hand-crafted, which does not scale to large text collections. Open Information Extraction (OIE) addresses this limitation and has made significant progress for English and other Western languages, but research on Chinese remains scarce. This paper studies a hierarchical approach to open entity relation extraction from Chinese patents that applies Markov Logic Networks (MLN) on top of external- and internal-layer chunk annotations. Experiments show that starting from chunks makes sentences easier to interpret and that external and internal chunks can be handled uniformly, which reduces engineering cost. With the same features, MLN-based relation extraction outperforms support vector machines (SVM), with F-scores of 77.92% and 69.20% for the external and internal layers, respectively.
Source
《计算机工程与应用》
CSCD
PKU Core Journal (北大核心)
2015, Issue 1, pp. 125-129, 171 (6 pages)
Computer Engineering and Applications
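Funding
National Key Technology R&D Program of China during the 12th Five-Year Plan period (No. 2012BAH14F00)
National Natural Science Foundation of China (No. 61073123)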