
Chinese Named Entity Implicit Relation Extraction Based on Company Verbs (基于协陪义动词的中文隐式实体关系抽取)

Cited by: 4
Abstract: The goal of named entity relation extraction is to detect explicit and implicit relations between entities. Most existing research focuses on explicit entity relation extraction and ignores implicit entity relation extraction. Compared with explicit relations, implicit relations have no explicit supporting evidence in the text and require additional evidence from a reading of the document. Inferring them therefore usually requires integrating the semantic associations of sentence content with relevant linguistic information, specific contextual semantic information, and related domain knowledge. However, because of the ambiguity of semantic relations, the complexity of sentence structures, the uncertainty of context information, and the imbalance of data, implicit relation extraction is more complicated and more difficult than explicit relation extraction, and it cannot be implemented with a general model. Inferring implicit relations has therefore remained a challenge. Several works on implicit relation extraction address European languages, especially English. As far as we know, very few studies address Chinese.

In many text domains, such as tourism and news, there are many implicit entity relations triggered by company verbs. In this paper, we study the problem of Chinese implicit entity relation extraction based on company verbs. We propose a two-stage scheme that takes into account both explicit and implicit relation extraction, integrating a machine learning method with rules and using explicit entity relations to infer implicit entity relations. First, a company verb vocabulary is constructed using a variety of methods and is used to select candidate sentences containing company verbs. Second, a sentence pattern classification algorithm and corresponding component recognition algorithms are designed for company candidate sentences. Because company verbs play different roles in different sentences, we use dependency parsing to determine and classify company candidate sentence patterns. Because those roles differ across sentence patterns, the methods for recognizing which entities are components of a company action differ as well, so we design a dependency-parsing-based component recognition algorithm for each of the five kinds of company candidate sentence patterns. Finally, according to whether the additional knowledge and the company verb are in the same sentence, we propose two kinds of inference methods for implicit relations based on company verbs: one for in-sentence relations, which are inferred from a single sentence, and one for between-sentences relations, which are inferred from multiple sentences. According to the characteristics of company semantic components and the scope of company verbs, we design three rules for implicit in-sentence entity relation extraction. Furthermore, by exploiting the antecedent of the zero anaphora in a company sentence, we establish associations between the subject and object components of company verbs in different sentences, which are then used to extract implicit between-sentences entity relations.

In addition, a feature extraction algorithm for directional core verbs is proposed to improve the contribution of the verb feature to explicit entity relation extraction. Comprehensive experiments on real tourism and news text datasets show that the proposed methods are effective.
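The idea of using explicit relations to infer implicit ones can be illustrated with a minimal sketch. This is not one of the paper's three in-sentence rules, which the abstract does not spell out; it is a hypothetical simplification that assumes company components have already been recognized as (subject, companion) pairs and that an explicit relation held by the subject also holds for the companion. The `Relation` type, the function name, and the example entities are all assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Relation:
    """A (head, relation, tail) triple; the layout is assumed, not taken from the paper."""
    head: str
    relation: str
    tail: str


def infer_company_relations(explicit, company_pairs):
    """Copy each subject's explicit relations to its companion.

    explicit:      list of Relation triples already extracted from the text.
    company_pairs: list of (subject, companion) pairs linked by a company verb.
    Returns only the newly inferred triples, in discovery order.
    """
    known = set(explicit)
    inferred = []
    for subject, companion in company_pairs:
        for rel in explicit:
            if rel.head != subject:
                continue
            candidate = Relation(companion, rel.relation, rel.tail)
            if candidate not in known:
                known.add(candidate)
                inferred.append(candidate)
    return inferred


# Hypothetical example: "Zhang San, accompanied by Li Si, visited Mount Lu."
explicit = [Relation("Zhang San", "visited", "Mount Lu")]
company_pairs = [("Zhang San", "Li Si")]
inferred = infer_company_relations(explicit, company_pairs)
```

Here `inferred` holds the single triple linking the companion to the visited place. A real system would also need the sentence pattern classification and scope checks described in the abstract before applying any such rule.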
Authors: WAN Chang-Xuan (万常选), GAN Li-Xin (甘丽新), JIANG Teng-Jiao (江腾蛟), LIU De-Xi (刘德喜), LIU Xi-Ping (刘喜平), LIU Yu (刘玉). Affiliations: School of Information Technology, Jiangxi University of Finance and Economics, Nanchang 330013; Jiangxi Key Laboratory of Data and Knowledge Engineering, Jiangxi University of Finance and Economics, Nanchang 330013
Source: Chinese Journal of Computers (《计算机学报》), indexed in EI, CSCD, and the Peking University Core Journals list; 2019, No. 12, pp. 2795-2820 (26 pages)
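Funding: Supported by the National Natural Science Foundation of China (61562032, 61662027, 61173146, 61363039, 61363010, 61462037), the Natural Science Foundation of Jiangxi Province (20152ACB20003, 20161BAB202057), the Science and Technology Landing Program for Higher Education Institutions of Jiangxi Province (KJLD12022, KJLD14035), the Science and Technology Research Project of the Jiangxi Provincial Department of Education (GJJ150819, GJJ160783), and the Humanities and Social Sciences Research Project for Universities of Jiangxi Province (JC161001)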
Keywords: relation extraction; implicit relation; company verb; explicit relation; verb feature