Abstract
Distantly supervised relation extraction (DSRE) reduces manual annotation cost by labeling data automatically, but it suffers from two problems: noisy sentence labels and a long-tail distribution of relations. To address both, a relation extraction method is proposed that fuses entity information from a knowledge graph with constraints between entities and relations. The method encodes the attributes of the target entities and their neighboring entities, encodes the neighbor graph formed by the target entities and their neighbors, and encodes the constraints between entity types and relations; a multi-source fusion attention module then integrates this information to build the relation extraction model. On the NYT-10 dataset the method achieves an AUC of 0.524 and a P@100 of 94.8%, and it improves on the previous state-of-the-art models on the long-tail metric Hits@K. These results indicate that fusing entity information and constraint information is effective against the two main problems of DSRE.
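The abstract names a multi-source fusion attention module but does not give its formulation. As a rough illustration of the general idea only, the sketch below weights three source vectors (entity attributes, neighbor graph, type-relation constraints) by scaled dot-product attention against a sentence vector and sums them. The function name `fuse_sources`, the choice of the sentence vector as the query, the scaling, and all toy values are assumptions made for this example, not details taken from the paper.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot(u, v):
    """Dot product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def fuse_sources(query, sources):
    """Hypothetical fusion step: attention-weighted sum of source vectors.

    Each source is scored by its scaled dot product with the query;
    the softmax of the scores gives the mixing weights.
    """
    scale = math.sqrt(len(query))
    weights = softmax([dot(query, s) / scale for s in sources])
    fused = [
        sum(w * s[i] for w, s in zip(weights, sources))
        for i in range(len(query))
    ]
    return fused, weights


# Toy 4-dimensional vectors (illustrative values, not real encodings).
sentence_vec = [1.0, 0.0, 0.0, 0.0]        # assumed query
attribute_vec = [1.0, 0.0, 0.0, 0.0]       # entity attribute encoding
neighbor_graph_vec = [0.0, 1.0, 0.0, 0.0]  # neighbor graph encoding
constraint_vec = [0.0, 0.0, 1.0, 0.0]      # type-relation constraint encoding

fused, weights = fuse_sources(
    sentence_vec, [attribute_vec, neighbor_graph_vec, constraint_vec]
)
print(weights)
print(fused)
```

In this toy setup the attribute vector aligns with the query, so it receives the largest weight, while the other two sources share the remainder equally. A trained model would learn the projections that produce these scores rather than use raw dot products.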
Authors
刘琼昕
牛文涛
王佳升
LIU Qiongxin; NIU Wentao; WANG Jiasheng (School of Computer Science and Technology, Beijing Institute of Technology, Beijing 100081, China; Beijing Engineering Applications Research Center on High Volume Language Information Processing and Cloud Computing, Beijing 100081, China)
Source
《北京理工大学学报》
EI
CAS
CSCD
PKU Core Journals
2024, Issue 7, pp. 731-739 (9 pages)
Transactions of Beijing Institute of Technology
Funding
National Key Research and Development Program of China (2020AAA0104903)
National Natural Science Foundation of China (62072039)
Keywords
distant supervised relation extraction
knowledge context
constraint graph
multi-source fusion attention