Abstract
Relation extraction is an upstream task for knowledge graph construction and many other applications, and it has attracted wide attention in recent years. Relation extraction models commonly suffer from exposure bias, and the texts they process commonly contain nested and overlapping entities; both problems seriously degrade model performance. To address them, this paper proposes a joint entity and relation extraction model based on span labeling (span-labeling based model, SLM). The model (1) recasts entity-relation extraction as a span labeling problem; (2) uses a sliding window and three mapping strategies to combine, arrange, and re-tile the token sequence into a span sequence; (3) uses an LSTM and a multi-head self-attention mechanism to extract deep semantic features of the spans; and (4) designs entity-relation labels and classifies relation labels with a multi-layer labeling method. Experiments on the English datasets NYT and WebNLG show a significant F1 improvement over the baseline model, which verifies the model's effectiveness and indicates that it can effectively address the problems above.
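The re-tiling step described in the abstract can be illustrated with a minimal sketch. The abstract does not specify the three mapping strategies, so the code below shows only one plausible ordering (all spans of width 1, then width 2, and so on). The function name `enumerate_spans` and the parameter `max_width` are hypothetical and are not taken from the paper.

```python
from typing import List, Tuple

# A span is (start index, exclusive end index, tokens covered).
Span = Tuple[int, int, Tuple[str, ...]]


def enumerate_spans(tokens: List[str], max_width: int) -> List[Span]:
    """Slide windows of width 1..max_width over the tokens and
    tile the resulting spans into one flat sequence.

    Assumption: spans are ordered by width, then by start position.
    The paper's actual mapping strategies may order them differently.
    """
    if max_width < 1:
        raise ValueError("max_width must be at least 1")
    spans: List[Span] = []
    for width in range(1, max_width + 1):
        # The window stops where a span of this width no longer fits.
        for start in range(0, len(tokens) - width + 1):
            end = start + width
            spans.append((start, end, tuple(tokens[start:end])))
    return spans


tokens = ["New", "York", "City", "Hall"]
spans = enumerate_spans(tokens, max_width=3)
```

Because every candidate span becomes its own item in the sequence, nested mentions such as "New York" and "New York City" each receive a separate position that can be labeled independently, which is the property a span-labeling formulation relies on for nested and overlapping entities.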
Authors
ZHENG Zhaoqian (郑肇谦)
HAN Dongchen (韩东辰)
ZHAO Hui (赵辉)
School of Computer Science and Engineering, Changchun University of Technology, Changchun 130012, China
Source
Computer Engineering and Applications (《计算机工程与应用》)
Indexed in: CSCD; Peking University Core Journals (北大核心)
2023, No. 9, pp. 130-139 (10 pages)
Funding
"13th Five-Year Plan" Science and Technology Project of the Education Department of Jilin Province (JJKH20200677KJ).
Keywords
relation extraction
joint extraction
span-labeling
mapping strategy
exposure bias
entity nesting
entity overlap