Abstract
Relation extraction aims to extract the semantic relations between entities from text. Because entity recognition is the upstream task of relation extraction, the errors it produces propagate to relation extraction and cause cascading errors. Compared with entities, entity boundaries have small granularity and ambiguity, making them easier to recognize. Therefore, a relation extraction method based on entity boundary combination is proposed, which skips the entities and performs relation extraction by combining entity boundaries in pairs. Since boundary recognition performs better than entity recognition, the problem of error propagation is alleviated. In addition, entity type features and position features are added to the model through a feature combination method, which further improves performance and further reduces the impact of error propagation. Experimental results on the ACE 2005 English dataset show that the proposed method outperforms the table-sequence encoders method by 8.61 percentage points in macro-averaged F1 score.
Authors
李昊
陈艳平
唐瑞雪
黄瑞章
秦永彬
王国蓉
谭曦
LI Hao; CHEN Yanping; TANG Ruixue; HUANG Ruizhang; QIN Yongbin; WANG Guorong; TAN Xi (College of Computer Science and Technology, Guizhou University, Guiyang, Guizhou 550025, China; State Key Laboratory of Public Big Data (Guizhou University), Guiyang, Guizhou 550025, China; Guizhou Qingduo Technology Company Limited, Guiyang, Guizhou 550025, China)
Source
《计算机应用》
CSCD
PKU Core Journal (北大核心)
2022, Issue 6, pp. 1796-1801 (6 pages)
Journal of Computer Applications
Funding
National Natural Science Foundation of China (62066008)
Key Project of the Science and Technology Foundation of Guizhou Province (黔科合基础[2020]1Z055)
Keywords
relation extraction
entity recognition
cascading error
entity boundary combination
feature combination