Abstract
Existing approaches to joint entity and relation extraction use distant supervision to generate large-scale training data automatically, which introduces severe noise into the data. To address this problem, this paper proposes a joint entity and relation extraction model that integrates reinforcement learning. The model consists of two components: a reinforcement learning module and a joint extraction model, the latter built from a graph convolutional network and a multi-head self-attention mechanism. First, the reinforcement learning module removes noisy sentences from the original dataset, and the denoised, high-quality sentences are fed into the joint extraction model. Second, the joint extraction model predicts and extracts the entities and relations in the input sentences and returns a feedback reward to the reinforcement learning module, guiding it to select high-quality sentences. Finally, the two components are trained jointly and optimized iteratively. Experiments show that the proposed model handles data noise effectively and outperforms the baseline methods on entity and relation extraction.
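The abstract gives no implementation details, so the following is only a minimal sketch of the feedback loop it describes: a selector keeps or drops sentences, and a reward derived from the extractor updates the selector. It assumes a REINFORCE-style policy-gradient update, a single scalar feature per sentence, and a stub reward standing in for the graph convolutional network and multi-head self-attention extractor. The names `keep_probability`, `extractor_reward`, and `train_selector` are hypothetical and do not come from the paper.

```python
import math
import random


def keep_probability(weight, bias, feature):
    """Selector policy: probability of keeping a sentence (logistic in one feature)."""
    z = weight * feature + bias
    z = max(-30.0, min(30.0, z))  # clamp to keep math.exp well-behaved
    return 1.0 / (1.0 + math.exp(-z))


def extractor_reward(kept):
    """Hypothetical stand-in for the joint extractor's feedback.

    Each item is (feature, is_clean). The reward is the fraction of kept
    sentences that are clean, centered at 0.5; an empty selection earns 0.
    A real system would derive the reward from the extractor's predictions.
    """
    if not kept:
        return 0.0
    clean = sum(1 for _, is_clean in kept if is_clean)
    return clean / len(kept) - 0.5


def train_selector(bag, epochs=500, lr=0.5, seed=0):
    """Train the selector with a REINFORCE-style update (an assumption)."""
    rng = random.Random(seed)
    weight, bias = 0.0, 0.0
    for _ in range(epochs):
        grad_w, grad_b, kept = 0.0, 0.0, []
        for feature, is_clean in bag:
            p = keep_probability(weight, bias, feature)
            action = 1 if rng.random() < p else 0  # sample keep (1) or drop (0)
            if action:
                kept.append((feature, is_clean))
            # Gradient of log pi(action) for a Bernoulli-logistic policy.
            grad_w += (action - p) * feature
            grad_b += action - p
        reward = extractor_reward(kept)  # feedback from the (stub) extractor
        weight += lr * reward * grad_w
        bias += lr * reward * grad_b
    return weight, bias
```

In this toy setup, calling `train_selector` on a bag such as `[(1.0, True)] * 4 + [(-1.0, False)] * 4` should push the selector toward keeping sentences whose feature resembles that of the clean examples. In the full model, the selector and the extractor would be updated together rather than against a fixed reward.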
Authors
翟社平
李航
亢鑫年
杨锐
ZHAI Sheping; LI Hang; KANG Xinnian; YANG Rui (School of Computer Science & Technology, Xi’an University of Posts and Telecommunications, Xi’an 710121, China; Shaanxi Key Laboratory of Network Data Analysis & Intelligent Processing, Xi’an 710121, China)
Source
《电子科技大学学报》
EI
CAS
CSCD
PKU Core
2024, No. 2, pp. 243-251 (9 pages)
Journal of University of Electronic Science and Technology of China
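Funding
National Natural Science Foundation of China (61373116)
Communication Soft Science Project of the Ministry of Industry and Information Technology (2018-R-26)
Scientific Research Program of the Shaanxi Provincial Department of Education (18JK0697)
Key Research and Development Program of Shaanxi Province (2022GY-038)
Graduate Innovation Fund of Xi’an University of Posts and Telecommunications (CXJJYL2021045)
Shaanxi Provincial College Students’ Innovation and Entrepreneurship Training Program (202211664053)
Shaanxi Provincial College Students’ Innovation and Entrepreneurship Training Program (202211664086)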
Keywords
joint extraction of entities and relations
noisy data
reinforcement learning
multi-head self-attention mechanism
graph convolutional network