Abstract
In the information age, extracting structured information from massive volumes of natural language text has become a research hotspot. The large and complex body of knowledge in power systems needs to be organized by constructing a knowledge graph. Entity relation extraction is the upstream information extraction task for that construction, and how well it is performed directly affects the effectiveness of the resulting knowledge graph. With the continuing development of deep learning, research on applying deep learning techniques to entity relation extraction has gradually expanded and achieved good results. However, problems remain, such as incomplete use of text semantics. To address these problems, this paper proposes an entity relation extraction method based on a heterogeneous graph neural network and text semantic enhancement. The method uses word nodes and relation nodes to learn semantic features, obtaining the initial features of the two node types from BERT and from pre-training tasks, respectively. It updates these features iteratively through a multi-layer graph network structure, and in each layer uses message passing based on a multi-head attention mechanism to realize interaction between the two node types. In experimental comparisons with other entity relation extraction models on two public datasets, the proposed model achieves the expected results and generally outperforms the comparison models across a variety of settings.
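The layer-wise interaction between word nodes and relation nodes described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names (`multi_head_attend`, `hetero_layer`, `run_layers`), the absence of learned projection matrices, the residual update, the order in which the two node types are updated, and the toy feature values are all assumptions made for illustration. In the paper the initial features come from BERT and pre-training tasks; here they are hand-written numbers.

```python
import math

def softmax(scores):
    # Numerically stable softmax over a list of scores.
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def multi_head_attend(targets, sources, num_heads):
    """Each target node aggregates source-node features, head by head.

    Assumption: heads are plain slices of the feature vector; a real model
    would apply learned query/key/value projections first.
    """
    dim = len(targets[0])
    assert dim % num_heads == 0, "feature size must divide evenly into heads"
    head_dim = dim // num_heads
    updated = []
    for t in targets:
        message = []
        for h in range(num_heads):
            lo, hi = h * head_dim, (h + 1) * head_dim
            # Scaled dot-product score between this target and every source.
            scores = [
                sum(a * b for a, b in zip(t[lo:hi], s[lo:hi])) / math.sqrt(head_dim)
                for s in sources
            ]
            weights = softmax(scores)
            # Attention-weighted sum of source features for this head.
            for d in range(lo, hi):
                message.append(sum(w * s[d] for w, s in zip(weights, sources)))
        # Residual update (assumed): keep the node's own features.
        updated.append([x + m for x, m in zip(t, message)])
    return updated

def hetero_layer(word_nodes, relation_nodes, num_heads=2):
    # Assumed order: relation nodes read from words, then words read back.
    new_relations = multi_head_attend(relation_nodes, word_nodes, num_heads)
    new_words = multi_head_attend(word_nodes, new_relations, num_heads)
    return new_words, new_relations

def run_layers(word_nodes, relation_nodes, num_layers=2, num_heads=2):
    # Stack several layers so the two node types are updated iteratively.
    for _ in range(num_layers):
        word_nodes, relation_nodes = hetero_layer(word_nodes, relation_nodes, num_heads)
    return word_nodes, relation_nodes

# Toy stand-ins for the initial node features (hypothetical values).
words = [[0.1, 0.2, 0.3, 0.4], [0.5, 0.1, 0.0, 0.2], [0.3, 0.3, 0.1, 0.1]]
relations = [[0.0, 0.1, 0.2, 0.0], [0.2, 0.0, 0.1, 0.3]]
out_words, out_relations = run_layers(words, relations)
```

After the final layer, the updated relation-node features would typically be passed to a classifier that decides which relations hold in the sentence; that step is omitted here because the abstract does not describe it.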
Authors
彭勃
李耀东
龚贤夫
李浩
PENG Bo; LI Yaodong; GONG Xianfu; LI Hao (Grid Planning and Research Center of Guangdong Power Grid Co., Guangzhou 510080, China; College of Computer Science, Sichuan University, Chengdu 610065, China)
Source
《计算机科学》
CSCD
Peking University Core Journal
2024, Issue S01, pp. 256-260 (5 pages)
Computer Science
Funding
Science and Technology Project of China Southern Power Grid Co., Ltd., No. 037700KK52220042 (GDKJXM20220906).
Keywords
Deep learning
Natural language processing
Knowledge graph
Entity relation extraction
Heterogeneous graph neural networks
Text semantic enhancement