Abstract
Entity and relation extraction from text data provides the source material for constructing and updating domain knowledge graphs. Text data in the financial technology domain contains overlapping relations and lacks labeled training samples. To address these problems, we propose a joint entity and relation extraction method that incorporates active learning. First, based on active learning, we incrementally select informative samples as training data. Next, we adopt a main-entity-oriented labeling strategy to transform joint entity and relation extraction into a sequence labeling problem. Finally, we perform joint extraction of domain entities and relations with an improved BERT-BiGRU-CRF model. The method provides supporting technology for knowledge graph construction and helps financial practitioners carry out analysis, investment, and trading based on domain knowledge, thereby reducing investment risk. Experimental results on financial text data show that the proposed method is effective and can subsequently be used to construct financial knowledge graphs.
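The incremental sample selection step described in the abstract can be illustrated with a minimal sketch. The abstract does not state which informativeness criterion the authors use, so the mean per-token entropy below is an assumption, and the names `sequence_entropy`, `select_batch`, `active_learning_round`, and the `predict` callback are hypothetical helpers introduced only for illustration.

```python
import math
from typing import Callable, List, Sequence, Tuple

# One probability distribution over tags per token of a sentence.
TagDistributions = Sequence[Sequence[float]]


def sequence_entropy(token_probs: TagDistributions) -> float:
    """Mean per-token entropy of the predicted tag distributions."""
    total = 0.0
    for dist in token_probs:
        total -= sum(p * math.log(p) for p in dist if p > 0.0)
    return total / max(len(token_probs), 1)


def select_batch(
    pool: Sequence[str],
    predict: Callable[[str], TagDistributions],
    batch_size: int,
) -> List[int]:
    """Return indices of the batch_size most uncertain sentences in pool."""
    scored = [(sequence_entropy(predict(s)), i) for i, s in enumerate(pool)]
    # Highest entropy first; ties are broken by original position.
    scored.sort(key=lambda item: (-item[0], item[1]))
    return [i for _, i in scored[:batch_size]]


def active_learning_round(
    pool: List[str],
    labeled: List[str],
    predict: Callable[[str], TagDistributions],
    batch_size: int,
) -> Tuple[List[str], List[str]]:
    """Move the most informative sentences from the pool to the labeled set.

    In practice the chosen sentences would be annotated by a person and the
    tagger retrained before the next round; both steps are omitted here.
    """
    chosen = select_batch(pool, predict, batch_size)
    chosen_set = set(chosen)
    new_labeled = labeled + [pool[i] for i in chosen]
    new_pool = [s for i, s in enumerate(pool) if i not in chosen_set]
    return new_pool, new_labeled
```

In this sketch, `predict` stands in for whatever sequence tagger is being trained. Calling `active_learning_round` repeatedly, with retraining between calls, grows the training set one batch at a time.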
Authors
FU Rui; LI Jianyu; WANG Jiahui; YUE Kun; HU Kuang (School of Information Science and Engineering, Yunnan University, Kunming 650500, China)
Source
Journal of East China Normal University (Natural Science)
CAS
CSCD
Peking University Core Journals
2021, No. 5, pp. 24-36 (13 pages)
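Funding
National Natural Science Foundation of China (U1802271)
Major Science and Technology Project of Yunnan Province (202002AD080002-1-B)
Yunnan Province Young Top Talent Program (C6193032)
Scientific Research Fund of the Yunnan Provincial Department of Education (2020J0004)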
Keywords
domain text
domain knowledge graph
joint extraction of entities and relations
active learning
sequence labeling