Abstract
In recent years, entity relation extraction for knowledge graphs has developed rapidly, and its accuracy has improved substantially. However, most documents do not provide an intuitive data structure that reflects their content. Producing entities and relations by having people read the text is increasingly unable to meet practical needs now that document data is multi-source and massive. This paper therefore proposes a method for extracting entity relations from text. The method is based on the Language Technology Platform (LTP) of Harbin Institute of Technology and the Bidirectional Encoder Representations from Transformers (BERT) model. It parses text content automatically, which addresses the difficulty of generating training and test data sets. In addition, by optimizing and adjusting the BERT model, it addresses the problem that earlier entity relation extraction relied on large amounts of computing resources.
Authors
房冬丽
陈正雄
黄元稳
衡宇峰
FANG Dongli; CHEN Zhengxiong; HUANG Yuanwen; HENG Yufeng (No. 30 Institute of CETC, Chengdu, Sichuan 610000, China)
Source
《通信技术》
2021, No. 8, pp. 1862-1868 (7 pages)
Communications Technology