
Cited by: 3

Chinese character relation extraction model based on pre-training and multi-level information
Abstract: Relation extraction aims to extract the relationships between entity pairs from text, and is one of the active research directions in Natural Language Processing (NLP). To address the problem that the complex grammatical structure of Chinese character-relation corpora prevents effective learning of textual semantic features, a Chinese Character Relation Extraction model based on Pre-training and Multi-level Information (CCREPMI) was proposed. First, word vectors were generated using the strong semantic representation ability of a pre-trained model. Then, the original sentence was divided into the sentence level, entity level, and entity-adjacent level for separate feature extraction. Finally, relation classification was performed by fusing sentence structure features, entity meanings, and the dependencies between entities and their adjacent words. Experimental results on a Chinese character relationship dataset show that the proposed model achieves a precision of 81.5%, a recall of 82.3%, and an F1 score of 81.9%, an improvement over baseline models such as BERT (Bidirectional Encoder Representations from Transformers) and BERT-LSTM (BERT with Long Short-Term Memory). Moreover, the model's F1 score on the English SemEval2010-task8 dataset reaches 81.2%, indicating some ability to generalize to English corpora.
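The three-level design described in the abstract (sentence level, entity level, entity-adjacent level, fused for classification) can be sketched as below. This is a minimal illustrative sketch, not the paper's actual implementation: the pooling choices, window size, dimensions, and all function names are assumptions, and the token embeddings are assumed to come from a pre-trained encoder such as BERT.

```python
import numpy as np

rng = np.random.default_rng(0)

def mean_pool(vectors):
    """Average a span of token embeddings into a single feature vector."""
    return np.asarray(vectors).mean(axis=0)

def softmax(x):
    """Numerically stable softmax over relation scores."""
    e = np.exp(x - x.max())
    return e / e.sum()

def classify_relation(token_embs, e1_span, e2_span, W, b, window=1):
    """Hypothetical CCREPMI-style fusion: pool features at the sentence,
    entity, and entity-adjacent levels, concatenate, then score relation
    classes with a linear layer + softmax. Spans are (start, end) token
    indices, end-exclusive."""
    n = len(token_embs)
    # Sentence level: pool over all tokens (stands in for sentence structure).
    sent = mean_pool(token_embs)
    # Entity level: pool over each entity's own tokens (entity meaning).
    e1 = mean_pool(token_embs[e1_span[0]:e1_span[1]])
    e2 = mean_pool(token_embs[e2_span[0]:e2_span[1]])
    # Entity-adjacent level: pool a small window around each entity
    # (dependency on neighbouring words).
    nb1 = mean_pool(token_embs[max(0, e1_span[0] - window):min(n, e1_span[1] + window)])
    nb2 = mean_pool(token_embs[max(0, e2_span[0] - window):min(n, e2_span[1] + window)])
    # Fuse by concatenation, then classify.
    fused = np.concatenate([sent, e1, e2, nb1, nb2])
    return softmax(W @ fused + b)

# Toy usage: 10 tokens with 8-dim embeddings (random here, in place of a
# real pre-trained encoder), 5 relation classes.
dim, n_tokens, n_rel = 8, 10, 5
token_embs = rng.normal(size=(n_tokens, dim))
W = rng.normal(size=(n_rel, 5 * dim))  # 5 concatenated feature vectors
b = np.zeros(n_rel)
probs = classify_relation(token_embs, (1, 2), (6, 8), W, b)
```

In a real model the pooled features would be produced and fused by learned layers on top of BERT outputs rather than by simple averaging; the sketch only shows how the three levels combine into one classification input.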
Authors: YAO Bowen; ZENG Biqing; CAI Jian; DING Meirong (School of Software, South China Normal University, Foshan, Guangdong 528225, China)
Source: Journal of Computer Applications (《计算机应用》), CSCD, Peking University Core Journal, 2021, No. 12, pp. 3637-3644 (8 pages)
Funding: National Natural Science Foundation of China General Program (62076103); Guangdong Province Key-Area Special Project in Artificial Intelligence for Regular Universities (2019KZDZX1033); Open Project of the Guangdong Provincial Key Laboratory of Cyber-Physical Systems (2020B1212060069)
Keywords: Natural Language Processing (NLP); relation extraction; pre-training model; word embedding; feature fusion; semantic understanding