Abstract
Knowledge graph completion is a key technique for improving the quality of knowledge graphs. To make better use of Chinese knowledge graphs, this paper studies Chinese knowledge graph completion. Because most studies focus on English datasets and Chinese knowledge completion datasets are lacking, the paper constructs a Chinese UMLS+ownthink dataset on the basis of existing datasets. Most existing knowledge graph completion methods overlook the BERT model's insufficient sequence representation ability and weak ability to learn position information, and do not account for the complex textual features and strong word-order dependence of Chinese text. The paper therefore proposes MpBERT-BiGRU, a model for Chinese knowledge graph completion. A mean-pooling strategy is used to alleviate BERT's weak sequence representation ability, and a BiGRU network strengthens the feature information and improves the learning of position information. In addition, triples are converted into text sequences and combined with entity descriptions as the model input, so that background knowledge enriches the entity information. Link prediction experiments show that, compared with traditional methods, the proposed method improves Mean Rank (MR) by 10.39 and the top-10 hit rate (Hit@10) by 4.63%, verifying the model's effectiveness on the Chinese corpus.
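The abstract reports link prediction results as Mean Rank (MR) and Hit@10. The following is a minimal sketch of how these two metrics are commonly computed from model scores; it is not the authors' evaluation code. The function names (`rank_of_true`, `mean_rank`, `hits_at_k`) and the toy scores are hypothetical, and the sketch assumes that a higher score means a more plausible triple. Under the usual convention, a lower MR and a higher Hit@10 indicate better performance.

```python
def rank_of_true(scores, true_index):
    """Rank of the correct candidate: 1 + number of candidates scoring strictly higher.

    Assumption: higher score = more plausible triple.
    """
    true_score = scores[true_index]
    return 1 + sum(1 for s in scores if s > true_score)


def mean_rank(ranks):
    """Mean Rank (MR): average rank of the correct entity over all test triples."""
    return sum(ranks) / len(ranks)


def hits_at_k(ranks, k=10):
    """Hit@k: fraction of test triples whose correct entity is ranked in the top k."""
    return sum(1 for r in ranks if r <= k) / len(ranks)


# Toy example (hypothetical scores): each inner list holds the scores of all
# candidate entities for one test triple, paired with the index of the correct one.
toy_queries = [
    ([0.9, 0.1, 0.3], 0),  # correct entity scored highest -> rank 1
    ([0.2, 0.8, 0.5], 2),  # one candidate scores higher   -> rank 2
]
toy_ranks = [rank_of_true(scores, idx) for scores, idx in toy_queries]
print(mean_rank(toy_ranks), hits_at_k(toy_ranks, k=1))
```

Ties are resolved optimistically here (tied candidates do not push the rank down); other tie-handling conventions exist and would change the numbers slightly.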
Authors
田昊
张骁雄
刘文杰
刘浏
刘姗姗
丁鲲
TIAN Hao; ZHANG Xiao-xiong; LIU Wen-jie; LIU Liu; LIU Shan-shan; DING Kun (The Sixty-third Research Institute, National University of Defense Technology, Nanjing 210007, China; School of Computer and Software, Nanjing University of Information Science and Technology, Nanjing 210044, China; Suqian University, Suqian 223800, China)
Source
《计算机技术与发展》
2023, Issue 3, pp. 110-119 (10 pages)
Computer Technology and Development
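Funding
National Natural Science Foundation of China (62071240)
Scientific Research Program of the National University of Defense Technology (ZK20-46)
Natural Science Research Foundation of Jiangsu Higher Education Institutions (20KJB413003).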