
Multi-source knowledge base index alignment based on BERT
Abstract: Encyclopedic knowledge bases are an indispensable knowledge source for contemporary internet users, but any single knowledge base suffers from low entity coverage and missing entity information. Moreover, the resource description structures of different knowledge resources and knowledge communities differ considerably, which hinders the sharing, integration, and reuse of data. Targeting large-scale encyclopedic knowledge bases, an index algorithm for multi-source knowledge bases based on BERT is proposed: the BERT pre-trained language model is used to model the unstructured text of entities and to construct feature vectors for computing entity similarity, giving the method good generality across heterogeneous knowledge bases. On this basis, two index construction methods are proposed, based on entity mentions and on keywords from the entity context, which effectively improve the efficiency and accuracy of entity alignment. Verified on five groups of test data, the algorithm achieves a precision of 91%–96.22% and a recall as high as 92.23%–97.33%. At the same time, its index structure attains a reduction ratio of 99.94% while preserving 98.52% completeness. The algorithm can be applied in practice to entity alignment across multi-source knowledge bases.
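The pipeline the abstract describes — embed each entity's unstructured text, block candidate pairs through a keyword index, then score candidates by vector similarity — can be sketched as follows. This is a minimal illustration, not the paper's implementation: the `embed` function below is a toy bag-of-words stand-in for the BERT sentence encoder, and all entity names and the similarity threshold are invented for the example.

```python
# Sketch of index-based entity alignment: keyword inverted index for
# candidate blocking, then cosine similarity over text embeddings.
# NOTE: embed() is a placeholder for a BERT encoder (assumption, not
# the paper's model); it returns sparse bag-of-words vectors.
from collections import defaultdict
import math

def embed(text):
    """Placeholder for a BERT sentence embedding: bag-of-words counts."""
    vec = defaultdict(float)
    for tok in text.lower().split():
        vec[tok] += 1.0
    return dict(vec)

def cosine(a, b):
    """Cosine similarity between two sparse vectors (dicts)."""
    dot = sum(a.get(k, 0.0) * v for k, v in b.items())
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def build_keyword_index(entities):
    """Inverted index: context keyword -> set of entity ids (blocking)."""
    index = defaultdict(set)
    for eid, text in entities.items():
        for tok in text.lower().split():
            index[tok].add(eid)
    return index

def align(source_entities, target_entities, threshold=0.5):
    """Match each source entity against only the target entities that
    share at least one keyword, keeping the best match above threshold."""
    index = build_keyword_index(target_entities)
    target_vecs = {eid: embed(t) for eid, t in target_entities.items()}
    matches = {}
    for sid, text in source_entities.items():
        candidates = set()
        for tok in text.lower().split():
            candidates |= index.get(tok, set())
        svec = embed(text)
        best = max(((cosine(svec, target_vecs[c]), c) for c in candidates),
                   default=(0.0, None))
        if best[0] >= threshold:
            matches[sid] = best[1]
    return matches
```

The inverted index is what the abstract's reduction ratio refers to: instead of scoring every source-target pair, only entities sharing a keyword are compared, which shrinks the candidate space dramatically while losing few true matches.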
Authors: JI Yimu; LIU Yanlan; LIU Shangdong; XU Zhengyang; HU Lin; LIU Kaihang; TANG Shuning; LIU Qiang; XIAO Wan (School of Computer Science, Nanjing University of Posts and Telecommunications, Nanjing 210023, China; Institute of High Performance Computing and Big Data, Nanjing University of Posts and Telecommunications, Nanjing 210023, China; Nanjing Center of HPC China, Nanjing 210023, China; Jiangsu HPC and Intelligent Processing Engineering Research Center, Nanjing 210023, China; College of Educational Science and Technology, Nanjing University of Posts and Telecommunications, Nanjing 210023, China)
Source: Journal of Nanjing University of Posts and Telecommunications (Natural Science Edition), 2021, No. 2, pp. 49–61 (13 pages). Indexed in the Peking University Core Journals list.
Funding: National Key R&D Program of China (2017YFB1401302); Jiangsu Provincial Key R&D Program (BE2019740); Natural Science Research Project of Jiangsu Higher Education Institutions (19KJB520046, 20KJA520001); Natural Science Foundation of Jiangsu Province (BK20170900); Jiangsu Postdoctoral Research Funding Program (2019K024); Jiangsu Six Talent Peaks Project (JY-012); Nanjing University of Posts and Telecommunications Dingshan Talent Cultivation Project; Nanjing University of Posts and Telecommunications Talent Introduction Start-up Fund (NY219132); Humanities and Social Sciences Fund of the Ministry of Education (20YJC880104).
Keywords: entity alignment; index; BERT; knowledge fusion