期刊文献+

知识图谱划分算法研究综述 被引量:19

Research on Knowledge Graph Partitioning Algorithms:A Survey
下载PDF
导出
摘要 知识图谱是人工智能的重要基石,因其包含丰富的图结构和属性信息而受到广泛关注.知识图谱可以精确语义描述现实世界中的各种实体及其联系,其中顶点表示实体,边表示实体间的联系.知识图谱划分是大规模知识图谱分布式处理的首要工作,对知识图谱分布式存储、查询、推理和挖掘起基础支撑作用.随着知识图谱数据规模及分布式处理需求的不断增长,如何对其进行划分已成为目前知识图谱研究的热点问题.从知识图谱和图划分的定义出发,系统性地介绍当前知识图谱数据划分的各类算法,包括基本、多级、流式、分布式和其他类型图划分算法.首先,介绍4种基本图划分算法:谱划分算法、几何划分算法、分支定界算法、KL及其衍生算法,这类算法通常用于小规模图数据或作为其他划分算法的一部分;然后,介绍多级图划分算法,这类算法对图粗糙化后进行划分再投射回原始图,根据粗糙化过程分为基于匹配的算法和基于聚合的算法;其次,描述3种流式图划分算法,这类算法将顶点或边加载为序列后进行划分,包括Hash算法、贪心算法、Fennel算法,以及这3种算法的衍生算法;再次,介绍以KaPPa、JA-BE-JA和轻量级重划分为代表的分布式图划分算法及它们的衍生算法;同时,在其他类型图划分算法中,介绍近年来新兴的2种图划分算法:标签传播算法和基于查询负载的算法.通过在合成与真实知识图谱数据集上的丰富实验,比较了5类知识图谱代表性划分算法在划分效果、查询处理与图数据挖掘方面的性能差异,分析实验结果并推广到推理层面,获得了基于实验的知识图谱划分算法性能评价结论.最后,在对已有方法分析和比较的基础上,总结目前知识图谱数据划分面临的主要挑战,提出相应的研究问题,并展望未来的研究方向. Knowledge graphs are important cornerstones of artificial intelligence,which not only have the graph structure of the general graph model but also contain rich attribute information of vertices and edges.Therefore,knowledge graphs have received widespread attention in practical tasks.Knowledge graphs can use accurate semantics to describe the various entities and their connections in the real world,where the vertices represent entities,and the edges represent connections among these entities.Knowledge graph partitioning is a fundamental task for distributed processing of large-scale knowledge graphs,and it is the essential support for a series of graph processing operations including distributed storage,query processing,reasoning,and data mining.With the increasing scale of knowledge graphs and more requirements of distributed processing,it is difficult to partition large-scale knowledge graphs,which has become a hot issue in the current research on knowledge graphs.Starting from the definitions of knowledge graphs and graph partitioning,we systematically introduce various algorithms currently available for knowledge graph partitioning,including basic graph partitioning algorithms,multilevel graph partitioning algorithms,streaming graph partitioning algorithms,distributed graph partitioning algorithms,and other types of graph partitioning algorithms.First,four basic graph partitioning algorithms are reviewed,including spectral partitioning algorithms,geometric partitioning algorithms,branch and bound algorithms,KL and its derivative algorithms.This type of algorithms usually works on small-scale graphs or as subroutines for more sophisticated algorithms.Second,two multilevel graph partitioning algorithms are introduced,one of which uses the matching-based algorithm during the coarsening phase,and the other uses the aggregation-based algorithm.After coarsening,they partition the much smaller graph and then project the result back to the original graph.Third,streaming graph partitioning algorithms are described,which load a graph as a sequence of vertices or edges and assign each element to the corresponding partitions.The streaming graph partitioning algorithms can be further divided into three subcategories:the hash algorithm,the greedy algorithm,the Fennel algorithm,and their derivative algorithms.Fourth,we introduce distributed graph partitioning algorithms for partitioning large-scale graphs in distributed environments.We choose three representative algorithms,namely the KaPPa algorithm,the JA-BE-JA algorithm,the lightweight repartitioning algorithm,and then describe derivative algorithms of these distributed algorithms.Meanwhile,as the other types of graph partitioning algorithms,we present two recent graph partitioning algorithms,one is the label propagation algorithm,and the other is the query workload-based algorithm.By conducting extensive experiments on both synthetic and real knowledge graph datasets,we compare the performance of five representative knowledge graph partitioning algorithms in partitioning effects,query processing,and graph data mining.We analyze the experimental results in detail and draw general conclusions,and extend them to the reasoning tasks.The performance evaluation conclusions of knowledge graph partitioning algorithms are obtained based on these experiments and analyses.Finally,based on the analysis and comparison of existing methods,we summarize the main challenges in the current research on knowledge graph partitioning,propose the corresponding issues that can be investigated,and look forward to the future research directions on this topic.
作者 王鑫 陈蔚雪 杨雅君 张小旺 冯志勇 WANG Xin;CHEN Wei-Xue;YANG Ya-Jun;ZHANG Xiao-Wang;FENG Zhi-Yong(College of Intelligence and Computing,Tianjin University,Tianjin 300354;Tianjin Key Laboratory of Cognitive Computing and Application,Tianjin 300354)
出处 《计算机学报》 EI CSCD 北大核心 2021年第1期235-260,共26页 Chinese Journal of Computers
基金 国家重点研发计划项目(2019YFE0198600) 国家自然科学基金(61972275)资助。
关键词 知识图谱 图划分 多级划分 流划分 分布式 knowledge graph graph partitioning multilevel partitioning streaming partitioning distribution
  • 相关文献

参考文献6

二级参考文献124

  • 1Amazon SimpleDB. http://aws, amazon, com/simpledb/, 2011-8-10.
  • 2Connor Alexander G, Chrysanthis Panos K, Labrinidis Alexandros. Key key-value stores for efficiently processing graph data in the cloud//Proceedings of the GDM. Hannover, Germany, 2011:88-93.
  • 3Lordanov Borislav. HyperGraphDB: A generalized graph database//Proceedings of the IWGD. JiuZhai Valley, China, 2010:25-36.
  • 4Eifrem Emil. NOSQL: Scaling to size and scaling to complexity, http://blogs, neotechnology, com/emil/2009/11/ nosql-scaling tosize-and-scaling-to-complexity, html, 2009- 1-15.
  • 5Wu Sai, Jiang Da-Wei, Ooi Beng Chin et al. Efficient B-tree based indexing for cloud data proeessing//Proeeedings of the VLDB. Singapore, 2010: 1207-1218.
  • 6Wang Jin-Bao, Wu Sai, Gao Hong et al. Indexing multi dimensional data in a cloud system//Proceedings of the SIGMOD. Indianapolis, Indiana, USA, 2010: 591-602.
  • 7Tsatsanifos George, Sacharidis Dimitris, Sellis Timos et al. MIDAS: Multi-attribute indexing for distributed architecture systems//Proceedings of the SSTD. Minneapolis, MN, USA, 2011:168-185.
  • 8Aguilera M K, Golab W, Shah M A. A practical scalable distributed B-tree//Proceedings of the VLDB. Auckland, New Zealand, 2008: 598-609.
  • 9Zhang Xiang-Yu, Ai Jing, Wang Zhong-Yuan, Lu Jia-Heng et al. An efficient multi-dimensional index for cloud data management//Proceedings of the CloudDB. Hong Kong, China, 2009:17-24.
  • 10InfiniteGraph, the Distributed Graph Database. http:// www. infinitegraph, com/, 2011 -7 -29.

共引文献254

同被引文献168

引证文献19

二级引证文献81

相关作者

内容加载中请稍等...

相关机构

内容加载中请稍等...

相关主题

内容加载中请稍等...

浏览历史

内容加载中请稍等...
;
使用帮助 返回顶部