摘要
针对大规模数据图下基于回溯法的子图查询算法的准确率低、开销大等问题,为提高查询准确率,降低大图下的查询开销,提出一种基于Spark的子图匹配(SQM)算法。首先根据结构信息过滤数据图,再将查询图分割成基本查询单元;然后对每一个基本查询单元分别匹配后进行Join操作;最后运用并行化提高了算法的运行效率,减小了搜索空间。实验结果表明,与Stwig、Turbo ISO算法相比,SQM算法在保证查询结果不变的情况下,速度提高了50%。
Focusing on low accuracy and high costs of backtracking-based subgraph query algorithm applied to large-scale graphs,a Spark-based Subgraph Query Matching(SQM)algorithm was proposed to improve query accuracy and reduce query overhead for large graphs.The data graph was firstly filtered according to structure information,and the query graph was divided into basic query units.Then each basic query unit was matched and joined together.Finally,the algorithm s efficiency was improved and search space was reduced by parallelization.The experimental results show that compared with Stwig(Sub twig)algorithm and TurboISO algorithm,SQM algorithm can increase the speed by 50% while ensuring the same query results.
作者
李龙洋
董一鸿
施炜杰
潘剑飞
LI Longyang;DONG Yihong;SHI Weijie;PAN Jianfei(College of Information Science and Engineering,Ningbo University,Ningbo Zhejiang 315211,China;Baidu Online Technology Company Limited,Beijing 100084,China)
出处
《计算机应用》
CSCD
北大核心
2019年第1期46-50,共5页
journal of Computer Applications
基金
国家自然科学基金资助项目(61572266)
浙江省自然科学基金资助项目(LY16F020003)
宁波市自然科学基金资助项目(2017A610114)~~
关键词
子图匹配
图分割
大规模单图
并行化
SPARK
subgraph matching
graph segmentation
single large-scale graph
parallelization
Spark