Single Large-Scale Graph Frequent Subgraph Algorithm Based on Spark

Abstract: With the rapid development of the Internet, campus cards have become widespread, and the volume of data stored on servers is growing rapidly. Single-machine algorithms can no longer support frequent subgraph mining at low support thresholds or subgraph-growth pattern mining. Mining frequent subgraphs from a massive single graph is no longer feasible on one machine, and the existing Hadoop distributed framework is poorly suited to iterative algorithms. This paper therefore proposes FSMBUS, a distributed algorithm on the Spark framework for mining frequent subgraphs in a single large-scale graph. FSMBUS constructs candidate subgraphs in parallel by means of a suboptimal CAM Tree and returns all frequent subgraphs for a given user-defined minimum support. Experimental results show that the latest single-graph algorithms are an order of magnitude slower than FSMBUS, that FSMBUS supports lower support thresholds and larger graph data, and that it runs 2 to 4 times faster than a version ported to Hadoop. Analysis of the authors' university campus card data can provide a reference basis for institutional management and leadership decision-making.
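To make the notion of "frequent under a minimum support" in a single graph concrete, the following is a minimal single-machine sketch and not the FSMBUS algorithm itself. It assumes a minimum-image-based (MNI) support measure, which the abstract does not specify, and it only handles single-edge patterns on a vertex-labeled undirected graph. The function name `frequent_edge_patterns` and the toy data are hypothetical.

```python
from collections import defaultdict


def frequent_edge_patterns(labels, edges, min_support):
    """Return single-edge patterns whose support is at least min_support.

    labels: dict mapping vertex -> label.
    edges: iterable of undirected (u, v) pairs.
    Support is the MNI measure (an assumption, not taken from the paper):
    the minimum, over the two pattern positions, of the number of distinct
    graph vertices that can occupy that position.
    """
    # Pattern key is (label_a, label_b) with label_a <= label_b.
    # For each pattern, keep the distinct vertices seen in each position.
    images = defaultdict(lambda: (set(), set()))
    for u, v in edges:
        # Try both orientations so that same-label patterns record
        # both endpoints in both positions.
        for a, b in ((u, v), (v, u)):
            if labels[a] <= labels[b]:
                first, second = images[(labels[a], labels[b])]
                first.add(a)
                second.add(b)

    frequent = {}
    for pattern, (first, second) in images.items():
        support = min(len(first), len(second))
        if support >= min_support:
            frequent[pattern] = support
    return frequent


# Toy graph: vertices 1 and 4 are labeled A, 2 and 3 are B, 5 is C.
toy_labels = {1: "A", 2: "B", 3: "B", 4: "A", 5: "C"}
toy_edges = [(1, 2), (1, 3), (4, 3), (3, 5)]
print(frequent_edge_patterns(toy_labels, toy_edges, min_support=2))
```

In a full miner, frequent edges like these would be the seeds that are grown into larger candidate subgraphs; lowering `min_support` admits more seeds, which is one reason low thresholds are expensive on a single machine.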

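Authors: JIANG Laihao; ZHU Zhixiang; ZHAO Zichen (Department of Computer Science & Technology, Xi'an University of Posts & Telecommunications, Xi'an 710061; Shaanxi Information Engineering Research Institute, Xi'an 710061)

Affiliations: School of Computer Science, Xi'an University of Posts & Telecommunications; Shaanxi Information Engineering Research Institute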

Source: Computer & Digital Engineering, 2019, No. 10, pp. 2405-2412 (8 pages)

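Funding: Shaanxi Province Key Research and Development Program (No. 2016KTTSGY01-01); Xi'an University of Posts & Telecommunications Teaching Reform Research Project (No. JGZ201615)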

Keywords: campus card; Spark; frequent subgraphs; distributed computing; large-scale single graph

Classification: TP301 [Automation and Computer Technology: Computer System Architecture]

