
Single Large-Scale Graph Frequent Subgraph Algorithm Based on Spark
Abstract With the rapid development of the Internet, campus cards have become widely used, and the data stored on servers is growing rapidly. Single-machine algorithms can no longer support frequent subgraph mining at low support thresholds or subgraph-growth pattern mining, and mining frequent subgraphs from a massive single graph is infeasible on one machine. The existing Hadoop distributed framework is also poorly suited to iterative algorithms. This paper therefore proposes FSMBUS, a Spark-based distributed algorithm for mining frequent subgraphs in a single large-scale graph. FSMBUS builds the candidate subgraphs for parallel computation using a suboptimal CAM tree and, for a given user-defined minimum support, mines all frequent subgraphs. Experimental results show that the latest single-graph algorithms are an order of magnitude slower than FSMBUS, that FSMBUS supports lower support thresholds and larger graph data, and that it runs 2 to 4 times faster than a version ported to Hadoop. Analyzing the university's campus card data can provide a reference basis for institutional management and leadership decision-making.
Authors JIANG Laihao (蒋来好); ZHU Zhixiang (朱志祥); ZHAO Zichen (赵子晨) (Department of Computer Science & Technology, Xi'an University of Post & Telecommunications, Xi'an 710061; Shaanxi Information Engineering Research Institute, Xi'an 710061)
Source Computer & Digital Engineering (《计算机与数字工程》), 2019, No. 10, pp. 2405-2412 (8 pages)
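Funding Supported by the Shaanxi Province Key Research and Development Program (No. 2016KTTSGY01-01) and the Xi'an University of Post & Telecommunications Teaching Reform Research Project (No. JGZ201615)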
Keywords campus card; Spark; frequent subgraphs; distributed computing; large-scale single graph
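
To illustrate what "frequent under a given minimum support" can mean in a single graph, here is a minimal single-machine sketch. It is not the FSMBUS algorithm: the abstract does not state which support measure the paper uses, so this sketch assumes the minimum-image-based (MNI) measure, handles only single-edge patterns, and uses no Spark. The function name `frequent_edge_patterns` and the toy labels are hypothetical.

```python
from collections import defaultdict


def frequent_edge_patterns(labels, edges, min_support):
    """Return {(label_a, label_b): support} for frequent single-edge patterns.

    Assumption of this sketch: support is the minimum-image-based (MNI)
    measure, i.e. the smallest number of distinct graph vertices that any
    pattern node can be mapped to. The paper's own measure is not given
    in the abstract.
    """
    # images[(a, b)] = (vertices playing the 'a' end, vertices playing the 'b' end)
    images = defaultdict(lambda: (set(), set()))
    for u, v in edges:
        a, b = labels[u], labels[v]
        if a > b:
            # Canonical order so that (a, b) and (b, a) count as one pattern.
            u, v, a, b = v, u, b, a
        first, second = images[(a, b)]
        first.add(u)
        second.add(v)
        if a == b:
            # Symmetric pattern: either vertex can play either end.
            first.add(v)
            second.add(u)

    result = {}
    for pattern, (first, second) in images.items():
        support = min(len(first), len(second))
        if support >= min_support:
            result[pattern] = support
    return result


# Toy undirected labeled graph (hypothetical data).
labels = {0: "A", 1: "A", 2: "B", 3: "B", 4: "C"}
edges = [(0, 2), (1, 3), (0, 3), (1, 4)]
print(frequent_edge_patterns(labels, edges, min_support=2))
```

In this toy graph the A-B pattern has two distinct vertices at each end, so it meets a threshold of 2, while the A-C pattern occurs only once and is dropped. A full miner would extend the surviving patterns edge by edge and recount, and that iterative step is the one the abstract says is distributed with Spark.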
