Optimization of Apriori Parallel Algorithm Based on Spark (cited by: 12)
Abstract To address the processing-speed and computing-resource bottlenecks of the traditional Apriori algorithm, and the limitations the authors identify in the MapReduce framework on Hadoop (it cannot handle node failures, does not support iterative computation well, and cannot compute in memory), a parallel association rule optimization algorithm based on Spark is proposed. The algorithm needs only two scans of the transaction database and makes full use of Spark's in-memory RDDs to store itemsets. Compared with the traditional Apriori algorithm, it greatly reduces the number of scans of the transaction database; compared with Apriori on Hadoop, it simplifies the computation, supports iteration, and reduces I/O overhead by caching intermediate results in memory. Experimental results show that the algorithm improves the efficiency of association rule mining on large-scale data.
Authors WANG Qing (王青), TAN Liang (谭良), YANG Xianhua (杨显华) (College of Computer Science, Sichuan Normal University, Chengdu 610101, China; Institute of Computing Technology, Chinese Academy of Sciences, Beijing 100190, China; Sichuan Institute of Computer Sciences, Chengdu 610041, China)
Source Journal of Zhengzhou University: Natural Science Edition (《郑州大学学报(理学版)》), 2016, No. 4, pp. 60-64 (5 pages). Indexed in CAS and the Peking University Core Journals list (北大核心).
Funding National Natural Science Foundation of China (61373162); Sichuan Province Science and Technology Support Program (2014GZ007)
Keywords Spark; parallel processing; data mining; association rule; Apriori
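
The abstract states that the algorithm scans the transaction database only twice, but it does not say how. One common way to achieve this, shown here purely as an illustrative single-machine sketch and not as the authors' method, is to count single items in the first scan, build a vertical layout (item to set of transaction ids) in the second scan, and then derive every larger itemset by intersecting those in-memory sets. The function name `frequent_itemsets`, the `min_support` parameter, and the tid-set layout are all assumptions made for this example; on Spark the two scans would be RDD transformations and the tid-sets would be cached in memory rather than held in a Python dict.

```python
from collections import defaultdict
from itertools import combinations


def frequent_itemsets(transactions, min_support):
    """Return {frozenset(itemset): support count} for all frequent itemsets.

    Hypothetical helper: a plain-Python stand-in for what would be
    RDD operations on Spark.
    """
    # Scan 1: count each item once per transaction.
    item_counts = defaultdict(int)
    for transaction in transactions:
        for item in set(transaction):
            item_counts[item] += 1
    frequent_items = {i for i, c in item_counts.items() if c >= min_support}

    # Scan 2: build the vertical layout, 1-itemset -> set of transaction ids.
    tidsets = defaultdict(set)
    for tid, transaction in enumerate(transactions):
        for item in set(transaction):
            if item in frequent_items:
                tidsets[frozenset([item])].add(tid)

    # The database is not read again: level k+1 is derived from the
    # in-memory tid-sets of level k.
    result = {}
    current = dict(tidsets)
    while current:
        result.update({s: len(t) for s, t in current.items()})
        next_level = {}
        for a, b in combinations(current, 2):
            candidate = a | b
            # Join only itemsets that differ in exactly one item.
            if len(candidate) != len(a) + 1 or candidate in next_level:
                continue
            # Apriori pruning: every k-subset of the candidate must be frequent.
            if any(candidate - {x} not in current for x in candidate):
                continue
            # Support of the union is the size of the tid-set intersection.
            tids = current[a] & current[b]
            if len(tids) >= min_support:
                next_level[candidate] = tids
        current = next_level
    return result
```

Calling `frequent_itemsets(transactions, min_support)` with a list of item lists and an absolute support threshold returns a dict keyed by `frozenset`. The design choice to keep tid-sets in memory is what removes the per-level database scan of classic Apriori, at the cost of holding the vertical layout in RAM.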