期刊文献+

一种基于Spark框架的并行FP-Growth挖掘算法 被引量:14

A parallel FP-Growth mining algorithm based on Spark framework
下载PDF
导出
摘要 Apriori和FP-Growth算法是频繁模式挖掘中的经典算法,由于Apriori存在更多缺陷,因此FP-Growth是单机计算环境下比较高效的算法。然而,对于非并行计算在大数据时代遇到的瓶颈,提出一种基于事务中项间联通权重矩阵的负载平衡并行频繁模式增长算法CWBPFP。算法在Spark框架上实现并行计算,数据分组时利用负载均衡策略,存入分组的数据是相应频繁项的编码。每个工作节点将分组数据中每一个事物中项的联通信息存入一个下三角联通权重矩阵中,使用被约束子树来加快每个工作节点挖掘频繁模式时创建条件FP-tree的速度,再用联通权重矩阵避免每次挖掘分组中频繁模式时对条件模式基的第一次扫描。由于联通权重矩阵和被约束子树的结合应用于每一个工作节点的FP-tree挖掘过程,因此提升了并行挖掘FP-tree性能。通过实验表明,所提出的并行算法对大的数据有较高性能和可扩展性。 The Apriori and FP-Growth are classical algorithms in frequent pattern mining, brace the Apriori has more flaws, the FP-Growth is a more efficient algorithm in stand-alone computing environment. Aiming at the bottlenecks of non-parallel computing in the era of big data, we propose a balanced parallel frequent pattern (BPFT) growth algorithm based on the connect-weight (CW) matrix of items in each transaction, called CWBPFP, which achieves parallel computing based on Spark framework. We use the load balance strategy to group data, and the corresponding code of each frequent item is stored in the relevant group during grouping. The connect information of items in each transaction of each grouped data is stored into a lower triangular connect-weight matrix by each working node. We use the restricted sub-tree to accelerate the speed of producing conditional FP-tree, and employ the connectweight matrix to avoid the first scanning for the conditional patterns during mining frequent patterns of grouped data. The performance of the parallel mining FP-tree is improved due to the combination of the CW matrix and the restricted sub-tree applied to FP-tree mining process of each node. Experiments show that the CWBPFP has high performance and scalability on big data sets.
作者 张稳 罗可
出处 《计算机工程与科学》 CSCD 北大核心 2017年第8期1403-1409,共7页 Computer Engineering & Science
基金 国家自然科学基金(71371065 11671125) 湖南省科技计划项目(2013SK3146)
关键词 数据挖掘 关联规则 FP-GROWTH 大数据 并行计算 SPARK data mining association rule FP-Growth big data parallel computing Spark
  • 相关文献

参考文献2

二级参考文献18

  • 1Jiawei Han,Jian Pei,Yiwen Yin.Mining frequent patterns without candidate generation[J].ACM SIGMOD Record.2000(2)
  • 2TANBEER S K,AHMED C F,JEONG B S.Parallel anddistributed frequent pattern mining in large databases[].th IEEE International Conference on High PerformanceComputing and Communications.2009
  • 3Tu F,He B.A parallel algorithm for mining association rules based on FP-tree[].Advances in computer scienceenvironmentecoinformaticsand education.2011
  • 4S. Xue-Li,L. Tao.Association rules parallel algorithm based on FP-tree[].ProcndInt Computer Engineering and Technology.2010
  • 5Yang X,Liu Z,Fu Y.MapReduce as a programming model for association rules algorithm on Hadoop[].Proceedings of rd International Conference on Information Sciences and Interaction Sciences.2010
  • 6Zhou L,Zhong Z,Chang J,et al.Balanced Parallel FPGrowth with Map-Reduce[].Proceedings of IEEE Youth Comference on Information Computing and Telecommunications (YC-ICT).2010
  • 7Li L,Zhang M.The strategy of mining association rule based on cloud computing[].Proceedings of International Conference on Business Computing and Global Information.2011
  • 8Vu L,Alaghband G.A fast algorithm combining FP-tree and TID-list for frequent pattern mining[].Proceedings of IEEE Conference on Information and Knowledge Engineering.2011
  • 9Agrawal R,Srikant R.Fast algorithms for mining association rules[].Proceedings of the th International Conference on Very Large Data Bases.1994
  • 10Woo Jongwook,Xu Yuhang.Market basket analysis algorithm with Map/Reduce of cloud computing[].Proceedings of TheInternational Confer-ence on Parallel and Distributed Processing Tech-niques and Applications.2011

共引文献53

同被引文献109

引证文献14

二级引证文献75

相关作者

内容加载中请稍等...

相关机构

内容加载中请稍等...

相关主题

内容加载中请稍等...

浏览历史

内容加载中请稍等...
;
使用帮助 返回顶部