Implementation of a parallel SON algorithm based on FP-growth (cited by: 2)
Abstract: The traditional SON algorithm, running on a single node, effectively reduces CPU and I/O load and needs only two scans of the entire transaction data set. However, phase one of the algorithm uses Apriori to find the local frequent itemsets, which still requires multiple scans of each partition. Building on a detailed study of SON, this paper proposes a parallel implementation of SON based on FP-growth, following the MapReduce programming model. Experimental results show that the FP-growth-based parallel SON algorithm runs in less time than the traditional Apriori-based SON algorithm and also achieves good (reportedly linear) speedup as the number of partitions increases.
Source: Microcomputer & Its Applications (《微型机与应用》), 2014, No. 8, pp. 60-63 (4 pages)
Keywords: FP-growth; SON algorithm; MapReduce; data mining
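
The two-phase structure summarized in the abstract can be sketched as follows. This is a minimal, sequential illustration and not the authors' MapReduce implementation: the function names `local_frequent_itemsets` and `son` are hypothetical, and the local miner is a brute-force stand-in where the paper uses FP-growth (any exact local miner yields the same candidates). Each loop over partitions corresponds to work that a map task could do independently.

```python
from collections import Counter
from fractions import Fraction
from itertools import combinations
from math import ceil


def local_frequent_itemsets(partition, min_count):
    """Stand-in local miner (assumption): brute-force itemset enumeration.

    The paper uses FP-growth at this step; this version is only practical
    for short transactions.
    """
    counts = Counter()
    for transaction in partition:
        items = sorted(set(transaction))
        for size in range(1, len(items) + 1):
            for itemset in combinations(items, size):
                counts[itemset] += 1
    return {itemset for itemset, n in counts.items() if n >= min_count}


def son(transactions, min_support, num_partitions):
    """Return {itemset: global count} for itemsets meeting min_support.

    min_support is a fraction of the number of transactions (0 < s <= 1).
    """
    # Exact arithmetic avoids float rounding at the threshold boundary.
    support = Fraction(min_support).limit_denominator(10**6)
    chunk = max(1, ceil(len(transactions) / num_partitions))
    partitions = [transactions[i:i + chunk]
                  for i in range(0, len(transactions), chunk)]

    # Phase 1: mine each partition with the threshold scaled to its size.
    # A globally frequent itemset must be locally frequent somewhere,
    # so the union of local results contains every true answer.
    candidates = set()
    for part in partitions:
        candidates |= local_frequent_itemsets(part, support * len(part))

    # Phase 2: one more scan counts each candidate over the full data set.
    totals = Counter()
    for part in partitions:
        for transaction in part:
            present = set(transaction)
            for itemset in candidates:
                if present.issuperset(itemset):
                    totals[itemset] += 1

    global_min = support * len(transactions)
    return {itemset: n for itemset, n in totals.items() if n >= global_min}
```

Phase 1 can produce false positives (itemsets frequent in one partition only) but never false negatives, which is why the second scan is enough to obtain the exact result regardless of how the data is partitioned.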

References (8)

  • 1 AGRAWAL R, SRIKANT R. Fast algorithms for mining association rules[C]. Proceedings of 20th Conference on Very Large Data Bases, 1994: 487-499.
  • 2 HAN Jiawei, PEI Jian, YIN Yiwen. Mining frequent patterns without candidate generation[C]. Proceedings of Conference on the Management of Data, 2000: 1-12.
  • 3 SAVASERE A, OMIECINSKI E, NAVATHE S. An efficient algorithm for mining association rules in large databases[C]. Proceedings of the 21st Conference on Very Large Database, 1995: 432-444.
  • 4 DEAN J, GHEMAWAT S. MapReduce: simplified data processing on large clusters[J]. Communications of the ACM, 2008, 51(1): 107-113.
  • 5 Apache Hadoop[EB/OL]. [2013-07-12]. http://hadoop.apache.org.
  • 6 LI Jianjiang, CUI Jian, WANG Dan, YAN Lin, HUANG Yishuang. Survey of the MapReduce parallel programming model[J]. Acta Electronica Sinica, 2011, 39(11): 2635-2642. (cited by: 187)
  • 7 Apache Mahout[EB/OL]. [2013-08-12]. http://mahout.apache.org/.
  • 8 Frequent Itemset Mining Dataset Repository[EB/OL]. [2013-08-28]. http://fimi.ua.ac.be/data.

