Abstract
The traditional SON algorithm, running on a single node, effectively reduces CPU and I/O overhead and needs only two scans of the entire transaction data set. However, the Apriori algorithm it uses in phase one to find local frequent itemsets still has to scan each partition multiple times. Building on a study of the SON algorithm, this paper proposes a parallel implementation of SON that uses FP-growth in place of Apriori, designed around the MapReduce programming model. Experimental results show that the FP-growth-based parallel SON algorithm runs in less time than the Apriori-based traditional SON algorithm and achieves good, near-linear speedup as the number of partitions increases.
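The two-phase structure described in the abstract can be illustrated with a minimal single-process sketch. This is not the paper's implementation: the function names (`local_frequent`, `son`), the round-robin partitioning, and the use of an absolute `min_count` threshold are all assumptions made for illustration, and the loops stand in for MapReduce map and reduce tasks. The local miner is a simple projected-database pattern-growth routine used in place of FP-growth; it avoids Apriori-style candidate generation but does not build an FP-tree.

```python
from collections import Counter


def local_frequent(transactions, min_count):
    """Mine all itemsets with support >= min_count in one partition.

    Stand-in for FP-growth: depth-first pattern growth over projected
    databases (no FP-tree is built). Hypothetical helper for illustration.
    """
    found = set()

    def grow(prefix, db):
        # Support of prefix + (item,) equals the item's count in the
        # database projected on prefix.
        counts = Counter(item for t in db for item in t)
        for item, count in sorted(counts.items()):
            if count >= min_count:
                itemset = prefix + (item,)
                found.add(frozenset(itemset))
                # Keep transactions containing item, restricted to later items,
                # so each itemset is generated exactly once.
                projected = [[i for i in t if i > item] for t in db if item in t]
                grow(itemset, projected)

    grow((), [sorted(set(t)) for t in transactions])
    return found


def son(transactions, min_count, num_partitions):
    """Two-phase SON: local mining per partition, then a global count."""
    n = len(transactions)
    # Assumed partitioning scheme: round-robin split.
    partitions = [transactions[i::num_partitions] for i in range(num_partitions)]

    # Phase 1 ("map"): mine each partition with a proportionally scaled
    # threshold; the union of local results is the candidate set.
    candidates = set()
    for part in partitions:
        # Integer ceiling of min_count * len(part) / n, at least 1.
        local_min = max(1, (min_count * len(part) + n - 1) // n)
        candidates |= local_frequent(part, local_min)

    # Phase 2 ("map" + "reduce"): count every candidate over all partitions
    # and keep those that meet the global threshold.
    totals = Counter()
    for part in partitions:
        for t in part:
            items = set(t)
            for cand in candidates:
                if cand <= items:
                    totals[cand] += 1
    return {cand: c for cand, c in totals.items() if c >= min_count}
```

Calling `son(transactions, min_count, num_partitions)` returns a dictionary mapping each globally frequent itemset (as a `frozenset`) to its support count. Scaling the threshold per partition guarantees that any globally frequent itemset is locally frequent in at least one partition, so phase one produces no false negatives and phase two only has to filter out false positives.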
Source
《微型机与应用》
2014, Issue 8, pp. 60-63 (4 pages)
Microcomputer & Its Applications