Abstract
With the arrival of the big data era, the Apriori and FP-Growth algorithms run into insufficient memory and low computational efficiency when mining frequent itemsets from massive data sets. To address these problems, an Aggregating_FP algorithm is proposed. The algorithm combines the MapReduce parallel computing framework with the FP-Growth algorithm to mine frequent itemsets in parallel. It reduces and merges the results for each item and outputs only the top K frequent itemsets that contain that item, which makes the mined results more useful for decision-making on massive data. Several data sets of different sizes were tested on the Hadoop distributed computing platform. The experimental results show that the algorithm is suitable for analyzing and processing large-scale data and has good scalability.
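The per-item aggregation step described in the abstract can be illustrated with a small single-machine sketch. This is not the paper's MapReduce implementation: the abstract does not say how the K itemsets are ranked, so ranking by support count is an assumption here, and the function name `top_k_per_item` and the input layout are hypothetical.

```python
import heapq
from collections import defaultdict


def top_k_per_item(frequent_itemsets, k):
    """Keep, for each item, the k itemsets containing it with the highest support.

    frequent_itemsets: iterable of (itemset, support) pairs, where itemset is
    any iterable of hashable, sortable items. Ranking by support is an
    assumption; the abstract does not state the criterion.
    """
    # Group step (plays the role of the shuffle): index every itemset
    # under each item it contains.
    grouped = defaultdict(list)
    for itemset, support in frequent_itemsets:
        normalized = tuple(sorted(itemset))
        for item in normalized:
            grouped[item].append((support, normalized))

    # Reduce step: per item, keep only the k best candidates.
    # Ties on support are broken by the itemset itself for determinism.
    result = {}
    for item, candidates in grouped.items():
        best = heapq.nsmallest(k, candidates, key=lambda c: (-c[0], c[1]))
        result[item] = [(itemset, support) for support, itemset in best]
    return result


if __name__ == "__main__":
    mined = [
        (("a",), 5),
        (("a", "b"), 4),
        (("a", "c"), 3),
        (("b",), 4),
        (("b", "c"), 2),
    ]
    for item, itemsets in sorted(top_k_per_item(mined, 2).items()):
        print(item, itemsets)
```

In a distributed setting the grouping would be done by the framework's shuffle and the per-item selection by a reducer; the sketch only shows the selection logic.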
Source
Software Guide (《软件导刊》)
2015, No. 4, pp. 75-77 (3 pages)
Keywords
Frequent Itemsets
MapReduce
Hadoop
Scalability