
An Algorithm for Mining Frequent Itemsets Based on MapReduce (cited by: 1)
Abstract With the arrival of the big data era, the Apriori and FP-Growth algorithms run into insufficient memory and low computational efficiency when mining frequent itemsets from massive datasets. To address these bottlenecks, an Aggregating_FP algorithm is proposed. The algorithm combines the MapReduce parallel computing framework with the FP-Growth algorithm to mine frequent itemsets in parallel. In the output stage, the results for each item are reduced and merged, and only the top K frequent itemsets containing that item are output, which makes the decision value extracted from massive data more effective. Several datasets of different sizes were tested on the Hadoop distributed computing platform. The experimental results show that the algorithm is suitable for analyzing and processing large-scale data and has good scalability.
Author 孙兵率
Source Software Guide (《软件导刊》), 2015, No. 4, pp. 75-77 (3 pages)
Keywords frequent itemsets; scalability; MapReduce; Hadoop
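
The abstract does not give implementation details, so the following is only a minimal Python sketch of the per-item top-K aggregation idea it describes. Several things here are assumptions rather than the paper's method: brute-force itemset counting stands in for the FP-Growth mining step, the map and reduce phases are plain in-memory functions rather than Hadoop jobs, ties are broken by itemset order, and all function names (`count_itemsets`, `map_phase`, `reduce_phase`, `top_k_itemsets`) are hypothetical.

```python
import heapq
from collections import defaultdict
from itertools import combinations


def count_itemsets(transactions, min_support, max_size=3):
    """Brute-force stand-in for FP-Growth (assumption, not the paper's code).

    Returns {itemset_tuple: support} for itemsets of up to max_size items
    whose support is at least min_support.
    """
    counts = defaultdict(int)
    for transaction in transactions:
        items = sorted(set(transaction))
        for size in range(1, max_size + 1):
            for itemset in combinations(items, size):
                counts[itemset] += 1
    return {s: c for s, c in counts.items() if c >= min_support}


def map_phase(frequent):
    """Emit (item, (support, itemset)) once for every item in each itemset."""
    for itemset, support in frequent.items():
        for item in itemset:
            yield item, (support, itemset)


def reduce_phase(pairs, k):
    """Group records by item and keep only the k highest-support itemsets."""
    grouped = defaultdict(list)
    for item, record in pairs:
        grouped[item].append(record)
    return {
        # Sort key: support descending, then itemset ascending for stable ties.
        item: heapq.nsmallest(k, records, key=lambda r: (-r[0], r[1]))
        for item, records in grouped.items()
    }


def top_k_itemsets(transactions, min_support, k):
    """Run the mining, map, and reduce steps end to end in memory."""
    frequent = count_itemsets(transactions, min_support)
    return reduce_phase(map_phase(frequent), k)


if __name__ == "__main__":
    data = [["a", "b", "c"], ["a", "b"], ["a", "c"], ["b", "c"], ["a", "b", "c"]]
    for item, itemsets in sorted(top_k_itemsets(data, min_support=2, k=2).items()):
        print(item, itemsets)
```

Capping the output at K itemsets per item bounds the result size by the number of items times K, whatever the total number of frequent itemsets, which is one plausible reading of why the aggregation step helps on large datasets.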


