
Optimization and Implementation of Parallel FP-Growth Algorithm Based on Spark (cited by: 8)
Abstract: Frequent pattern mining is an important problem in pattern recognition and has long received wide attention from researchers. The FP-Growth algorithm is widely used for frequent pattern mining because of its efficiency and speed. However, the algorithm depends on running in local memory, which makes it difficult to scale to large data sets. To address this problem, this paper studies frequent pattern mining on large-scale data sets. Based on the Spark framework, the FP-Growth algorithm is improved by optimizing the support counting and grouping steps, and distributed computation and dynamic allocation of computing resources are implemented. Intermediate results produced during computation are kept in memory, which reduces I/O cost and improves the running efficiency of the algorithm. Experimental results show that the optimized algorithm outperforms the traditional FP-Growth algorithm on large-scale data.
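The abstract names two steps, support counting and grouping, without giving their details. The sketch below is not the authors' implementation. It illustrates, in plain Python rather than Spark, the general idea commonly used in parallel FP-Growth: count item support, rank the frequent items, assign them to groups, and emit for each transaction one prefix per group so that each group can be mined independently. All function names, the round-robin grouping rule in `group_of`, and the sample data are assumptions made for illustration.

```python
from collections import Counter, defaultdict


def count_support(transactions, min_support):
    """Count how many transactions contain each item; keep the frequent ones.

    In Spark this would typically be a map followed by a reduce by key.
    """
    counts = Counter(item for t in transactions for item in set(t))
    return {item: c for item, c in counts.items() if c >= min_support}


def rank_items(frequent):
    """Rank items by descending support; ties are broken by item name."""
    ordered = sorted(frequent, key=lambda i: (-frequent[i], i))
    return {item: rank for rank, item in enumerate(ordered)}


def group_of(rank, num_groups):
    """Assumed grouping rule (hypothetical): round-robin by item rank."""
    return rank % num_groups


def conditional_transactions(transaction, ranks, num_groups):
    """For each group, emit the longest ranked prefix ending in one of its items."""
    ranked = sorted({ranks[i] for i in transaction if i in ranks})
    emitted = {}
    # Scan from the least frequent item backwards so the first hit per
    # group is the longest prefix that group needs.
    for pos in range(len(ranked) - 1, -1, -1):
        g = group_of(ranked[pos], num_groups)
        if g not in emitted:
            emitted[g] = ranked[: pos + 1]
    return emitted


def shard(transactions, min_support, num_groups):
    """Run both steps and collect the per-group transaction shards."""
    frequent = count_support(transactions, min_support)
    ranks = rank_items(frequent)
    shards = defaultdict(list)
    for t in transactions:
        for g, prefix in conditional_transactions(t, ranks, num_groups).items():
            shards[g].append(prefix)
    return frequent, ranks, dict(shards)


if __name__ == "__main__":
    data = [["a", "b", "c"], ["a", "b"], ["a", "c"], ["b", "d"], ["a"]]
    frequent, ranks, shards = shard(data, min_support=2, num_groups=2)
    print(frequent)
    print(ranks)
    print(shards)
```

Each shard could then be handed to an ordinary single-machine FP-Growth run that reports only patterns ending in that group's items. In a Spark setting the shards would be partitions held in memory, which is the property the abstract credits for the reduced I/O cost.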
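Source: Computer Applications and Software (《计算机应用与软件》), 2017, No. 9, pp. 273-278 (6 pages)
Funding: National Natural Science Foundation of China (71371013); Anhui University of Technology Young Teachers' Research Fund (QZ201420); Natural Science Foundation of the Anhui Provincial Department of Education (KJ2016A087)
Keywords: frequent pattern mining; FP-Growth algorithm; distributed computing; Spark framework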