
Research on Algorithms of Mining Association Rule under Cloud Computing Environment (云计算环境下关联规则挖掘算法的研究)

Cited by: 48
Abstract: Cloud computing offers a cheap and efficient way to store and analyze massive data sets, so data mining algorithms designed for cloud environments are of both theoretical and practical value. This paper studies association rule mining in a cloud computing environment. It first introduces the concept of cloud computing, the Hadoop framework, the MapReduce programming model, and the traditional Apriori algorithm. It then improves the Apriori algorithm to enable parallel data mining in the cloud and gives the execution flow of the improved algorithm on Hadoop's MapReduce programming model. Finally, a simple frequent itemset mining example illustrates the efficiency and practicality of the improved algorithm.
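Authors: Li Lingjuan (李玲娟), Zhang Min (张敏)
Source: Computer Technology and Development (《计算机技术与发展》), 2011, No. 2, pp. 43-46, 50 (5 pages)
Funding: National Key Basic Research Program of China (973 Program) (2011CB302903); National Natural Science Foundation of China (60863001)
Keywords: cloud computing; data mining; Apriori; MapReduce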
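
The abstract describes running Apriori as parallel MapReduce jobs, but this record does not include the details of the authors' improved algorithm. The following is only a minimal sketch of the general idea, assuming a plain level-wise Apriori in which each pass counts candidate itemsets in a map step over data splits and filters them by a minimum support count in a reduce step. All function names (`map_count`, `reduce_count`, `next_candidates`, `apriori_mapreduce`) and the in-memory "splits" are hypothetical stand-ins for Hadoop jobs and HDFS blocks, not the paper's implementation.

```python
from collections import Counter
from itertools import combinations


def map_count(split, candidates):
    """Map step: emit (candidate, 1) for every candidate found in a transaction."""
    for transaction in split:
        items = set(transaction)
        for cand in candidates:
            if cand <= items:
                yield cand, 1


def reduce_count(pairs, min_support):
    """Reduce step: sum the counts per candidate and keep the frequent ones."""
    totals = Counter()
    for cand, n in pairs:
        totals[cand] += n
    return {cand: n for cand, n in totals.items() if n >= min_support}


def next_candidates(frequent, k):
    """Build k-itemsets whose (k-1)-subsets are all frequent (Apriori property)."""
    items = sorted({i for s in frequent for i in s})
    return [
        frozenset(c)
        for c in combinations(items, k)
        if all(frozenset(sub) in frequent for sub in combinations(c, k - 1))
    ]


def apriori_mapreduce(splits, min_support):
    """Run one simulated map/reduce pass per itemset size until no candidates remain."""
    all_items = sorted({i for split in splits for t in split for i in t})
    candidates = [frozenset([i]) for i in all_items]
    result, k = {}, 1
    while candidates:
        # Each split would be handled by a separate mapper on a real cluster.
        pairs = [p for split in splits for p in map_count(split, candidates)]
        frequent = reduce_count(pairs, min_support)
        result.update(frequent)
        k += 1
        candidates = next_candidates(set(frequent), k)
    return result


if __name__ == "__main__":
    # Two hypothetical data splits holding five transactions in total.
    splits = [
        [["a", "b", "c"], ["a", "b"]],
        [["a", "c"], ["b", "c"], ["a", "b", "c"]],
    ]
    for itemset, count in apriori_mapreduce(splits, min_support=3).items():
        print(sorted(itemset), count)
```

In this sketch the support threshold is an absolute count rather than a fraction, and candidate generation is done on a single node between passes. A real Hadoop job would distribute the candidate set to the mappers and read transactions from HDFS.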

References (9)

  • 1 Weiss A. Computing in Clouds [J]. ACM Networker, 2007, 11(4): 18-25.
  • 2 Buyya R, Yeo C S, Venugopal S. Market-Oriented Cloud Computing: Vision, Hype, and Reality for Delivering IT Services as Computing Utilities [C]//Proceedings of the 2008 10th IEEE International Conference on High Performance Computing and Communications. [s.l.]: [s.n.], 2008: 5-13.
  • 3 Apache. Hadoop [EB/OL]. 2006. http://lucene.apache.org/hadoop/.
  • 4 Dean J, Ghemawat S. MapReduce: Simplified Data Processing on Large Clusters [C]//Proceedings of the 6th Symposium on Operating Systems Design and Implementation. San Francisco, California, USA: USENIX Association, 2004: 137-150.
  • 5 Wu X, Kumar V, Quinlan J R, et al. Top 10 algorithms in data mining [J]. Knowledge and Information Systems, 2008, 14(1): 1-37.
  • 6 Liu Huayuan, Yuan Qinqin, Wang Baobao. A survey of parallel data mining algorithms (并行数据挖掘算法综述) [J]. Electronic Science and Technology (电子科技), 2006, 19(1): 65-68. Cited by: 15
  • 7 Agrawal R, Shafer J C. Parallel Mining of Association Rules [J]. IEEE Transactions on Knowledge and Data Engineering, 1996, 8(6): 962-969.
  • 8 Wang E, Li Ming. Research on massive data mining under cloud computing (云计算下的海量数据挖掘研究) [J]. Modern Computer (现代计算机), 2009, 15(11): 22-25. Cited by: 26
  • 9 Aflori C, Craus M. Grid implementation of the Apriori algorithm [J]. Advances in Engineering Software, 2007, 38(5): 295-300.

