Abstract
Cloud computing offers a cheap and efficient way to store and analyze massive data sets, so research on data mining algorithms for cloud computing environments has both theoretical significance and practical value. This paper studies association rule mining in a cloud computing environment. It first introduces the concept of cloud computing, the Hadoop framework, the MapReduce programming model, and the traditional Apriori algorithm. On that basis, with the goal of parallel data mining in the cloud, it improves the Apriori algorithm and gives the execution flow of the improved algorithm on the MapReduce programming model in Hadoop. Finally, a simple frequent itemset mining example illustrates the efficiency and practicality of the improved algorithm.
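The abstract does not spell out the improved algorithm, so the following is only a minimal sketch of how a generic level-wise Apriori pass might be phrased as map and reduce steps. It is not the authors' method. It runs in memory rather than on Hadoop, and every name in it (`map_phase`, `reduce_phase`, `next_candidates`, `apriori_mapreduce`) is hypothetical and chosen for illustration.

```python
from collections import defaultdict
from itertools import combinations


def map_phase(transaction, candidates):
    """Emit (candidate, 1) for each candidate itemset contained in the transaction."""
    items = set(transaction)
    return [(c, 1) for c in candidates if c <= items]


def reduce_phase(pairs, min_support):
    """Sum the counts per candidate and keep those that reach min_support."""
    counts = defaultdict(int)
    for itemset, n in pairs:
        counts[itemset] += n
    return {s: n for s, n in counts.items() if n >= min_support}


def next_candidates(frequent, k):
    """Build k-itemsets whose (k-1)-subsets are all frequent (Apriori pruning)."""
    items = sorted({i for s in frequent for i in s})
    return [
        frozenset(c)
        for c in combinations(items, k)
        if all(frozenset(sub) in frequent for sub in combinations(c, k - 1))
    ]


def apriori_mapreduce(transactions, min_support):
    """Level-wise search; each loop iteration stands in for one map/reduce job."""
    all_items = sorted({i for t in transactions for i in t})
    candidates = [frozenset([i]) for i in all_items]
    result, k = {}, 1
    while candidates:
        # Map: every transaction is scanned independently, so this step could be
        # split across workers; here it is simulated with a plain loop.
        pairs = [p for t in transactions for p in map_phase(t, candidates)]
        # Reduce: aggregate the emitted counts and filter by support.
        frequent = reduce_phase(pairs, min_support)
        result.update(frequent)
        k += 1
        candidates = next_candidates(frequent, k)
    return result
```

Calling `apriori_mapreduce(transactions, min_support)` with a list of item lists returns a dictionary that maps each frequent itemset (as a `frozenset`) to its absolute support count. On a real cluster the map step would run per input split and the reduce step per key; that distribution is assumed here, not implemented.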
Source
《计算机技术与发展》
2011, No. 2, pp. 43-46, 50 (5 pages in total)
Computer Technology and Development
Funding
National Key Basic Research Program of China (973 Program), Grant No. 2011CB302903
National Natural Science Foundation of China, Grant No. 60863001