Abstract
This paper analyzes the computation process and shortcomings of the traditional serial Apriori association rule algorithm. The serial algorithm suffers from low execution efficiency and high time complexity, while traditional parallel computing models cannot handle node failure and have difficulty with load balancing. To address these problems, a design method for implementing a parallel association rule algorithm on the Hadoop platform is proposed: the traditional Apriori algorithm is improved, and the execution flow of the improved algorithm under the MapReduce programming model of Hadoop is given. The improved algorithm is tested on Hadoop in both single-machine and cluster configurations. The experimental results show that it achieves high execution efficiency, good speedup, and good portability.
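For readers unfamiliar with how Apriori's support counting maps onto a map/reduce pattern, the following is a minimal in-process sketch in Python. It is not the improved algorithm from the paper, whose details the abstract does not give, and it does not use Hadoop. The function names (`map_phase`, `reduce_phase`, `apriori`) and the sample transactions are hypothetical and chosen only for illustration.

```python
from collections import Counter
from itertools import combinations


def map_phase(transaction, k, prev_frequent):
    """Emit (candidate, 1) for each k-itemset in one transaction.

    Hypothetical mapper: a candidate is emitted only if all of its
    (k-1)-subsets were frequent in the previous pass (Apriori pruning).
    """
    items = sorted(set(transaction))
    for candidate in combinations(items, k):
        if k == 1 or all(
            subset in prev_frequent
            for subset in combinations(candidate, k - 1)
        ):
            yield candidate, 1


def reduce_phase(pairs, min_support):
    """Sum the counts per candidate and keep those meeting min_support."""
    counts = Counter()
    for candidate, one in pairs:
        counts[candidate] += one
    return {c: n for c, n in counts.items() if n >= min_support}


def apriori(transactions, min_support):
    """Run level-wise map/reduce passes until no frequent itemset remains."""
    frequent, prev, k = {}, set(), 1
    while True:
        pairs = (p for t in transactions for p in map_phase(t, k, prev))
        level = reduce_phase(pairs, min_support)
        if not level:
            return frequent
        frequent.update(level)
        prev, k = set(level), k + 1


# Illustrative data only; not taken from the paper's experiments.
transactions = [
    ["bread", "milk"],
    ["bread", "beer", "milk"],
    ["beer", "milk"],
    ["bread", "milk", "beer"],
]
result = apriori(transactions, min_support=3)
```

In a real MapReduce job the mapper would run on separate splits of the transaction file and the framework would group the emitted keys before the reducer sums them. Here both phases run in a single process to keep the sketch self-contained.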
Source
《计算机与现代化》
2013, No. 3, pp. 1-4, 8 (5 pages in total)
Computer and Modernization
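Funding
National Natural Science Foundation of China (61163025)
Natural Science Foundation of Inner Mongolia (2012MS0912)
Chunhui Program of the Ministry of Education (Z2009-1-01044)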