Research and Implementation of Parallel Apriori Algorithm on Hadoop Platform (cited by: 26)
Abstract: This paper analyzes the computation performed by the traditional serial Apriori association rule algorithm and its shortcomings. The serial algorithm has low execution efficiency and high time complexity, and traditional parallel computing models cannot handle node failure and struggle with load balancing. To address these problems, the paper proposes a design for a parallel association rule algorithm on the Hadoop platform, improves the traditional Apriori algorithm, and gives the execution flow of the improved algorithm under Hadoop's MapReduce programming model. The improved algorithm is tested on Hadoop both on a single machine and on a cluster. The experimental results show that it achieves higher execution efficiency, good speedup, and good portability.
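Source: Computer and Modernization (《计算机与现代化》), 2013, No. 3, pp. 1-4, 8 (5 pages in total)
Funding: National Natural Science Foundation of China (61163025); Natural Science Foundation of Inner Mongolia (2012MS0912); Chunhui Program of the Ministry of Education (Z2009-1-01044)
Keywords: Hadoop; association rule algorithm; parallel computing; Apriori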
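
The abstract does not spell out the improved algorithm, so the following is only a minimal sketch of the general idea it names: classic Apriori with the support-counting step phrased as a map phase and a reduce phase. It runs in plain Python without Hadoop, and every name in it (`map_count`, `reduce_count`, `generate_candidates`, `apriori`, and the sample data) is a hypothetical choice for illustration, not the authors' implementation.

```python
from collections import Counter
from itertools import combinations


def map_count(split, candidates):
    """Map phase: emit (candidate, 1) for each candidate found in a transaction."""
    for transaction in split:
        items = set(transaction)
        for cand in candidates:
            if cand <= items:
                yield cand, 1


def reduce_count(pairs, min_support):
    """Reduce phase: sum the counts per candidate and keep the frequent ones."""
    totals = Counter()
    for cand, n in pairs:
        totals[cand] += n
    return {cand: n for cand, n in totals.items() if n >= min_support}


def generate_candidates(frequent, k):
    """Build k-itemsets whose (k-1)-subsets are all frequent (Apriori property)."""
    items = sorted({i for s in frequent for i in s})
    return [
        frozenset(c)
        for c in combinations(items, k)
        if all(frozenset(sub) in frequent for sub in combinations(c, k - 1))
    ]


def apriori(splits, min_support):
    """Run one simulated map/reduce round per itemset size until none remain."""
    candidates = list({frozenset([i]) for s in splits for t in s for i in t})
    result, k = {}, 1
    while candidates:
        # Each split stands in for the input of one mapper (assumption).
        pairs = (p for split in splits for p in map_count(split, candidates))
        frequent = reduce_count(pairs, min_support)
        result.update(frequent)
        k += 1
        candidates = generate_candidates(frequent, k)
    return result


# Hypothetical sample data: two splits, each a list of transactions.
splits = [
    [["bread", "milk"], ["bread", "butter"]],
    [["bread", "milk", "butter"], ["milk"]],
]
frequent_itemsets = apriori(splits, min_support=2)
```

In this sketch each pass over the data corresponds to one job, so the number of jobs grows with the size of the largest frequent itemset. On a real cluster the framework, not a Python generator, would distribute the splits and group the emitted pairs by key.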