
Analysis and Study of C4.5 Algorithm Based on Hadoop Platform
Cited by: 5
Abstract: How to mine valuable information from massive data more quickly, more efficiently, and at lower cost has become a new challenge for data mining technology. After studying the characteristics of the Hadoop platform and the C4.5 decision tree algorithm, this paper introduces cloud computing ideas into the field of decision tree algorithms: it parallelizes C4.5 on the Hadoop platform and uses the MapReduce model to address the problem of mining massive data. The new algorithm is then verified on the play-golf data set. The experimental results show that, for massive data, the Hadoop-based decision tree algorithm markedly improves data mining efficiency and scales well. To some extent it also solves two problems that C4.5 faces when processing massive data: the heavy computational load and the long time needed to build the decision tree.
Authors: 孙媛 (Sun Yuan), 黄刚 (Huang Gang)
Source: Computer Technology and Development (《计算机技术与发展》), 2014, No. 11, pp. 83-86, 90 (5 pages)
Funding: National Natural Science Foundation of China (61171053)
Keywords: Hadoop; MapReduce; data mining; C4.5 algorithm
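
The abstract does not give the paper's key/value design, so the following is only a minimal sketch of how C4.5's attribute selection could be split into a map phase and a reduce phase. It runs in memory in plain Python rather than on Hadoop. The function names (`map_record`, `reduce_counts`, `gain_ratios`) are hypothetical, and the 14-row nominal play-golf table is the common textbook version, assumed here for illustration and not necessarily the exact data used in the paper. Continuous attributes and pruning are left out.

```python
from collections import Counter
from math import log2

# Assumed textbook play-golf data: (outlook, temperature, humidity, windy, play).
ATTRS = ["outlook", "temperature", "humidity", "windy"]
RECORDS = [
    ("sunny", "hot", "high", "false", "no"),
    ("sunny", "hot", "high", "true", "no"),
    ("overcast", "hot", "high", "false", "yes"),
    ("rainy", "mild", "high", "false", "yes"),
    ("rainy", "cool", "normal", "false", "yes"),
    ("rainy", "cool", "normal", "true", "no"),
    ("overcast", "cool", "normal", "true", "yes"),
    ("sunny", "mild", "high", "false", "no"),
    ("sunny", "cool", "normal", "false", "yes"),
    ("rainy", "mild", "normal", "false", "yes"),
    ("sunny", "mild", "normal", "true", "yes"),
    ("overcast", "mild", "high", "true", "yes"),
    ("overcast", "hot", "normal", "false", "yes"),
    ("rainy", "mild", "high", "true", "no"),
]


def map_record(record):
    """Map phase: emit ((attribute, value, label), 1) for each attribute."""
    *values, label = record
    for attr, value in zip(ATTRS, values):
        yield (attr, value, label), 1


def reduce_counts(pairs):
    """Reduce phase: sum the emitted 1s for every distinct key."""
    counts = Counter()
    for key, n in pairs:
        counts[key] += n
    return counts


def entropy(counts):
    """Shannon entropy (base 2) of a list of non-negative counts."""
    counts = list(counts)
    total = sum(counts)
    return -sum(c / total * log2(c / total) for c in counts if c)


def gain_ratios(counts):
    """Compute the C4.5 gain ratio of every attribute from the reduced counts."""
    ratios = {}
    for attr in ATTRS:
        by_value = {}       # attribute value -> Counter of class labels
        labels = Counter()  # overall class distribution
        for (a, value, label), n in counts.items():
            if a == attr:
                by_value.setdefault(value, Counter())[label] += n
                labels[label] += n
        total = sum(labels.values())
        gain = entropy(labels.values()) - sum(
            sum(c.values()) / total * entropy(c.values())
            for c in by_value.values()
        )
        split_info = entropy(sum(c.values()) for c in by_value.values())
        ratios[attr] = gain / split_info if split_info else 0.0
    return ratios


if __name__ == "__main__":
    pairs = (pair for record in RECORDS for pair in map_record(record))
    ratios = gain_ratios(reduce_counts(pairs))
    for attr, ratio in sorted(ratios.items(), key=lambda kv: -kv[1]):
        print(f"{attr}: {ratio:.3f}")
```

On this assumed table the attribute with the highest gain ratio is `outlook`, so it would become the root split. In a real Hadoop job the counting would run as distributed map and reduce tasks over data blocks, and only the small table of counts would be needed to choose each split, which is one plausible reason the approach scales.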
