
The Improvement of Kmeans Algorithm Facing the Movie Dataset Based on Hadoop Platform (cited by: 2)
Abstract: To meet the need for parallelized clustering, this paper improves the implementation of the Kmeans algorithm on the Hadoop platform by using the Canopy algorithm to preprocess the data. On a movie dataset with a definite data structure, the authors run a single-machine comparison experiment, a cluster speedup experiment, and a cluster scale-up experiment. These show, respectively, that the improved implementation is efficient, achieves good speedup, and scales well, so it can be applied effectively to practical large-scale data mining.
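The abstract names Canopy preprocessing ahead of Kmeans but this record does not show the implementation. The following is a minimal single-machine sketch of that general idea, not the paper's MapReduce code: a canopy pass picks the initial centers (and therefore the cluster count) that a plain Kmeans loop then refines. The function names, the thresholds `t1` and `t2`, and the use of Euclidean distance are all assumptions made for illustration.

```python
import math


def distance(a, b):
    """Euclidean distance between two equal-length points (an assumed metric)."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def mean(points):
    """Coordinate-wise mean of a non-empty list of points."""
    n = len(points)
    return tuple(sum(coords) / n for coords in zip(*points))


def canopy_centers(points, t1, t2):
    """Hypothetical helper: return one center per canopy.

    t1 is the loose threshold (canopy membership) and t2 the tight
    threshold (points this close can no longer start a new canopy).
    """
    if not 0 < t2 < t1:
        raise ValueError("thresholds must satisfy 0 < t2 < t1")
    remaining = list(points)
    centers = []
    while remaining:
        seed = remaining[0]  # deterministic pick; a random pick is also common
        members = [p for p in points if distance(seed, p) < t1]
        centers.append(mean(members))
        # The seed itself is at distance 0 < t2, so the list always shrinks.
        remaining = [p for p in remaining if distance(seed, p) >= t2]
    return centers


def kmeans(points, centers, max_iter=20):
    """Plain Lloyd iterations starting from the given centers (max_iter >= 1)."""
    centers = list(centers)
    for _ in range(max_iter):
        clusters = [[] for _ in centers]
        for p in points:
            nearest = min(range(len(centers)),
                          key=lambda i: distance(p, centers[i]))
            clusters[nearest].append(p)
        # Keep the old center if a cluster received no points.
        new_centers = [mean(c) if c else centers[i]
                       for i, c in enumerate(clusters)]
        if new_centers == centers:
            break
        centers = new_centers
    return centers, clusters


if __name__ == "__main__":
    data = [(1.0, 1.0), (1.2, 0.8), (0.8, 1.1),
            (8.0, 8.0), (8.2, 7.9), (7.9, 8.3)]
    seeds = canopy_centers(data, t1=3.0, t2=1.5)
    final_centers, groups = kmeans(data, seeds)
    print(len(seeds), [len(g) for g in groups])
```

In a Hadoop setting the assignment step would typically run in mappers and the center update in reducers; that split is a common design and is not confirmed by this record.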
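Affiliation: Tianjin Normal University
Source: Natural Science Journal of Harbin Normal University (CAS), 2012, No. 1, pp. 32-36 (5 pages)
Funding: National Natural Science Foundation of China (60970060); Tianjin Municipal Education Commission funded project (20071328); Tianjin Science and Technology Support Program key project (09ZCKFGX00500); Tianjin Normal University Doctoral Fund (52LX17)
Keywords: Hadoop; MapReduce; Kmeans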

