Research on a Parallel k-means Algorithm Based on Cloud Computing
Abstract: The running time of the traditional k-means clustering algorithm grows sharply on massive data sets. To address this, the paper draws on the strengths of cloud computing and proposes parallelizing k-means with the MapReduce programming framework. The Map function computes the distance from each sample record to every cluster center and labels the record with the cluster it belongs to. All records with the same key are sent to a single reducer; the Reduce function aggregates these intermediate results and computes the new cluster centers, which are used by the next iteration (the next MapReduce job). Experimental results show that the MapReduce-based parallel k-means algorithm achieves roughly linear speedup as the number of compute nodes increases and has good scalability.
Source: Journal of Qiqihar University (Natural Science Edition), 2014, No. 5, pp. 5-9 (5 pages)
Funding: Fujian Provincial Department of Education fund project (JB12312)
Keywords: cloud computing; data mining; parallel k-means; MapReduce
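
The Map/Reduce split described in the abstract can be illustrated with a minimal single-process sketch. This is not the authors' Hadoop implementation: the function names (`kmeans_map`, `kmeans_reduce`, `kmeans_iteration`, `kmeans`), the Euclidean distance, the convergence tolerance, and the rule that an empty cluster keeps its old center are all assumptions made for illustration, and a dictionary stands in for the framework's shuffle phase.

```python
from collections import defaultdict
from math import dist  # Euclidean distance, Python 3.8+


def kmeans_map(point, centroids):
    """Map step: label one record with the index of its nearest center."""
    nearest = min(range(len(centroids)), key=lambda i: dist(point, centroids[i]))
    return nearest, (point, 1)  # key = cluster index, value = (point, count)


def kmeans_reduce(values):
    """Reduce step: average every point that shares one cluster key."""
    total, count = None, 0
    for point, n in values:
        if total is None:
            total = list(point)
        else:
            total = [a + b for a, b in zip(total, point)]
        count += n
    return tuple(s / count for s in total)


def kmeans_iteration(points, centroids):
    """One simulated MapReduce job: map, group by key, reduce."""
    groups = defaultdict(list)  # stands in for the shuffle phase
    for p in points:
        key, value = kmeans_map(p, centroids)
        groups[key].append(value)
    # Assumption: a cluster that received no points keeps its old center.
    return [
        kmeans_reduce(groups[i]) if i in groups else centroids[i]
        for i in range(len(centroids))
    ]


def kmeans(points, centroids, max_iters=20, tol=1e-9):
    """Repeat jobs until no center moves more than tol (hypothetical rule)."""
    for _ in range(max_iters):
        new_centroids = kmeans_iteration(points, centroids)
        if all(dist(a, b) <= tol for a, b in zip(new_centroids, centroids)):
            return new_centroids
        centroids = new_centroids
    return centroids


if __name__ == "__main__":
    data = [(0, 0), (0, 2), (10, 10), (10, 12)]
    print(kmeans(data, [(0, 0), (10, 10)]))
```

In a real cluster each call to `kmeans_map` would run on the node holding that record and only the per-cluster values would cross the network, which is why the distance computation, the dominant cost, parallelizes well.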
