Parallel K-Means Algorithm Based on MapReduce and Its Improvement (original title: 基于MapReduce的K_means并行算法及改进)

Cited by: 4
Abstract: The traditional k-means clustering algorithm runs short of memory and computes slowly when processing massive data sets. To address these problems, this paper proposes a parallel k-means algorithm based on MapReduce. In addition, because k-means chooses its initial values blindly, the canopy algorithm is used to improve the choice of initial values. Experimental results show that both the MapReduce-based parallel k-means algorithm and the improved algorithm produce good clustering results. Clustering quality improves, and on large data sets the improved algorithm achieves a near-linear speedup.
Authors: 衣治安 (Yi Zhi'an), 王月 (Wang Yue)
Source: Computer Systems & Applications (《计算机系统应用》), 2015, No. 6, pp. 188-192 (5 pages)
Keywords: MapReduce; k-means algorithm; canopy algorithm; parallel computation; clustering
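
This record does not include the paper's implementation, so the following is only a minimal single-process sketch of the general idea the abstract describes: a canopy pass picks initial centers, and each k-means iteration is then split into a map step (assign every point to its nearest center) and a reduce step (average the points of each cluster). The function names, the Euclidean distance, the tight threshold `t2`, and the convergence tolerance are all assumptions made for illustration; a real MapReduce job would distribute the map and reduce steps across a cluster instead of running them in one process.

```python
import math
from collections import defaultdict


def canopy_centers(points, t2):
    """Simplified canopy pass that returns canopy centers as k-means seeds.

    Assumption: only the tight threshold T2 is used here, because we only
    need the centers; the loose threshold T1 (canopy membership) is omitted.
    """
    remaining = list(points)
    centers = []
    while remaining:
        center = remaining.pop(0)  # take the next unclaimed point as a center
        centers.append(center)
        # Points within T2 of the new center can no longer become centers.
        remaining = [p for p in remaining if math.dist(center, p) > t2]
    return centers


def map_phase(points, centers):
    """Map step: emit (index of nearest center, point) for every point."""
    for p in points:
        nearest = min(range(len(centers)), key=lambda i: math.dist(p, centers[i]))
        yield nearest, p


def reduce_phase(pairs, centers):
    """Reduce step: group points by center index and average each group."""
    groups = defaultdict(list)
    for idx, p in pairs:
        groups[idx].append(p)
    new_centers = list(centers)  # a center with no points keeps its position
    for idx, members in groups.items():
        dim = len(members[0])
        new_centers[idx] = tuple(
            sum(m[d] for m in members) / len(members) for d in range(dim)
        )
    return new_centers


def kmeans(points, centers, max_iter=20, tol=1e-9):
    """Repeat map + reduce until the centers stop moving (or max_iter)."""
    for _ in range(max_iter):
        updated = reduce_phase(map_phase(points, centers), centers)
        shift = max(math.dist(a, b) for a, b in zip(centers, updated))
        centers = updated
        if shift <= tol:
            break
    return centers


if __name__ == "__main__":
    data = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0),
            (10.0, 10.0), (10.0, 11.0), (11.0, 10.0)]
    seeds = canopy_centers(data, t2=3.0)
    print(seeds)
    print(kmeans(data, seeds))
```

Because the canopy pass fixes both the number of clusters and their starting positions, the sketch needs no user-supplied k or random initialization, which is the kind of blind initial-value choice the abstract says the improvement targets.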