
Optimized K-means clustering algorithm for massive data (cited by: 13)
Abstract: Centralized system frameworks struggle to perform clustering analysis on massive data. To address this, an optimized K-means clustering algorithm based on MapReduce is proposed. The algorithm uses the MapReduce parallel programming framework, introduces Canopy clustering to optimize the selection of the initial K-means centers, and improves the communication and computation patterns of the iteration process. Experimental results show that the algorithm effectively improves clustering quality, achieves high execution efficiency and good scalability, and is well suited to clustering analysis of massive data.
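Source: Computer Engineering and Applications (《计算机工程与应用》), CSCD, 2014, No. 14, pp. 143-147 (5 pages)
Funding: National Natural Science Foundation of China (No. 60873100); Natural Science Foundation of Shanxi Province (No. 2010011022-1); Shanxi Province Science and Technology Infrastructure Platform Construction Project (No. 2011091001-0101)
Keywords: massive data; clustering; MapReduce; K-means algorithm; Canopy algorithm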
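
The abstract names two ingredients: a Canopy pass that chooses the initial centers, and K-means iterations expressed in the MapReduce style. The following is only a minimal single-process sketch of that general idea, not the paper's implementation. The function names (`canopy_centers`, `map_assign`, `combine`, `reduce_centers`, `kmeans_mapreduce`), the threshold `t2`, and the way data is split are all assumptions made for illustration. The paper's specific improvements to communication and computation are not described in the abstract and are not reproduced here; the local combiner shown is simply a common way to reduce the data sent to the reducer.

```python
import math


def canopy_centers(points, t2):
    """Simplified Canopy pass used only for seeding (assumption: the loose
    threshold T1 is omitted because only the centers are needed here).
    Any point within distance t2 of a chosen center cannot become a center."""
    remaining = list(points)
    centers = []
    while remaining:
        c = remaining.pop(0)
        centers.append(c)
        remaining = [p for p in remaining if math.dist(p, c) > t2]
    return centers


def map_assign(point, centers):
    """Map step: emit (nearest center index, (point, 1))."""
    idx = min(range(len(centers)), key=lambda i: math.dist(point, centers[i]))
    return idx, (point, 1)


def combine(pairs):
    """Combiner: sum vectors and counts per center index, so that only one
    partial sum per center leaves each data split."""
    acc = {}
    for idx, (vec, n) in pairs:
        if idx in acc:
            s, c = acc[idx]
            acc[idx] = ([x + y for x, y in zip(s, vec)], c + n)
        else:
            acc[idx] = (list(vec), n)
    return acc


def reduce_centers(partials, centers):
    """Reduce step: merge partial sums and compute the new mean per center.
    A center that received no points keeps its previous position."""
    merged = combine((i, v) for part in partials for i, v in part.items())
    return [
        tuple(x / merged[i][1] for x in merged[i][0]) if i in merged else centers[i]
        for i in range(len(centers))
    ]


def kmeans_mapreduce(splits, centers, max_iter=20, tol=1e-6):
    """Iterate map/combine/reduce until the centers move less than tol."""
    for _ in range(max_iter):
        partials = [combine(map_assign(p, centers) for p in s) for s in splits]
        new_centers = reduce_centers(partials, centers)
        shift = max(math.dist(a, b) for a, b in zip(centers, new_centers))
        centers = new_centers
        if shift < tol:
            break
    return centers
```

In this sketch the number of clusters is not given in advance: it is whatever the Canopy pass produces for the chosen `t2`. Each element of `splits` stands in for one input split that a mapper would process on a real cluster.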
