KBAC: A K-means-Based Adaptive Clustering Algorithm (cited by: 6)

KBAC:K-means Based Adaptive Clustering for Massive Dataset
Abstract: One of the main drawbacks of the K-means clustering algorithm is that the user must specify the number of clusters, and in most real application scenarios the user cannot supply a suitable value beforehand. On the other hand, K-means is readily parallelizable, which makes it well suited to cloud computing platforms for clustering large-scale datasets. This paper proposes the KBAC algorithm, which adopts K-means as a pre-clustering procedure, is implemented and optimized under the MapReduce cloud framework, and adaptively determines the best number of clusters while clustering. The core idea is to reduce the problem of clustering in vector space to a community detection problem on a graph. Theoretical analysis and experiments show that, by parallelizing the K-means pre-clustering procedure under the cloud computing framework, KBAC clusters large-scale data efficiently and produces high-quality clustering results.
Source: Journal of Chinese Computer Systems (《小型微型计算机系统》), indexed in CSCD and the Peking University Core Journals list, 2012, No. 10, pp. 2268-2272 (5 pages)
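Funding: Supported by the National Natural Science Foundation of China (61003001, 71071098) and the Specialized Research Fund for the Doctoral Program of Higher Education (20100071120032)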
Keywords: K-means; MapReduce; clustering; community detection
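The abstract's core idea, over-clustering with K-means and then grouping the resulting micro-clusters on a graph, can be illustrated with a small sketch. The abstract does not say how the graph is built or which community detection method KBAC uses, so everything below is an assumption made for illustration: centroids become graph nodes, two centroids are linked when they lie closer than a hypothetical `radius` parameter, and plain connected components stand in for community detection. The code is serial and does not reproduce the paper's MapReduce implementation; the names `kmeans`, `group_centroids`, and `adaptive_cluster` are hypothetical.

```python
import math
import random


def kmeans(points, k, iters=20, seed=0):
    """Plain Lloyd's K-means; returns (centroids, label per point)."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)

    def nearest(p):
        return min(range(k), key=lambda c: math.dist(p, centroids[c]))

    for _ in range(iters):
        # Assignment step: each point goes to its nearest centroid.
        labels = [nearest(p) for p in points]
        # Update step: move each centroid to the mean of its members
        # (an empty cluster keeps its previous centroid).
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centroids[c] = tuple(sum(xs) / len(members) for xs in zip(*members))
    return centroids, [nearest(p) for p in points]


def group_centroids(centroids, radius):
    """Assumed graph step: link centroids closer than `radius` and
    return a connected-component id for every centroid (union-find).
    Connected components are a stand-in for community detection."""
    k = len(centroids)
    parent = list(range(k))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    for i in range(k):
        for j in range(i + 1, k):
            if math.dist(centroids[i], centroids[j]) < radius:
                parent[find(i)] = find(j)
    roots = sorted({find(i) for i in range(k)})
    return [roots.index(find(i)) for i in range(k)]


def adaptive_cluster(points, k_pre, radius, seed=0):
    """Over-cluster with k_pre micro-clusters, then merge them on the graph."""
    centroids, micro = kmeans(points, k_pre, seed=seed)
    group = group_centroids(centroids, radius)
    labels = [group[m] for m in micro]
    return labels, len(set(labels))


# Toy data: two well-separated blobs; the caller never states "2 clusters".
rng = random.Random(1)
blob_a = [(rng.uniform(-1, 1), rng.uniform(-1, 1)) for _ in range(50)]
blob_b = [(rng.uniform(9, 11), rng.uniform(9, 11)) for _ in range(50)]
points = blob_a + blob_b
labels, n_found = adaptive_cluster(points, k_pre=6, radius=3.0)
```

In this sketch the final number of clusters falls out of the graph step rather than being supplied up front; the price is the `radius` threshold, which a real community detection method based on graph structure would not need in this form.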