期刊文献+

基于分位数半径的动态K-means算法 被引量:5

Dynamic K-means algorithm based on quantile radius
下载PDF
导出
摘要 K-means算法是应用最广泛的聚类算法之一,但存在明显缺陷:对初始值敏感,还需给定类的数目.层次K-means算法提出将多次k取固定值的K-means运算所得到的中心点作为类的代表,并通过对这些中心点进行层次聚类来得到更好的初始聚类中心,然而在中心的融合过程中并没有有效利用类的几何信息.从类的几何特征入手,提出一种基于类的分位数半径的动态K-means算法(QRD K-means).此算法在层次K-means的基础上令每次K-means运算的k值变动起来,且又引入了分位数半径的概念,用样本点到类中心距离的分位数作为类的半径,将样本点间的关系简化为各个类的分位数半径与类中心的关系.通过中心点间距离与分位数半径大小的比较对中心点进行融合形成新类,从而快速给出良好的聚类结果,同时也确定了类的数目.在仿真实验中,通过与不同算法在时间和分类精确度上的比较分析,也证明该方法快速有效. There are obvious deficiencies in the well-known K-means algorithm:one is that it is sensitive to initial values,and the other is that the number of clusters needs to be given artificially.Many related works have improved the performance of K-means,but most of them only focus on single aspect rather than both on speed and identification of the number of clusters.The Hierarchical K-means algorithm(H K-means) was proposed to generate a center set C by carrying out K-means repeatedly with a constant k.After that,it generates the initial cluster centers of K-means by executing Hierarchical clustering algorithm on the center set C.This paper focuses on the geometric features of clusters,proposing a dynamic K-means algorithm based on quantile radius(QRD K-means).This algorithm is on the basis of Hierarchical K-means and repeats K-means with varying k values.After executing K-means multiple times,a series of cluster centers are obtained to be stored in set C.Moreover,QRD K-means additionally a new concept called quantile radius,which is a certain quantile of distance between the center and the sample points which belong to the corresponding cluster.Then it merges the centers in set C into a new cluster by comparing the pair-wise distance of centers with the sum of the corresponding quantile radiuses.In this way,a clustering result of centers can be obtained adaptively.At last,QRD K-means uses the centers of these new clusters as the initial centers of K-means to get the finial clustering result.The change in k value makes the center set C cover more areas than the previous Hierarchical K-means and can lead to more representative initial cluster centers.The way to get initial cluster centers by means of quantile radius is also faster.In the simulation experiments,QRD K-means was compared with Hierarchical K-means,DP-means and X-means both on the time consumption and accuracy.Results show that QRD K-means method is faster and more efficient.
出处 《南京大学学报(自然科学版)》 CAS CSCD 北大核心 2018年第1期48-55,共8页 Journal of Nanjing University(Natural Science)
基金 国家自然科学基金(11471264 11401148 11571282 51437003)
关键词 K-MEANS 类的数目 分位数半径 动态K-means K-means number of clusters quantile radius dynamic K-means
  • 相关文献

参考文献2

二级参考文献14

  • 1李瑞,邱玉辉.基于离散点的蚁群聚类算法的研究[J].计算机科学,2005,32(6):111-113. 被引量:4
  • 2田铮,李小斌,句彦伟.谱聚类的扰动分析[J].中国科学(E辑),2007,37(4):527-543. 被引量:33
  • 3王洪春,彭宏.基于模糊C-均值的增量式聚类算法[J].微电子学与计算机,2007,24(6):156-157. 被引量:22
  • 4黄光球,王西邓,刘冠.基于网格划分策略的改进人工鱼群算法[J].微电子学与计算机,2007,24(7):83-86. 被引量:18
  • 5Han J W, Kamber M. Data mining concepts and techniques[ M].北京:高等教育出版社,2002:335-394.
  • 6Bradley P S, Fayyad U M. Refining initial points for K- Means clustering [ C ]// Proc. of the 15th International Conf. on Machine Learning. San Franciseo, CA: Morgan Kaufmann, 1998: 91 - 99.
  • 7Mob' d B Al- Daoud, Stuart A Roberts. New methods for the initialization of clusters[J]. Pattern Recognition Letters, 2001(17) :451 - 455.
  • 8Kaufman L, Rousseeuw P J. Finding groups in data:an introduction to cluster analysis[M]. NY:John Wiley&Sons, 1990.
  • 9Moh' d B,Al - Daoud,Stuart A Roberts. New methods for the initialization of clusters[J]. Pattern Recognition Letters,2002(17) :451 - 455.
  • 10He J, Lan M, Tan C L, et al. Initialization of cluster refinement algorithms: a review and comparative study [ C] // Proceedings of 2004 IEEE International Joint Conference on Neural Networks (IJCNN). Singapore, 2004 ; 279 - 302.

共引文献109

同被引文献19

引证文献5

二级引证文献34

相关作者

内容加载中请稍等...

相关机构

内容加载中请稍等...

相关主题

内容加载中请稍等...

浏览历史

内容加载中请稍等...
;
使用帮助 返回顶部