摘要
K-means算法是应用最广泛的聚类算法之一,但存在明显缺陷:对初始值敏感,还需给定类的数目.层次K-means算法提出将多次k取固定值的K-means运算所得到的中心点作为类的代表,并通过对这些中心点进行层次聚类来得到更好的初始聚类中心,然而在中心的融合过程中并没有有效利用类的几何信息.从类的几何特征入手,提出一种基于类的分位数半径的动态K-means算法(QRD K-means).此算法在层次K-means的基础上令每次K-means运算的k值变动起来,且又引入了分位数半径的概念,用样本点到类中心距离的分位数作为类的半径,将样本点间的关系简化为各个类的分位数半径与类中心的关系.通过中心点间距离与分位数半径大小的比较对中心点进行融合形成新类,从而快速给出良好的聚类结果,同时也确定了类的数目.在仿真实验中,通过与不同算法在时间和分类精确度上的比较分析,也证明该方法快速有效.
There are obvious deficiencies in the well-known K-means algorithm:one is that it is sensitive to initial values,and the other is that the number of clusters needs to be given artificially.Many related works have improved the performance of K-means,but most of them only focus on single aspect rather than both on speed and identification of the number of clusters.The Hierarchical K-means algorithm(H K-means) was proposed to generate a center set C by carrying out K-means repeatedly with a constant k.After that,it generates the initial cluster centers of K-means by executing Hierarchical clustering algorithm on the center set C.This paper focuses on the geometric features of clusters,proposing a dynamic K-means algorithm based on quantile radius(QRD K-means).This algorithm is on the basis of Hierarchical K-means and repeats K-means with varying k values.After executing K-means multiple times,a series of cluster centers are obtained to be stored in set C.Moreover,QRD K-means additionally a new concept called quantile radius,which is a certain quantile of distance between the center and the sample points which belong to the corresponding cluster.Then it merges the centers in set C into a new cluster by comparing the pair-wise distance of centers with the sum of the corresponding quantile radiuses.In this way,a clustering result of centers can be obtained adaptively.At last,QRD K-means uses the centers of these new clusters as the initial centers of K-means to get the finial clustering result.The change in k value makes the center set C cover more areas than the previous Hierarchical K-means and can lead to more representative initial cluster centers.The way to get initial cluster centers by means of quantile radius is also faster.In the simulation experiments,QRD K-means was compared with Hierarchical K-means,DP-means and X-means both on the time consumption and accuracy.Results show that QRD K-means method is faster and more efficient.
出处
《南京大学学报(自然科学版)》
CAS
CSCD
北大核心
2018年第1期48-55,共8页
Journal of Nanjing University(Natural Science)
基金
国家自然科学基金(11471264
11401148
11571282
51437003)