摘要
针对初始点选择不当导致K-means陷入局部最小值问题,提出一种结合自适应mean-shift与最小生成树(MST)的K-means聚类算法。将数据对象投影到主成分分析(PCA)子空间,给出自适应mean-shift算法,并在PCA子空间内将数据向密度大的区域聚集,再利用MST与图连通分量算法,找出数据的类别数和类标签,据此计算原始空间的密度峰值,并将其作为K-means聚类的初始中心点。对K-means的目标函数、聚类精度和运行时间进行比较,结果表明,该算法在较短的运行时间内能给出较优的全局解。
Given an inappropriate set of initial clustering centroids, K-means algorithm can get trapped in a local minimum. To remedy this, this paper proposes a K-means clustering algorithm combined with adaptive mean-shift and Minimum Spanning Tree(MST). The original data set is projected into Principal Component Analysis(PCA) subspace. An adaptive Mean-shift is proposed and run in the PCA subspace to let the data move to dense regions, and via the MST and graph connected component algorithm, it finds the number of clusters and the cluster indicators. According to the indicators, the density peaks are computed in the full space and taken as the initial centroids for K-means clustering. Experimental results show that the proposed algorithm can provide better global solution and higher clustering accuracy within a shorter period of execution time.
出处
《计算机工程》
CAS
CSCD
2013年第12期204-210,共7页
Computer Engineering
基金
国家自然科学基金资助项目(61073116
61211130309)