Abstract
The K-means clustering algorithm is one of the most widely used clustering algorithms, but it requires the initial centers and the number of clusters to be specified in advance. Choosing the initial centers arbitrarily can trap the clustering in a local optimum, and in practice the number of clusters is often unknown. To address these shortcomings, this paper proposes an adaptive clustering algorithm. The algorithm selects the initial cluster centers based on the maximum and minimum distances between data instances, uses the sum of squared errors (SSE) to choose the relatively sparsest cluster to split, and stops splitting according to the trend of change in SSE, thereby determining the number of clusters automatically. Experimental results show that the algorithm produces more accurate clustering results without increasing the number of iterations, which verifies the effectiveness of the proposed clustering algorithm.
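The adaptive procedure summarized in the abstract can be sketched in Python roughly as follows. This is a minimal illustration under our own assumptions, not the authors' implementation. The abstract does not say how "sparsest" is measured or exactly how the SSE trend triggers the stop, so this sketch assumes that clustering starts from two max-min seeds, that the sparsest cluster is the one with the largest mean squared distance to its centroid (SSE divided by size), that the chosen cluster is re-seeded with two max-min seeds drawn from its own members, and that splitting stops once a split lowers the total SSE by less than a relative threshold. The names `max_min_seeds`, `kmeans`, `adaptive_kmeans`, `min_gain`, and `max_k` are all hypothetical.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]


def sq_dist(a: Sequence[float], b: Sequence[float]) -> float:
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def centroid(points: List[Point]) -> Point:
    """Coordinate-wise mean of a non-empty list of points."""
    n = len(points)
    return tuple(sum(coords) / n for coords in zip(*points))


def max_min_seeds(points: List[Point], k: int) -> List[Point]:
    """Max-min distance seeding: take the point farthest from the centroid,
    then repeatedly add the point whose nearest chosen seed is farthest away."""
    c = centroid(points)
    seeds = [max(points, key=lambda p: sq_dist(p, c))]
    while len(seeds) < k:
        seeds.append(max(points, key=lambda p: min(sq_dist(p, s) for s in seeds)))
    return seeds


def assign(points: List[Point], centers: List[Point]) -> List[List[Point]]:
    """Put every point into the cluster of its nearest center."""
    clusters: List[List[Point]] = [[] for _ in centers]
    for p in points:
        j = min(range(len(centers)), key=lambda i: sq_dist(p, centers[i]))
        clusters[j].append(p)
    return clusters


def kmeans(points: List[Point], centers: List[Point], max_iter: int = 100):
    """Standard Lloyd iterations from the given initial centers.
    Returns (centers, clusters, per-cluster SSE)."""
    centers = list(centers)
    for _ in range(max_iter):
        clusters = assign(points, centers)
        # An empty cluster keeps its previous center.
        new_centers = [centroid(cl) if cl else centers[i] for i, cl in enumerate(clusters)]
        if new_centers == centers:
            break
        centers = new_centers
    clusters = assign(points, centers)
    sse = [sum(sq_dist(p, centers[i]) for p in cl) for i, cl in enumerate(clusters)]
    return centers, clusters, sse


def adaptive_kmeans(points: List[Point], min_gain: float = 0.2, max_k: int = 10):
    """Split-based K-means that chooses k by itself (hypothetical parameters)."""
    centers, clusters, sse = kmeans(points, max_min_seeds(points, 2))
    while len(centers) < max_k:
        # Assumption: "sparsest" = largest mean squared distance (SSE / size).
        j = max(range(len(clusters)),
                key=lambda i: sse[i] / len(clusters[i]) if clusters[i] else 0.0)
        if len(clusters[j]) < 2:
            break
        # Replace cluster j's center with two max-min seeds from its members.
        a, b = max_min_seeds(clusters[j], 2)
        trial = centers[:j] + centers[j + 1:] + [a, b]
        new_centers, new_clusters, new_sse = kmeans(points, trial)
        old_total, new_total = sum(sse), sum(new_sse)
        # Assumption: stop once a split cuts total SSE by less than min_gain.
        if old_total == 0 or (old_total - new_total) / old_total < min_gain:
            break
        centers, clusters, sse = new_centers, new_clusters, new_sse
    return centers, clusters
```

For example, on twelve points forming three well-separated 2x2 grids, `adaptive_kmeans` with the default threshold stops at three clusters: the split from two clusters to three removes most of the SSE, while a further split removes only about a tenth of what remains. The relative-drop rule is just one simple way to act on the SSE trend; an elbow-style test on the whole SSE curve would be an equally plausible reading of the abstract.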
Source
Journal of Nanjing University of Posts and Telecommunications (Natural Science Edition)
Peking University core journal
2015, No. 2, pp. 102-107 (6 pages)
Funding
Supported by the National Natural Science Foundation of China (61170322, 71171117, 61373065)
Keywords
K-means clustering algorithm
maximum and minimum distances
initial centers
sum of squared errors