Abstract
The traditional K-means algorithm has two weaknesses: its initial centroids are chosen at random, and the distance from every data sample to every cluster center is recomputed in each iteration. To address both, this paper proposes an efficient K-means algorithm based on optimized initial cluster centers. The algorithm uses minimum variance to optimize the initial centroids. In each iteration it stores the cluster label of every data point and its distance to the nearest cluster center, then reuses these values in the next iteration, which avoids recomputing the distance from each data point to every center. The algorithm was tested on five different datasets from the UCI database, and the algorithms were compared on clustering criterion function, running time, and number of iterations. The experimental results show that, without degrading clustering performance, the improved algorithm reduces the number of iterations and shortens clustering time, which demonstrates its effectiveness and efficiency.
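The two ideas in the abstract can be illustrated with a minimal Python sketch. It is not the paper's implementation, because the abstract does not give the exact procedure. Two assumptions are made here: the "variance" of a point is taken to be the mean squared distance to its `m` nearest neighbors (`m` is a hypothetical parameter), and the cached-distance rule keeps a point in its cluster whenever its distance to that cluster's updated center has not grown. The function names `min_variance_init` and `kmeans_cached` are illustrative only.

```python
import math


def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length sequences."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def min_variance_init(points, k, m):
    """Pick up to k initial centers from low-variance (dense) neighborhoods.

    Assumption: a point's variance is the mean squared distance to its m
    nearest remaining neighbors; the abstract does not define it.
    """
    available = set(range(len(points)))
    centers = []
    while len(centers) < k and available:
        best, best_var, best_nbrs = None, math.inf, []
        for i in sorted(available):
            nbrs = sorted(
                (sq_dist(points[i], points[j]), j) for j in available if j != i
            )[:m]
            var = sum(d for d, _ in nbrs) / len(nbrs) if nbrs else 0.0
            if var < best_var:
                best, best_var, best_nbrs = i, var, [j for _, j in nbrs]
        centers.append(list(points[best]))
        # Remove the chosen point and its neighborhood from further selection.
        available -= {best, *best_nbrs}
    return centers


def nearest_center(p, centers):
    """Full scan: return (squared distance, index) of the closest center."""
    return min((sq_dist(p, c), j) for j, c in enumerate(centers))


def kmeans_cached(points, centers, max_iter=100):
    """K-means that caches each point's label and nearest-center distance."""
    centers = [list(c) for c in centers]
    # Initial assignment needs one full scan per point.
    labels, dists = [], []
    for p in points:
        d, j = nearest_center(p, centers)
        labels.append(j)
        dists.append(d)
    iteration = 0
    for iteration in range(1, max_iter + 1):
        # Update step: move each center to the mean of its members.
        for j in range(len(centers)):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                centers[j] = [sum(col) / len(members) for col in zip(*members)]
        changed = False
        for i, p in enumerate(points):
            d_own = sq_dist(p, centers[labels[i]])
            if d_own <= dists[i]:
                # Own center did not move away: keep the label, skip the scan.
                dists[i] = d_own
                continue
            d, j = nearest_center(p, centers)
            changed = changed or j != labels[i]
            labels[i], dists[i] = j, d
        if not changed:
            break
    return centers, labels, iteration
```

A call such as `kmeans_cached(points, min_variance_init(points, k=2, m=3))` chains the two steps. Note that the shortcut is a heuristic: a point is kept in its cluster without checking whether another center has moved closer, so the result can differ from standard K-means. The abstract reports that clustering performance was not degraded on the tested datasets.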
Source
《长春理工大学学报(自然科学版)》
2015, No. 4, pp. 154-158 (5 pages in total)
Journal of Changchun University of Science and Technology (Natural Science Edition)