Abstract
The K-means algorithm is inherently sensitive to its initial cluster centroids, which makes its clustering results unstable and prone to converging to a local optimum. Existing improvements can reduce the number of iterations while reaching an approximately globally optimal solution on noise-free data sets, but on noisy data sets they easily fall into a local optimum, and their clustering quality can even be worse than that of traditional K-means. Building on the algorithm that determines initial centroids by farthest spatial distance, this paper proposes an initial-centroid selection algorithm based on spatial distance differences. The core idea is to compute, for each point that is not yet a centroid, the sum of its distances to the centroids already selected, sort these sums, find the two adjacent points with the largest difference in distance sum, and take the one of the two that is closer to the known centroids as the initial centroid of the next cluster. Experimental results show that, with a comparable number of clustering iterations, the proposed algorithm improves clustering accuracy by about 1% on noise-free data sets and achieves a clustering accuracy above 90% on noisy data sets.
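The selection rule described in the abstract can be sketched in a few lines of NumPy. This is only a literal reading of the abstract, not the paper's implementation: the function name `select_initial_centroids`, the `first_index` parameter (how the first centroid is chosen), the use of Euclidean distance, and the tie-breaking by stable sort are all assumptions that the abstract does not specify.

```python
import numpy as np


def select_initial_centroids(points, k, first_index=0):
    """Pick k initial centroids using the largest adjacent gap in distance sums.

    Assumptions (not stated in the abstract): the first centroid is the
    point at `first_index`, distances are Euclidean, ties keep input order.
    """
    points = np.asarray(points, dtype=float)
    n = len(points)
    if not 1 <= k <= n:
        raise ValueError("k must be between 1 and the number of points")

    chosen = [first_index]
    while len(chosen) < k:
        # Indices of points that are not yet centroids.
        candidates = np.array([i for i in range(n) if i not in chosen])

        # Sum of distances from each candidate to every selected centroid.
        diffs = points[candidates][:, None, :] - points[chosen][None, :, :]
        dist_sum = np.linalg.norm(diffs, axis=2).sum(axis=1)

        # Sort candidates by distance sum (ascending).
        order = np.argsort(dist_sum, kind="stable")
        sorted_sums = dist_sum[order]

        if len(sorted_sums) == 1:
            # Only one candidate left, so there is no gap to compare.
            chosen.append(int(candidates[order[0]]))
            continue

        # Largest difference between adjacent sorted distance sums.
        gaps = np.diff(sorted_sums)
        j = int(np.argmax(gaps))

        # Of the two points around that gap, take the one nearer to the
        # selected centroids (the smaller distance sum).
        chosen.append(int(candidates[order[j]]))

    return points[chosen]


# Toy data: two tight groups plus one far-away noise point.
pts = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10), (100, 100)]
centroids = select_initial_centroids(pts, k=2, first_index=0)
print(centroids)
```

In this toy example the largest gap lies between the second group and the isolated point at (100, 100), so the rule picks a point from the second group rather than the noise point, which a pure farthest-distance rule would select. The resulting array could then be passed as the starting centroids to any standard K-means routine.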
Source
《计算机科学》
CSCD
Peking University Core Journal (北大核心)
2014, Issue S1, pp. 406-408, 420 (4 pages in total)
Computer Science
Funding
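Supported by the Science Program of the Chongqing Municipal Transportation Commission, project "RFID-Based Monitoring and Feature Extraction for Illegally Operating Vehicles".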