Abstract
The k-means algorithm, when its initial cluster centers are chosen at random, tends to converge to a local optimum, gives unstable clustering results, and is strongly affected by isolated points. To address these problems, a method for optimizing the initial cluster centers, combined with an isolated-point elimination procedure, is proposed. The algorithm first selects the two points that are farthest apart as initial centers and uses them to divide the original data set into two clusters. Of the resulting clusters, the one with the larger variance is then split according to certain rules, and this is repeated until k centers have been found; isolated-point elimination is applied during the selection of the initial centers. The improved algorithm is compared experimentally with other algorithms on UCI data sets and on synthetic data sets containing a given proportion of noise. The experiments show that the improved algorithm is less affected by isolated points, is more stable, and also achieves high accuracy.
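The seeding procedure summarized in the abstract can be sketched roughly as follows. The abstract does not state the exact splitting rule or the isolated-point criterion, so both are assumptions here: a cluster is split by assigning each point to the nearer of its two farthest-apart members, and a point is treated as isolated when its mean distance to its nearest neighbors is well above the median. All function names and the parameters n_neighbors and factor are hypothetical and are not taken from the paper.

```python
import math
from statistics import median


def drop_isolated(points, n_neighbors=2, factor=2.0):
    """Assumed outlier rule (not from the paper): drop points whose mean
    distance to their n_neighbors nearest neighbors exceeds factor times
    the median of that score over all points."""
    if len(points) <= n_neighbors:
        return list(points)
    scores = []
    for i, p in enumerate(points):
        nearest = sorted(math.dist(p, q) for j, q in enumerate(points) if j != i)
        scores.append(sum(nearest[:n_neighbors]) / n_neighbors)
    limit = factor * median(scores)
    return [p for p, s in zip(points, scores) if s <= limit]


def farthest_pair(points):
    """Brute-force O(n^2) search for the two points that are farthest apart."""
    best_d, best = -1.0, None
    for i in range(len(points)):
        for j in range(i + 1, len(points)):
            d = math.dist(points[i], points[j])
            if d > best_d:
                best_d, best = d, (points[i], points[j])
    return best


def variance(cluster):
    """Mean squared distance of the cluster's points to their mean."""
    dim = len(cluster[0])
    mean = [sum(p[d] for p in cluster) / len(cluster) for d in range(dim)]
    return sum(math.dist(p, mean) ** 2 for p in cluster) / len(cluster)


def split(cluster):
    """Assumed split rule: seed with the farthest pair, then assign every
    point to the nearer seed. Returns two (seed, members) tuples."""
    a, b = farthest_pair(cluster)
    near_a = [p for p in cluster if math.dist(p, a) <= math.dist(p, b)]
    near_b = [p for p in cluster if math.dist(p, a) > math.dist(p, b)]
    return [(a, near_a), (b, near_b)]


def initial_centers(points, k):
    """Pick up to k seed points; points are tuples of numbers, k >= 2."""
    data = drop_isolated(points)
    if k < 2 or len(set(data)) < 2:
        raise ValueError("need k >= 2 and at least two distinct points")
    clusters = split(data)
    while len(clusters) < k:
        # Only clusters with at least two distinct points can be split.
        candidates = [c for c in clusters if len(set(c[1])) > 1]
        if not candidates:
            break
        widest = max(candidates, key=lambda c: variance(c[1]))
        clusters.remove(widest)
        clusters.extend(split(widest[1]))
    return [seed for seed, _ in clusters]
```

The returned seeds would then be handed to an ordinary k-means iteration as its starting centers. Because the isolated points are removed before the farthest-pair search, a single distant noise point cannot be selected as a seed in this sketch.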
Source
Journal of Shanghai Normal University (Natural Sciences) (《上海师范大学学报(自然科学版)》)
2016, No. 5, pp. 599-603 (5 pages)