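Abstract: K-means is a classic clustering algorithm with a wide range of applications. It nevertheless has several shortcomings: it cannot eliminate the influence of outliers, it is sensitive to the choice of initial cluster centers, and its clustering results are unstable. This paper uses single-point density both to screen out outliers and to select the initial cluster centers, thereby improving K-means. The resulting algorithm is called SDKmeans (Single density kmeans). Experiments show that SDKmeans produces better and more stable clustering results.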
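The idea can be illustrated with a minimal sketch. The abstract does not define single-point density or the selection rule, so everything specific here is an assumption made for illustration: density is taken as the number of neighbors within a hypothetical `radius`, points below a hypothetical `min_density` are treated as outliers, and centers are seeded by a density-times-distance heuristic. The function names `point_density`, `sd_init`, and `kmeans` are likewise hypothetical and are not taken from the paper.

```python
import math


def point_density(points, radius):
    """Assumed density: number of other points within `radius` of each point."""
    return [
        sum(1 for j, q in enumerate(points) if j != i and math.dist(p, q) <= radius)
        for i, p in enumerate(points)
    ]


def sd_init(points, k, radius, min_density):
    """Hypothetical density-based screening and seeding (not the paper's exact rule)."""
    dens = point_density(points, radius)
    # Screen out low-density points as outliers.
    kept = [(p, d) for p, d in zip(points, dens) if d >= min_density]
    if len(kept) < k:
        raise ValueError("fewer than k points survive the density filter")
    # First center: the densest surviving point (ties go to the earliest point).
    centers = [max(kept, key=lambda pd: pd[1])[0]]
    # Remaining centers: maximize density times distance to the nearest chosen center.
    while len(centers) < k:
        best = max(
            kept,
            key=lambda pd: pd[1] * min(math.dist(pd[0], c) for c in centers),
        )
        centers.append(best[0])
    inliers = [p for p, _ in kept]
    return inliers, centers


def kmeans(points, centers, iters=100):
    """Plain Lloyd iterations starting from the given centers."""
    centers = [tuple(c) for c in centers]
    clusters = [[] for _ in centers]
    for _ in range(iters):
        # Assignment step: each point goes to its nearest center.
        clusters = [[] for _ in centers]
        for p in points:
            idx = min(range(len(centers)), key=lambda i: math.dist(p, centers[i]))
            clusters[idx].append(p)
        # Update step: each center moves to the mean of its cluster.
        new_centers = [
            tuple(sum(x) / len(c) for x in zip(*c)) if c else centers[i]
            for i, c in enumerate(clusters)
        ]
        if new_centers == centers:
            break
        centers = new_centers
    return centers, clusters


# Two tight groups plus one far-away point that the density filter should drop.
data = [(0, 0), (0, 1), (1, 0), (1, 1),
        (10, 10), (10, 11), (11, 10), (11, 11),
        (50, 50)]
inliers, seeds = sd_init(data, k=2, radius=1.5, min_density=2)
final_centers, clusters = kmeans(inliers, seeds)
```

Because the seeds are chosen deterministically from the data rather than at random, repeated runs of this sketch on the same input give the same clustering, which is the kind of stability the abstract refers to. The values of `radius` and `min_density` would need to be tuned per dataset.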