Abstract
This paper applies the initial-center selection idea of the fast K-medoids clustering algorithm of Park et al. to the global K-means clustering algorithm, and proposes a new method for choosing the best initial center of the next cluster. The new method selects a sample that lies in a relatively dense region and is far from the centers of the existing clusters, yielding an improved global K-means clustering algorithm. The improved algorithm avoids choosing a noise point as the next initial center and reduces clustering time without degrading clustering quality. Experiments on well-known data sets from the UCI Machine Learning Repository and on randomly generated synthetic data containing noise show that the improved algorithm requires less clustering time than both the global K-means clustering algorithm and the fast global K-means clustering algorithm.
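The selection rule described in the abstract (prefer a sample in a dense region that is far from the existing centers) can be illustrated with a minimal sketch. The abstract does not give the paper's scoring formula, so the function name `next_initial_center` and the ratio score used here (distance to the nearest existing center divided by mean distance to all other samples) are assumptions made purely for illustration, not the authors' method.

```python
import math
from typing import List, Sequence

Point = Sequence[float]


def next_initial_center(points: List[Point], centers: List[Point]) -> int:
    """Return the index of a candidate initial center for the next cluster.

    Illustrative score (an assumption, not the paper's formula):
        distance to nearest existing center / mean distance to other samples.
    A high score favours samples that are far from the existing centers
    and surrounded by many nearby samples.
    """
    n = len(points)
    if n < 2 or not centers:
        raise ValueError("need at least two points and one existing center")

    best_index, best_score = -1, -math.inf
    for i, p in enumerate(points):
        # Mean distance to the other samples: small in dense regions,
        # large for isolated (noisy) samples. The self-distance is 0.
        mean_dist = sum(math.dist(p, q) for q in points) / (n - 1)
        # Distance to the closest center chosen so far.
        nearest_center = min(math.dist(p, c) for c in centers)
        score = nearest_center / mean_dist if mean_dist > 0 else 0.0
        if score > best_score:
            best_index, best_score = i, score
    return best_index


# Hypothetical toy data: two compact groups plus one isolated sample.
points = [(0, 0), (0, 1), (1, 0), (1, 1),
          (10, 10), (10, 11), (11, 10), (11, 11),
          (25, 25)]
chosen = next_initial_center(points, centers=[(0.5, 0.5)])
```

On this toy data the chosen index falls inside the second compact group rather than on the isolated sample at (25, 25), which is the behaviour the abstract describes. A simple ratio like this is only one way to combine the two criteria and may behave differently on other data.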
Source
《陕西师范大学学报(自然科学版)》
CAS
CSCD
Peking University Core Journals (北大核心)
2010, No. 2, pp. 18-22 (5 pages)
Journal of Shaanxi Normal University:Natural Science Edition
Funding
National Natural Science Foundation of China (Grant No. 30670250)