Abstract
The traditional K-means algorithm is sensitive to the choice of initial centers. To address this problem, the K-means clustering algorithm is improved by combining the theory of good point sets from number theory with the Leader method, so that the initial centers are generated heuristically. Depending on how the two are combined, the resulting algorithms are called KLG and KGL. Good point set theory yields points that are better than randomly selected ones, while the Leader method reflects the distribution characteristics of the data objects themselves. Combining the strengths of both yields optimized initial centers. Experiments on UCI data sets show that both KLG and KGL produce better results than traditional K-means and several other K-means initialization methods.
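The two building blocks named in the abstract can be sketched separately as follows. This is only an illustration under stated assumptions, not the paper's KLG or KGL procedure: the abstract does not say how the two parts are combined, so no combination is attempted here. The good point set uses one common cyclotomic-field construction, and the Leader step is the usual single-pass variant with a distance threshold. The function names, the `threshold` parameter, and the `scale_to_bounds` helper are hypothetical.

```python
import math


def smallest_prime_at_least(m):
    """Return the smallest prime p with p >= m (trial division)."""
    def is_prime(q):
        if q < 2:
            return False
        for d in range(2, math.isqrt(q) + 1):
            if q % d == 0:
                return False
        return True

    while not is_prime(m):
        m += 1
    return m


def good_point_set(n, dim):
    """Generate n points in the unit cube [0, 1)^dim.

    Assumed construction: r_k = 2*cos(2*pi*k/p) with p the smallest
    prime >= 2*dim + 3, and point i has coordinates frac(i * r_k).
    """
    p = smallest_prime_at_least(2 * dim + 3)
    r = [2.0 * math.cos(2.0 * math.pi * k / p) for k in range(1, dim + 1)]
    # Python's float % 1.0 gives the fractional part, also for negatives.
    return [[(i * rk) % 1.0 for rk in r] for i in range(1, n + 1)]


def scale_to_bounds(points, lows, highs):
    """Map unit-cube points into the per-dimension range of the data."""
    return [
        [lo + x * (hi - lo) for x, lo, hi in zip(pt, lows, highs)]
        for pt in points
    ]


def leader_clusters(points, threshold):
    """Single-pass Leader method: a point farther than `threshold`
    from every existing leader becomes a new leader."""
    leaders = []
    for x in points:
        if all(math.dist(x, leader) > threshold for leader in leaders):
            leaders.append(x)
    return leaders
```

For example, `leader_clusters(data, threshold)` returns candidate centers that follow the data distribution, while `scale_to_bounds(good_point_set(k, d), lows, highs)` spreads `k` candidates evenly over the data's bounding box. Either set could then be passed to K-means as initial centers.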
Source
Journal of Computer Applications (《计算机应用》)
CSCD
Peking University Core Journals (北大核心)
2011, No. 5, pp. 1359-1362, 1373 (5 pages in total)
Funding
National Natural Science Foundation of China (60675031)
National Basic Research Program of China (973 Program) (2007BC311003)