Abstract
The traditional K-means algorithm selects its initial cluster centers at random, which makes the clustering results highly variable and of inconsistent quality. To overcome this drawback, we define a local variance and, exploiting the fact that variance reflects how densely data are concentrated, propose a K-means algorithm that optimizes the initial cluster centers by minimum local variance. The algorithm selects the point with the smallest local variance in the dataset as an initial cluster center, then updates the dataset using the data information, and repeats until k initial cluster centers have been selected. Experiments on UCI datasets and artificial datasets compare the proposed algorithm with the traditional K-means algorithm and with the K-means algorithm that initializes cluster centers by minimum variance. The results show that the proposed algorithm achieves good clustering quality and robustness with a short clustering time, which verifies its effectiveness and superiority.
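The selection loop described in the abstract can be illustrated with a minimal Python sketch. The abstract does not give the paper's definition of local variance or its dataset update rule, so both are assumptions here: local variance is taken as the mean squared distance from a point to its `m` nearest neighbours, and the update step removes the chosen point together with those `m` neighbours. The names `sq_dist`, `local_variance`, `select_initial_centers`, and the parameter `m` are hypothetical and are not taken from the paper.

```python
def sq_dist(p, q):
    """Squared Euclidean distance between two points given as coordinate sequences."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def local_variance(i, points, m):
    """ASSUMED definition: mean squared distance from points[i] to its m nearest neighbours."""
    nearest = sorted(sq_dist(points[i], q) for j, q in enumerate(points) if j != i)[:m]
    return sum(nearest) / len(nearest) if nearest else 0.0


def select_initial_centers(points, k, m):
    """Repeatedly pick the point with the smallest local variance until k centers are chosen."""
    remaining = list(points)
    centers = []
    while len(centers) < k and remaining:
        # Never ask for more neighbours than the remaining data can supply.
        m_eff = min(m, len(remaining) - 1)
        best = min(range(len(remaining)),
                   key=lambda i: local_variance(i, remaining, m_eff))
        center = remaining[best]
        centers.append(center)
        # ASSUMED update rule: drop the chosen point and its m nearest neighbours,
        # so the next center is searched for in a different dense region.
        by_distance = sorted(range(len(remaining)),
                             key=lambda j: sq_dist(center, remaining[j]))
        dropped = set(by_distance[: m_eff + 1])
        dropped.add(best)
        remaining = [p for j, p in enumerate(remaining) if j not in dropped]
    return centers
```

The returned centers would then replace the random initialization of an ordinary K-means run. Under these assumptions the choice of `m` controls how large a neighbourhood counts as "local", and the paper's actual definition may differ.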
Authors
WANG Shi-qi (王世其); ZHANG Wen-bin (张文斌); CAI Chao-sen (蔡潮森); LI Jian-jun (李建军)
College of Science, Nanjing University of Science and Technology, Nanjing 210094, China
Source
Software Guide (《软件导刊》)
2020, No. 6, pp. 196-200 (5 pages)
Funding
Jiangsu Province College Students' Innovation and Entrepreneurship Training Program (201810288003Y).