摘要
K-means算法是被广泛使用的一种聚类算法,传统的K-means算法中初始聚类中心的选择具有随机性,易使算法陷入局部最优,聚类结果不稳定。针对此问题,引入多维网格空间的思想,首先将样本集映射到一个虚拟的多维网格空间结构中,然后从中搜索出包含样本数最多且距离较远的子网格作为初始聚类中心网格,最后计算出各初始聚类中心网格中所包含样本的均值点来作为初始聚类中心。此法选择出来的初始聚类中心与实际聚类中心拟合度高,进而可据此初始聚类中心稳定高效地得到最终的聚类结果。通过使用计算机模拟数据集和UCI机器学习数据集进行测试,结果表明改进算法的迭代次数和错误率比较稳定,且均小于传统K-means算法测试结果的平均值,能有效避免陷入局部最优,并且聚类结果稳定。
K-means algorithm is a widely used clustering algorithm,but the selection of the initial clustering centers in the traditional K-means algorithm is random,which makes the algorithm easily fall into local optimum and causes instability in the clustering result.In order to solve this problem,the idea of multi-dimensional grid space was introduced to the selection of initial clustering center.Firstly,the sample set was mapped to a virtual multi-dimensional grid space structure.Secondly,the sub-grids containing the largest number of samples and being far away from each other were searched as the initial cluster center grids in the space structure.Finally,the mean points of the samples in the initial cluster center grids were calculated as the initial clustering centers.The initial clustering centers chosen by this method are very close to the actual clustering centers,so that the final clustering result can be obtained stably and efficiently.By using computer simulation data set and UCI machine learning data sets to test,both the iterative number and error rate of the improved algorithm are stable,and smaller than the average of the traditional K-means algorithm.The improved algorithm can effectively avoid falling into local optimum and guarantee the stability of clustering result.
作者
邵伦
周新志
赵成萍
张旭
SHAO Lun;ZHOU Xinzhi;ZHAO Chengping;ZHANG Xu(College of Electronics and Information Engineering,Sichuan University,Chengdu Sichuan 610065,China)
出处
《计算机应用》
CSCD
北大核心
2018年第10期2850-2855,共6页
journal of Computer Applications
基金
国家973计划项目(2013CB328903-2)~~
关键词
K-MEANS算法
聚类算法
初始聚类中心
多维网格空间
均值点
K-means algorithm
clustering algorithm
initial clustering center
multi-dimensional grid space
mean point