
An Efficient K-means Algorithm Based on Optimizing Initial Cluster Centers  Cited by: 6
Abstract: The traditional K-means algorithm chooses its initial centroids at random and, during clustering, repeatedly computes the distance from every data sample to every cluster center. To address both problems, this paper proposes an efficient K-means algorithm based on optimized initial cluster centers. The algorithm uses minimum variance to select the initial centroids. In each iteration it stores, for every data point, its cluster label and its distance to the nearest cluster center, and reuses this information in the next iteration, which avoids recomputing the distance from each data point to every center. The algorithms were tested on five different data sets from the UCI repository and compared on the clustering criterion function, running time, and number of iterations. The experimental results show that, without degrading clustering performance, the improved algorithm reduces the number of iterations and shortens clustering time, demonstrating its effectiveness and efficiency.
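The abstract names two ideas but gives no formulas, so the following is only a minimal Python sketch under stated assumptions, not the authors' implementation. For the initialization it assumes one plausible reading of "minimum variance": the neighborhood radius is the mean pairwise distance, a point's "variance" is the mean squared distance to its neighbors within that radius, the lowest-scoring point becomes a center, and its neighborhood is excluded before the next pick. For the iterations it stores each point's label and nearest-center distance; on the next pass, a point that is no farther from its own updated center than the stored distance keeps its label without computing distances to the other centers. The function names `min_variance_init`, `kmeans_cached` and `sse` are hypothetical, and SSE is assumed as the clustering criterion function.

```python
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]


def dist(a: Sequence[float], b: Sequence[float]) -> float:
    """Euclidean distance between two points."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def min_variance_init(data: List[Point], k: int) -> List[Point]:
    """Pick k initial centers with an ASSUMED minimum-variance rule (needs len(data) >= 2)."""
    n = len(data)
    d = [[dist(p, q) for q in data] for p in data]
    # Assumption: the neighborhood radius r is the mean pairwise distance.
    r = sum(d[i][j] for i in range(n) for j in range(i + 1, n)) / (n * (n - 1) / 2)
    scores = []
    for i in range(n):
        sq = [d[i][j] ** 2 for j in range(n) if j != i and d[i][j] <= r]
        # Assumed "variance": mean squared distance to neighbors; isolated points score inf.
        scores.append(sum(sq) / len(sq) if sq else math.inf)
    candidates = set(range(n))
    chosen: List[int] = []
    while len(chosen) < k:
        if candidates:
            c = min(candidates, key=lambda i: (scores[i], i))
        else:
            # Fallback when every point is excluded: the point farthest from the chosen centers.
            c = max(range(n), key=lambda i: min(d[i][j] for j in chosen))
        chosen.append(c)
        # Exclude the new center's neighborhood so later centers land elsewhere.
        candidates -= {j for j in range(n) if d[c][j] <= r}
    return [tuple(data[i]) for i in chosen]


def kmeans_cached(data: List[Point], centers: List[Point],
                  max_iter: int = 100) -> Tuple[List[Point], List[int], int]:
    """K-means that caches each point's label and nearest-center distance."""
    k = len(centers)
    centers = [tuple(c) for c in centers]
    labels: List[int] = []
    best: List[float] = []
    # First pass: full assignment, storing label and distance to the nearest center.
    for p in data:
        ds = [dist(p, c) for c in centers]
        j = min(range(k), key=ds.__getitem__)
        labels.append(j)
        best.append(ds[j])
    for it in range(1, max_iter + 1):
        # Update step: each center becomes the mean of its members (empty clusters keep theirs).
        new_centers = []
        for j in range(k):
            members = [p for p, lab in zip(data, labels) if lab == j]
            if members:
                new_centers.append(tuple(sum(col) / len(members) for col in zip(*members)))
            else:
                new_centers.append(centers[j])
        centers = new_centers
        changed = False
        for i, p in enumerate(data):
            d_own = dist(p, centers[labels[i]])
            if d_own <= best[i]:
                # Shortcut: not farther from its own center, so keep the label and skip the rest.
                best[i] = d_own
                continue
            # Otherwise recompute distances to all centers for this point only.
            ds = [dist(p, c) for c in centers]
            j = min(range(k), key=ds.__getitem__)
            changed = changed or j != labels[i]
            labels[i], best[i] = j, ds[j]
        if not changed:
            return centers, labels, it
    return centers, labels, max_iter


def sse(data: List[Point], centers: List[Point], labels: List[int]) -> float:
    """Sum of squared errors, assumed here as the clustering criterion function."""
    return sum(sum((x - y) ** 2 for x, y in zip(p, centers[lab]))
               for p, lab in zip(data, labels))


if __name__ == "__main__":
    # Three small, well-separated 2-D blobs (toy data, not the paper's UCI data sets).
    data = [(0, 0), (0, 1), (1, 0), (1, 1), (10, 10), (10, 11), (11, 10), (11, 11),
            (20, 0), (20, 1), (21, 0), (21, 1)]
    centers, labels, iters = kmeans_cached(data, min_variance_init(data, 3))
    print(sorted(centers), labels, iters, sse(data, centers, labels))
```

The cached check is a heuristic: a point that did not move away from its own center may still have another center that moved closer, so the result is not guaranteed to match a full Lloyd iteration. The saving comes from computing all k distances only for the points that fail the check.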
Source: Journal of Changchun University of Science and Technology (Natural Science Edition), 2015, No. 4, pp. 154-158 (5 pages)
Keywords: K-means algorithm; variance; initial cluster centers; distance; time

References (7)

  • [1] Han Jiawei, Kamber M. Data mining: concepts and techniques[M]. Beijing: China Machine Press, 2011.
  • [2] Shunye W. An improved k-means clustering algorithm based on dissimilarity[C]//Proceedings of the 2013 International Conference on Mechatronic Sciences, Electric Engineering and Computer (MEC). IEEE, 2013: 2629-2633.
  • [3] Xu Junling, Xu Baowen, Zhang Weifeng, Zhang Wei, Hou Jun. Stable initialization scheme for K-means clustering[J]. Wuhan University Journal of Natural Sciences, 2009, 14(1): 24-28.
  • [4] Redmond S J, Heneghan C. A method for initializing the K-means clustering algorithm using kd-trees[J]. Pattern Recognition Letters, 2007, 28(8): 965-973.
  • [5] Xie Juanying, Wang Yan'e. K-means algorithm with initial cluster centers optimized by minimum variance[J]. Computer Engineering, 2014, 40(8): 205-211. (in Chinese)
  • [6] Likas A, Vlassis N, Verbeek J. The global k-means clustering algorithm[J]. Pattern Recognition, 2003, 36(2): 451-461.
  • [7] Na S, Xumin L, Yong G. Research on k-means clustering algorithm: an improved k-means clustering algorithm[C]//2010 Third International Symposium on Intelligent Information Technology and Security Informatics (IITSI). IEEE, 2010: 63-67.
