
New method for determining the optimal number of clusters in the K-means clustering algorithm (cited by: 90)
Abstract: The K-means clustering algorithm clusters a dataset on the premise of a fixed number of clusters k and randomly selected initial cluster centers. In general, k cannot be determined in advance, and randomly selected initial centers tend to make the clustering result unstable. This paper proposes a new method for determining the optimal number of clusters for K-means. By setting the parameters of the AP (affinity propagation) algorithm, the number of clusters that AP produces is taken as the upper bound kmax of the search range for the number of clusters. The Silhouette index is chosen as the validity index, the initial cluster centers are set following the idea of the maximum-minimum distance algorithm, and the clustering quality is then analyzed to determine the optimal number of clusters. Simulation experiments and analysis verify the feasibility of the proposed scheme.
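Source: Computer Engineering and Applications (《计算机工程与应用》), CSCD, Peking University Core Journal, 2010, Issue 16, pp. 27-31 (5 pages)
Funding: National High Technology Research and Development Program of China (863 Program) (No. 2007AA1Z158); National Natural Science Foundation of China (No. 60703106)
Keywords: K-means clustering; number of clusters; clustering validity index; initial clustering centers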
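
The procedure summarized in the abstract can be illustrated with a minimal pure-Python sketch. It is not the authors' implementation, and it rests on several assumptions: the upper bound `k_max` is passed in as a plain integer (in the paper's scheme it would come from an AP run, which is not reproduced here); the first max-min center is taken to be the point farthest from the data mean, a choice the abstract does not specify; the data points are assumed to be distinct; and the function names `maxmin_init`, `kmeans`, `silhouette`, and `best_k` are hypothetical.

```python
import math


def maxmin_init(points, k):
    """Pick k initial centers with a max-min distance rule.

    Assumption: the first center is the point farthest from the data mean.
    Each further center is the point whose distance to its nearest
    already-chosen center is largest.
    """
    dim = len(points[0])
    mean = tuple(sum(p[d] for p in points) / len(points) for d in range(dim))
    centers = [max(points, key=lambda p: math.dist(p, mean))]
    while len(centers) < k:
        centers.append(
            max(points, key=lambda p: min(math.dist(p, c) for c in centers))
        )
    return centers


def kmeans(points, k, max_iter=100):
    """Plain Lloyd iterations started from the max-min centers."""
    centers = maxmin_init(points, k)
    labels = None
    for _ in range(max_iter):
        new_labels = [
            min(range(k), key=lambda j: math.dist(p, centers[j])) for p in points
        ]
        if new_labels == labels:  # assignments stopped changing
            break
        labels = new_labels
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # an emptied cluster keeps its previous center
                centers[j] = tuple(sum(c) / len(members) for c in zip(*members))
    return labels


def silhouette(points, labels):
    """Mean silhouette width; points in singleton clusters contribute 0."""
    clusters = {}
    for p, lab in zip(points, labels):
        clusters.setdefault(lab, []).append(p)
    if len(clusters) < 2:
        return 0.0
    total = 0.0
    for p, lab in zip(points, labels):
        own = clusters[lab]
        if len(own) == 1:
            continue
        # a: mean distance to the other members of the same cluster
        a = sum(math.dist(p, q) for q in own) / (len(own) - 1)
        # b: smallest mean distance to the members of any other cluster
        b = min(
            sum(math.dist(p, q) for q in other) / len(other)
            for m, other in clusters.items()
            if m != lab
        )
        total += (b - a) / max(a, b)
    return total / len(points)


def best_k(points, k_max):
    """Score every k in [2, k_max] and return the k with the highest score."""
    scores = {k: silhouette(points, kmeans(points, k)) for k in range(2, k_max + 1)}
    return max(scores, key=scores.get), scores


# Toy data: three well-separated groups of three points each.
pts = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10), (20, 0), (20, 1), (21, 0)]
k, scores = best_k(pts, k_max=5)
print(k, scores)
```

Because the max-min initialization is deterministic, repeated runs on the same data give the same partition for each k, which is the stability the abstract attributes to replacing random initial centers.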

References (14)

  • 1 Yang Shanlin, Li Yongsen, Hu Xiaoxuan, Pan Ruoyu. Optimization study on the k value of the K-means algorithm[J]. Systems Engineering - Theory & Practice, 2006, 26(2): 97-101. Cited by: 187
  • 2 Frey B J, Dueck D. Clustering by passing messages between data points[J]. Science, 2007, 315: 972-976.
  • 3 Sun Jigui, Liu Jie, Zhao Lianyu. Clustering algorithms research[J]. Journal of Software, 2008(1): 48-61. Cited by: 1060
  • 4 Yu Jian, Cheng Qiansheng. The search range of the optimal number of clusters in fuzzy clustering methods[J]. Science in China (Series E), 2002, 32(2): 274-280. Cited by: 130
  • 5 Frey B J, Dueck D. Response to comment on "Clustering by passing messages between data points"[J]. Science, 2008, 319.
  • 6 Brusco M J, Köhn H. Comment on "Clustering by passing messages between data points"[J]. Science, 2008, 319.
  • 7 Wang Kaijun, Li Jian, Zhang Junying, Tu Chongyang. Semi-supervised affinity propagation clustering[J]. Computer Engineering, 2007, 33(23): 197-198. Cited by: 29
  • 8 Calinski T, Harabasz J. A dendrite method for cluster analysis[J]. Communications in Statistics, 1974, 3: 1-27.
  • 9 Dimitriadou E, Dolnicar S, Weingessel A. An examination of indexes for determining the number of clusters in binary data sets[J]. Psychometrika, 2002, 67(1): 137-160.
  • 10 Kapp A V, Tibshirani R. Are clusters found in one dataset present in another dataset?[J]. Biostatistics, 2007, 8(1): 9-31.

