Improvement Methods for Determining the Value of k in the K-Means Clustering Algorithm
Abstract: The traditional k-means clustering algorithm depends heavily on the number of clusters k being chosen in advance. To address this, the paper proposes a new method for determining the optimal number of clusters, the double mean method. The method does not rely on a preset k; instead, it determines the optimal k dynamically by computing the ratio of the average intra-cluster distance to the average inter-cluster distance. Its novelty is that it combines intra-cluster compactness with inter-cluster separation, and so reflects the true structure of the data more accurately. Comparing the k values obtained on several public datasets with the true number of classes in the data, or with the k values obtained by the elbow method, shows that the new method is effective.
Source: Hans Journal of Data Mining (《数据挖掘》), 2024, No. 3, pp. 143-148 (6 pages)
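The abstract states only that k is chosen from the ratio of the average intra-cluster distance to the average inter-cluster distance; it does not give the exact formulas. The following is therefore a minimal sketch of that general idea, not the authors' double mean method. Several choices are assumptions made for illustration: the intra-cluster term is taken as the mean point-to-centroid distance, the inter-cluster term as the mean pairwise distance between centroids, the preferred k is the candidate with the smallest ratio, and k-means is seeded with a deterministic farthest-first rule. All function names (`farthest_first_init`, `kmeans`, `intra_inter_ratio`, `choose_k`) are hypothetical.

```python
import math


def farthest_first_init(points, k):
    """Deterministic seeding (an assumption of this sketch): start from the
    first point, then repeatedly add the point farthest from all seeds."""
    centroids = [points[0]]
    while len(centroids) < k:
        centroids.append(
            max(points, key=lambda p: min(math.dist(p, c) for c in centroids))
        )
    return centroids


def kmeans(points, k, max_iter=100):
    """Plain Lloyd iterations; returns (centroids, labels)."""
    centroids = farthest_first_init(points, k)
    labels = None
    for _ in range(max_iter):
        new_labels = [
            min(range(k), key=lambda j: math.dist(p, centroids[j])) for p in points
        ]
        if new_labels == labels:  # assignments stable: converged
            break
        labels = new_labels
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # keep the old centroid if a cluster becomes empty
                centroids[j] = tuple(sum(x) / len(members) for x in zip(*members))
    return centroids, labels


def intra_inter_ratio(points, centroids, labels):
    """Assumed criterion: mean point-to-centroid distance divided by the
    mean pairwise distance between centroids (requires k >= 2)."""
    intra = sum(
        math.dist(p, centroids[lab]) for p, lab in zip(points, labels)
    ) / len(points)
    pairs = [(a, b) for i, a in enumerate(centroids) for b in centroids[i + 1:]]
    inter = sum(math.dist(a, b) for a, b in pairs) / len(pairs)
    return intra / inter


def choose_k(points, candidates):
    """Run k-means for each candidate k and keep the smallest ratio."""
    ratios = {}
    for k in candidates:
        centroids, labels = kmeans(points, k)
        ratios[k] = intra_inter_ratio(points, centroids, labels)
    best_k = min(ratios, key=ratios.get)
    return best_k, ratios


# Toy data: three tight, well-separated groups of four points each.
offsets = [(-1, -1), (-1, 1), (1, -1), (1, 1)]
centers = [(0, 0), (100, 0), (50, 87)]
points = [(cx + dx, cy + dy) for cx, cy in centers for dx, dy in offsets]

best_k, ratios = choose_k(points, range(2, 5))
print(best_k, ratios)
```

On this toy data the smallest ratio falls at k = 3, the number of groups used to build it. The margin over k = 4 is small, so on real data the result may be sensitive to the candidate range and to how the two averages are defined.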