Abstract
The traditional k-means clustering algorithm depends heavily on the number of clusters k, which must be set in advance. To address this, the paper proposes a new method for determining the optimal number of clusters: the double mean method. Rather than relying on a preset k, the method determines the optimal k dynamically by computing the ratio of the average intra-cluster distance to the average inter-cluster distance. Its novelty is that it combines intra-cluster compactness with inter-cluster separation, and so reflects the true structure of the data more accurately. The k values obtained on several public datasets are compared with the true number of classes in the data, or with the k values given by the elbow method, and the comparison shows that the new method is effective.
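The ratio described in the abstract can be illustrated with a minimal sketch. The abstract does not give the exact formulas or the selection rule, so everything below is an assumption made for illustration: the intra-cluster average is taken as the mean distance from each point to its own centroid, the inter-cluster average as the mean pairwise distance between centroids, and the candidate k with the smallest ratio is chosen. The function names `kmeans`, `distance_ratio`, and `choose_k` are hypothetical and are not taken from the paper.

```python
import math
import random


def kmeans(points, k, iters=50, seed=0):
    """Plain Lloyd's k-means on a list of tuples; returns (centroids, labels)."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # initialise from k distinct data points
    labels = [0] * len(points)
    for _ in range(iters):
        # Assign each point to its nearest centroid.
        labels = [min(range(k), key=lambda j: math.dist(p, centroids[j]))
                  for p in points]
        new_centroids = []
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                new_centroids.append(
                    tuple(sum(c) / len(members) for c in zip(*members)))
            else:
                new_centroids.append(centroids[j])  # keep an empty cluster's centroid
        if new_centroids == centroids:  # converged
            break
        centroids = new_centroids
    return centroids, labels


def distance_ratio(points, centroids, labels):
    """Assumed ratio: mean point-to-own-centroid distance divided by
    mean pairwise centroid distance (the paper's definition may differ)."""
    intra = sum(math.dist(p, centroids[lab])
                for p, lab in zip(points, labels)) / len(points)
    k = len(centroids)
    pairs = [(a, b) for a in range(k) for b in range(a + 1, k)]
    inter = sum(math.dist(centroids[a], centroids[b]) for a, b in pairs) / len(pairs)
    return intra / inter if inter > 0 else math.inf


def choose_k(points, k_min=2, k_max=6):
    """Assumed selection rule: pick the k in [k_min, k_max] with the smallest ratio."""
    ratios = {}
    for k in range(k_min, k_max + 1):
        centroids, labels = kmeans(points, k)
        ratios[k] = distance_ratio(points, centroids, labels)
    return min(ratios, key=ratios.get), ratios


# Usage on synthetic data: three Gaussian blobs in the plane.
rng = random.Random(1)
blob_centres = [(0, 0), (5, 5), (0, 5)]
data = [(rng.gauss(cx, 0.3), rng.gauss(cy, 0.3))
        for cx, cy in blob_centres for _ in range(30)]
best_k, ratios = choose_k(data, 2, 5)
print(best_k, ratios)
```

A smaller ratio means tighter clusters relative to their separation, which is why this sketch minimises it. Whether the minimum coincides with the true number of classes depends on the data and on the exact definitions used in the paper.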
Source
Hans Journal of Data Mining (《数据挖掘》), 2024, Issue 3, pp. 143-148 (6 pages)