Abstract
The traditional K-means clustering algorithm has two drawbacks: the number of clusters must be specified in advance, and the result is sensitive to the choice of initial centroids, so the algorithm can easily converge to a local optimum. To address both, a similarity measure between clusters is defined and used to improve traditional K-means. Without knowing the value of K beforehand, the new algorithm selects initial centroids according to Euclidean distance and clusters the data with K-means, then filters out noise samples and determines the radius of each cluster, computes the similarity between clusters, and merges similar clusters. This determines the number of classes in the data set and yields a better clustering result. Experimental results on UCI data sets show that the new algorithm accurately determines the number of classes and achieves higher clustering accuracy than traditional K-means.
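The pipeline described in the abstract can be illustrated with a minimal Python sketch. The abstract names the steps but not their formulas, so every concrete rule below is an assumption made for illustration and not the paper's definition: farthest-first seeding stands in for "selecting initial centroids according to Euclidean distance", a sample counts as noise when it lies farther from its centroid than `noise_factor` times the mean member distance, the cluster radius is the largest remaining member distance, and the similarity of two clusters is the sum of their radii divided by the distance between their centroids, with clusters merged when it reaches `threshold`. All function and parameter names are hypothetical.

```python
import math


def dist(a, b):
    """Euclidean distance between two points given as tuples."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def pick_centroids(points, k):
    """Assumed seeding rule: farthest-first selection by Euclidean distance."""
    chosen = [points[0]]
    while len(chosen) < k:
        chosen.append(max(points, key=lambda p: min(dist(p, c) for c in chosen)))
    return chosen


def kmeans(points, centroids, max_iter=100):
    """Plain Lloyd iterations; returns final centroids and one label per point."""
    labels = None
    for _ in range(max_iter):
        new_labels = [min(range(len(centroids)), key=lambda j: dist(p, centroids[j]))
                      for p in points]
        if new_labels == labels:
            break
        labels = new_labels
        for j in range(len(centroids)):
            members = [p for p, l in zip(points, labels) if l == j]
            if members:  # an empty cluster keeps its previous centroid
                centroids[j] = tuple(sum(c) / len(members) for c in zip(*members))
    return centroids, labels


def cluster_radii(points, centroids, labels, noise_factor=2.0):
    """Assumed noise rule: mark far-away samples as -1, then take the max kept distance as radius."""
    labels = list(labels)
    radii = []
    for j, c in enumerate(centroids):
        d = {i: dist(points[i], c) for i, l in enumerate(labels) if l == j}
        mean_d = sum(d.values()) / len(d) if d else 0.0
        kept = [i for i in d if d[i] <= noise_factor * mean_d]
        for i in d:
            if i not in kept:
                labels[i] = -1
        radii.append(max((d[i] for i in kept), default=0.0))
    return radii, labels


def merge_similar(centroids, radii, labels, threshold=1.0):
    """Assumed similarity: (r_a + r_b) / centroid distance; merge pairs at or above threshold."""
    parent = list(range(len(centroids)))

    def find(j):
        while parent[j] != j:
            j = parent[j]
        return j

    for a in range(len(centroids)):
        for b in range(a + 1, len(centroids)):
            gap = dist(centroids[a], centroids[b])
            similarity = (radii[a] + radii[b]) / gap if gap > 0 else math.inf
            if similarity >= threshold:
                parent[find(b)] = find(a)
    roots = sorted({find(l) for l in labels if l >= 0})
    remap = {r: n for n, r in enumerate(roots)}
    return len(roots), [remap[find(l)] if l >= 0 else -1 for l in labels]


def improved_kmeans(points, k_init):
    """Over-segment with k_init clusters, then merge similar clusters to estimate K."""
    centroids, labels = kmeans(points, pick_centroids(points, k_init))
    radii, labels = cluster_radii(points, centroids, labels)
    return merge_similar(centroids, radii, labels)


# Toy data (not from the paper): two well-separated 4x4 grids of points.
square = [(0.5 * x, 0.5 * y) for x in range(4) for y in range(4)]
data = square + [(x + 10, y + 10) for x, y in square]
n_clusters, labels = improved_kmeans(data, k_init=4)
print(n_clusters)  # prints 2
```

On this toy data the run starts from four clusters, two per grid, and the merge step joins each overlapping pair, so the estimated number of classes is 2. The point of the sketch is the design idea stated in the abstract: deliberately start with more clusters than needed and let the similarity-based merge recover the class count, rather than fixing K up front.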
Source
Computer Engineering and Design (《计算机工程与设计》)
CSCD
Peking University Core Journal (北大核心)
2010, No. 10, pp. 2270-2272, 2375 (4 pages in total)
Keywords
clustering
K-means
basic cluster
similarity between clusters
cluster merging