Abstract
K-means is a common and effective partition-based clustering algorithm. To reduce its sensitivity to the initial cluster centers, a frequently used remedy is hierarchical initialization of the centers. Hierarchically initialized K-means still takes the number of clusters as an input parameter, however, and this number is hard to specify for high-dimensional and large-volume data, so the method cannot be applied directly in those settings. To determine the number of clusters automatically, this paper integrates the Davies-Bouldin Index (DBI) metric into hierarchical initialization K-means and proposes a DBI-based hierarchical initialization K-means algorithm (DHIKM) that adapts the number of clusters. Experiments on UCI datasets and synthetic data show that DHIKM can quickly find a suitable number of clusters on sampled data and that the algorithm is effective in terms of clustering quality and convergence speed.
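The idea of choosing the number of clusters by DBI on sampled data can be illustrated with a minimal sketch. This is not the authors' DHIKM: the abstract does not give the hierarchical initialization procedure, so a deterministic farthest-point initialization is swapped in as a stand-in, and the function names (`farthest_point_init`, `kmeans`, `davies_bouldin`, `choose_k`) and the parameters `k_range` and `sample_size` are hypothetical. DBI is computed in its standard form, the mean over clusters of max over j != i of (S_i + S_j) / M_ij, where S_i is the mean distance of cluster i's points to its centroid and M_ij is the distance between centroids; lower values indicate better separation.

```python
import numpy as np

def farthest_point_init(X, k):
    """Stand-in initializer (assumption, NOT the paper's hierarchical scheme):
    start from the first point, then repeatedly add the point farthest
    from all centers chosen so far."""
    centers = [X[0]]
    dist = np.linalg.norm(X - centers[0], axis=1)
    for _ in range(1, k):
        centers.append(X[dist.argmax()])
        dist = np.minimum(dist, np.linalg.norm(X - centers[-1], axis=1))
    return np.array(centers)

def assign(X, centers):
    """Label each point with the index of its nearest center."""
    d = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
    return d.argmin(axis=1)

def kmeans(X, k, n_iter=100):
    """Plain Lloyd iterations; returns the final labels."""
    centers = farthest_point_init(X, k)
    for _ in range(n_iter):
        labels = assign(X, centers)
        # Keep the old center if a cluster happens to become empty.
        new_centers = np.array([
            X[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
            for j in range(k)
        ])
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return assign(X, centers)

def davies_bouldin(X, labels):
    """Standard Davies-Bouldin Index over the non-empty clusters."""
    ids = np.unique(labels)
    if len(ids) < 2:
        return np.inf  # DBI is undefined for a single cluster
    cents = np.array([X[labels == i].mean(axis=0) for i in ids])
    S = np.array([
        np.linalg.norm(X[labels == i] - c, axis=1).mean()
        for i, c in zip(ids, cents)
    ])
    M = np.linalg.norm(cents[:, None, :] - cents[None, :, :], axis=2)
    np.fill_diagonal(M, np.inf)  # exclude the j == i pairs
    R = (S[:, None] + S[None, :]) / M
    return R.max(axis=1).mean()

def choose_k(X, k_range, sample_size=200, seed=0):
    """Cluster a random sample for each candidate k and return the k
    with the lowest DBI, together with all scores."""
    rng = np.random.default_rng(seed)
    idx = rng.choice(len(X), size=min(sample_size, len(X)), replace=False)
    sample = X[idx]
    scores = {k: davies_bouldin(sample, kmeans(sample, k)) for k in k_range}
    return min(scores, key=scores.get), scores
```

Calling `choose_k(X, range(2, 7))` on an `(n, d)` array returns the selected number of clusters and the per-k DBI scores. Scoring only a sample keeps the search over k cheap, which is the property the abstract attributes to DHIKM; the full data would then be clustered once with the selected k.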
Source
《电子设计工程》
2015, No. 6, pp. 5-8 (4 pages)
Electronic Design Engineering
Funding
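Supported by the Open Project of the Key Discipline (Traditional Chinese Medicine Informatics) of the State Administration of Traditional Chinese Medicine (ZYYXXX-13)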