Abstract
Hierarchical agglomerative clustering (HAC) is a prominent and useful class of clustering algorithms that iteratively merges the closest pair of clusters until all data points belong to a single cluster. However, HAC methods have several drawbacks, such as high time and memory complexity during clustering and inefficient, inaccurate cluster validation. Empirical study shows that most HAC algorithms follow a trend: except at a number of top levels of the dendrogram, all lower-level agglomerated clusters are very small and lie close to other clusters. An improved method is proposed that significantly reduces time and memory complexity and makes validation efficient and accurate. Analysis and experiments both demonstrate the effectiveness of the proposed method.
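For reference, the baseline procedure that the abstract describes (repeatedly merging the closest pair until one cluster remains) might be sketched as below. This is only an illustration of plain HAC, not the improved method the paper proposes, whose details are not given in this record. Single linkage and Euclidean distance are assumptions made for the sketch, and the function name `naive_hac` is hypothetical.

```python
from itertools import combinations
from math import dist  # Euclidean distance, Python 3.8+


def naive_hac(points):
    """Baseline single-linkage HAC (assumed linkage); returns the merge history."""
    # Each cluster is a list of point indices; start with one cluster per point.
    clusters = {i: [i] for i in range(len(points))}
    merges = []
    next_id = len(points)
    while len(clusters) > 1:
        # Scan every pair of clusters for the smallest single-linkage distance.
        best = None
        for a, b in combinations(clusters, 2):
            d = min(dist(points[i], points[j])
                    for i in clusters[a] for j in clusters[b])
            if best is None or d < best[0]:
                best = (d, a, b)
        d, a, b = best
        # Merge the closest pair into a new cluster and record the step
        # as (cluster id, cluster id, distance, size of merged cluster).
        clusters[next_id] = clusters.pop(a) + clusters.pop(b)
        merges.append((a, b, d, len(clusters[next_id])))
        next_id += 1
    return merges
```

Calling `naive_hac` on n points yields n - 1 merge records, which together describe the dendrogram. Because every round rescans all point pairs, this naive version needs on the order of n^3 distance evaluations, which illustrates the time cost the abstract refers to.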
Source
《计算机应用与软件》
CSCD
Peking University Core Journals (北大核心)
2008, No. 6, pp. 243-244, 268 (3 pages in total)
Computer Applications and Software