Abstract
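Drawing on ideas from information theory, this paper analyzes the procedure of a hierarchy-based K-means clustering algorithm (HKMA). The algorithm first applies a hierarchical method to produce an initial clustering of the documents and takes the resulting number of clusters as the value of k for k-means. On that basis, k-means clustering is then used to refine the clustering result. Experimental results show that the execution time of HKMA is, overall, better than that of the k-means algorithm, and that its execution time grows by a smaller margin as the data volume increases.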
Probabilistic hierarchical clustering based on document information quantity.
From an information-theoretic perspective, this paper studies a hierarchy-based K-means clustering algorithm. First, the algorithm classifies documents into one or more predefined categories using hierarchical methods, and the total number of resulting categories is taken as the number of clusters. Second, it uses k-means to refine the clustering results. Experimental results show that this algorithm achieves higher mining efficiency in execution time, memory usage, and CPU utilization than most current algorithms such as k-means.
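The two-stage procedure described in the abstract can be illustrated with a minimal sketch. This is not the paper's implementation: the abstract does not state the linkage criterion or the stopping rule of the hierarchical stage, so centroid linkage and a distance `threshold` are assumed here, and small numeric vectors stand in for document feature vectors. The function names (`hierarchical_stage`, `kmeans_stage`, `hkma_sketch`) are hypothetical.

```python
import math


def centroid(points):
    """Component-wise mean of a non-empty list of equal-length vectors."""
    dim = len(points[0])
    return [sum(p[i] for p in points) / len(points) for i in range(dim)]


def hierarchical_stage(points, threshold):
    """Agglomerative clustering with centroid linkage (an assumed choice).

    Repeatedly merges the two closest clusters and stops once the closest
    pair is farther apart than `threshold` (an assumed stopping rule).
    Returns a list of clusters, each a list of point indices.
    """
    clusters = [[i] for i in range(len(points))]
    while len(clusters) > 1:
        cents = [centroid([points[i] for i in c]) for c in clusters]
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                d = math.dist(cents[a], cents[b])
                if best is None or d < best[0]:
                    best = (d, a, b)
        d, a, b = best
        if d > threshold:
            break
        clusters[a] = clusters[a] + clusters[b]
        del clusters[b]
    return clusters


def kmeans_stage(points, centers, max_iter=100):
    """Standard Lloyd iterations starting from the given centers."""
    labels = [0] * len(points)
    for _ in range(max_iter):
        # Assignment step: each point goes to its nearest center.
        labels = [
            min(range(len(centers)), key=lambda j: math.dist(p, centers[j]))
            for p in points
        ]
        # Update step: recompute each center; keep the old one if a cluster is empty.
        new_centers = []
        for j in range(len(centers)):
            members = [p for p, lab in zip(points, labels) if lab == j]
            new_centers.append(centroid(members) if members else centers[j])
        if new_centers == centers:
            break
        centers = new_centers
    return labels, centers


def hkma_sketch(points, threshold):
    """Hierarchical stage fixes k and the seeds; k-means then refines."""
    initial = hierarchical_stage(points, threshold)
    k = len(initial)
    seeds = [centroid([points[i] for i in c]) for c in initial]
    labels, centers = kmeans_stage(points, seeds)
    return k, labels, centers
```

Seeding k-means with the centroids of the initial clusters is one natural way to connect the two stages; the abstract only states that the cluster count from the hierarchical stage is used as k, so the seeding step is an assumption of this sketch. For example, `hkma_sketch([[0, 0], [0, 1], [1, 0], [10, 10], [10, 11], [11, 10]], threshold=3.0)` groups the first three and the last three vectors into two clusters.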
Source
《微计算机信息》
2010, No. 12, pp. 228-229, 232 (3 pages in total)
Control & Automation