Abstract
This paper introduces the partitioning-based k-means algorithm, which is widely used for Web document clustering. It analyzes the limitations of the vector space model (VSM) on which k-means relies and of its distance-based similarity measure, and proposes an improved vector space model and similarity measure to address them. Experiments show that the improved k-means algorithm keeps the high efficiency of the original algorithm while achieving higher clustering accuracy.
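As an illustration only, the following Python sketch shows the baseline the abstract starts from: documents represented as TF-IDF vectors in a vector space model and grouped by k-means using Euclidean distance. The TF-IDF weighting, the deterministic farthest-point seeding, and the helper names (`build_vsm`, `kmeans`, `cosine`) are assumptions made for this example; the paper's improved model and similarity measure are not described in the abstract and are not reproduced here. The final lines show one commonly cited weakness of distance-based similarity in a VSM: scaling a document vector changes its Euclidean distance but leaves its cosine similarity unchanged.

```python
import math
from collections import Counter


def build_vsm(docs):
    """Represent tokenized documents as TF-IDF vectors over a shared vocabulary (assumed weighting)."""
    vocab = sorted({t for d in docs for t in d})
    n = len(docs)
    df = Counter(t for d in docs for t in set(d))  # document frequency of each term
    vectors = []
    for d in docs:
        tf = Counter(d)
        vectors.append([tf[t] * math.log(n / df[t]) for t in vocab])
    return vocab, vectors


def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def kmeans(vectors, k, max_iter=100):
    """Plain k-means with Euclidean distance; returns one cluster label per vector."""
    # Deterministic farthest-point seeding (hypothetical choice, for reproducibility).
    centroids = [list(vectors[0])]
    while len(centroids) < k:
        far = max(vectors, key=lambda v: min(euclidean(v, c) for c in centroids))
        centroids.append(list(far))
    labels = None
    for _ in range(max_iter):
        # Assignment step: each document joins its nearest centroid.
        new_labels = [min(range(k), key=lambda j: euclidean(v, centroids[j])) for v in vectors]
        if new_labels == labels:
            break  # converged: no document changed cluster
        labels = new_labels
        # Update step: move each centroid to the mean of its member vectors.
        for j in range(k):
            members = [v for v, lab in zip(vectors, labels) if lab == j]
            if members:
                centroids[j] = [sum(col) / len(members) for col in zip(*members)]
    return labels


if __name__ == "__main__":
    docs = [
        "apple banana fruit".split(),
        "banana fruit juice".split(),
        "cpu memory computer".split(),
        "computer cpu disk".split(),
    ]
    _, vecs = build_vsm(docs)
    print(kmeans(vecs, 2))  # [0, 0, 1, 1]
    # Doubling a document's term weights moves it in Euclidean space,
    # although its direction (and hence its cosine similarity) is unchanged.
    doubled = [2 * x for x in vecs[0]]
    print(euclidean(vecs[0], doubled), cosine(vecs[0], doubled))
```

Each k-means iteration costs time linear in the number of documents, clusters, and vocabulary terms, which is the efficiency the abstract credits to the original algorithm.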
Source
Microcomputer Applications (《微型电脑应用》)
2007, No. 8, pp. 6-8 and 4 (3 pages)