
An Improved k-means Algorithm for Documents Clustering (cited by: 29)
Abstract: This paper introduces the partition-based k-means algorithm for document clustering. k-means is well suited to processing very large document collections, but it is sensitive to outliers. To address this, the paper proposes separating the cluster centroid (the mean point) from the cluster seed and presents an improved k-means algorithm built on that idea. Experiments show that the improved algorithm is more accurate and more stable than the original k-means algorithm.
Source: Computer Engineering (《计算机工程》), indexed in CAS, CSCD, and the Peking University Core Journals list; 2003, No. 2, pp. 102-103, 157 (3 pages)
Keywords: document clustering; k-means algorithm; partition-based clustering algorithm; database
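
The abstract states the idea of separating the cluster centroid from the cluster seed but does not say how the seed is computed. The sketch below shows one possible reading, not the authors' published method. Its assumptions are: documents are term-weight vectors compared by cosine similarity; the centroid is the mean of all cluster members; and the seed used for the next assignment step is the mean of only those members that are at least as similar to the centroid as the cluster average, so outliers do not drag it. The function names (`cosine`, `update_seed`, `kmeans_separated`) and the seed rule itself are hypothetical.

```python
import math

def cosine(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0

def mean(vectors):
    """Component-wise mean of a non-empty list of vectors."""
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]

def assign(docs, seeds):
    """For each document, the index of its most similar seed."""
    return [max(range(len(seeds)), key=lambda j: cosine(d, seeds[j])) for d in docs]

def update_seed(members):
    """Return (centroid, seed) for one cluster.

    ASSUMED rule, not taken from the paper: the centroid averages every
    member, while the seed averages only the members whose similarity to
    the centroid is at least the cluster's average similarity.
    """
    centroid = mean(members)
    sims = [cosine(m, centroid) for m in members]
    avg = sum(sims) / len(sims)
    core = [m for m, s in zip(members, sims) if s >= avg - 1e-12]
    return centroid, mean(core or members)

def kmeans_separated(docs, init_seeds, iters=20):
    """k-means loop that assigns documents to seeds rather than to centroids."""
    seeds = [list(s) for s in init_seeds]
    labels = None
    for _ in range(iters):
        new_labels = assign(docs, seeds)
        if new_labels == labels:  # assignments stopped changing
            break
        labels = new_labels
        for j in range(len(seeds)):
            members = [d for d, l in zip(docs, labels) if l == j]
            if members:  # keep the old seed if a cluster becomes empty
                _, seeds[j] = update_seed(members)
    return labels, seeds

# Toy data: two topics in a 2-term space, plus one outlier ([1, 0.8]).
docs = [[1, 0], [1, 0.1], [1, 0.05], [1, 0.8], [0, 1], [0.1, 1], [0.05, 1]]
labels, seeds = kmeans_separated(docs, init_seeds=[[1, 0], [0, 1]])
print(labels, seeds)
```

In this toy data the outlier pulls the first cluster's centroid toward the second term, while the seed stays with the core documents. Other ways of choosing the seed (for example a fixed similarity threshold or a trimmed mean) would fit the abstract's wording equally well.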

