
An adapted algorithm of choosing initial values for k-means document clustering (cited by: 23)
Abstract: This paper proposes an improved algorithm, based on an adapted minimum maximum principle, for choosing initial values in k-means document clustering. The method first constructs a similarity matrix and then analyzes it with the minimum maximum principle to select the initial seeds and to determine the number of clusters k automatically. Experimental results show that the value of k found by this method is close to the true value.
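Source: Chinese High Technology Letters (《高技术通讯》), listed in CAS, CSCD, and the Peking University Core Journals list; 2006, No. 1, pp. 11-15 (5 pages)
Funding: Supported by a Key Program grant of the National Natural Science Foundation of China (60435020).
Keywords: document clustering, k-means, minimum maximum principle, similarity matrix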
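
The abstract describes the method only at a high level, so the following is a minimal sketch of a generic min-max (farthest-first) seed selection over a cosine similarity matrix, not the paper's adapted procedure. The function names, the `threshold` stopping parameter, and the choice to start from the least similar pair of documents are all assumptions made for illustration. The sketch also assumes there are at least two documents and at least two clusters.

```python
import numpy as np

def cosine_similarity_matrix(X):
    """Pairwise cosine similarity between the rows of X (documents as term vectors)."""
    X = np.asarray(X, dtype=float)
    norms = np.linalg.norm(X, axis=1, keepdims=True)
    norms[norms == 0] = 1.0  # avoid division by zero for empty documents
    U = X / norms
    return U @ U.T

def minmax_seeds(S, threshold=0.5, max_k=None):
    """Select seed indices from similarity matrix S with a plain min-max rule.

    `threshold` is a hypothetical stopping parameter (not from the paper):
    selection stops once every remaining document has similarity >= threshold
    to at least one seed. The number of seeds returned serves as k.
    """
    n = S.shape[0]
    max_k = n if max_k is None else max_k

    # Start from the least similar pair of distinct documents.
    masked = S.copy()
    np.fill_diagonal(masked, np.inf)
    i, j = np.unravel_index(np.argmin(masked), masked.shape)
    seeds = [int(i), int(j)]

    while len(seeds) < max_k:
        # For each document: its highest similarity to any current seed.
        nearest = S[:, seeds].max(axis=1)
        nearest[seeds] = np.inf  # never re-pick an existing seed
        candidate = int(np.argmin(nearest))
        if nearest[candidate] >= threshold:
            break  # every remaining document is already close to some seed
        seeds.append(candidate)
    return seeds
```

The returned indices can then serve as initial centroids for any k-means implementation, with `len(seeds)` as k. With scikit-learn, for instance, one could pass `X[seeds]` as the `init` array of `KMeans` together with `n_init=1`. The result depends heavily on the chosen `threshold`, which is why the paper's own criterion for fixing k should be consulted before relying on this sketch.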

