期刊文献+

基于随机游走模型和KL-divergence的聚类算法 被引量:6

Clustering Algorithm Based on Random Walk Model and KL-divergence
下载PDF
导出
摘要 聚类分析在数据挖掘领域有着广泛的应用,该文提出一个聚类新思路,它不需要任何参数的假设,只基于数据两两之间的相似性。该方法假设数据点之间存在随机游走关系,根据数据相似性构造随机游走过程的转移矩阵,当随机游走过程进入收敛期后,t阶转移矩阵揭示了数据点的分布。用迭代方法寻找最小的KL-divergence来对这些分布聚类。该方法具有严谨的概率理论基础,避免了传统算法需要参数假设、限于局部最优等不足。实验表明,该算法具有较优的聚类效果。 Clustering analysis is broadly applied in data mining. This paper presents a new idea in clustering based on pair-wise similarities, and assumes no parametric statistical model. Similarities are transformed to a Markov random walk probability matrix. It is assumed the dataset is under a Markov random walk process. When the process is going into convergence, the t-step transform matrix indicates the distribution of the dataset. It uses iterative algorithm to cluster these data with the goal of decreasing KL-divergence. This method has a solid foundation of probability theory, which can avoid some insufficiency of the traditional algorithms. The experiment shows the algorithm can achieve better results than K-means and mixture models.
作者 何会民
出处 《计算机工程》 CAS CSCD 北大核心 2008年第16期224-226,共3页 Computer Engineering
关键词 聚类 随机游走 KL散度 clustering random walk KL-divergence
  • 相关文献

参考文献7

  • 1Jain A K, Murty M N, Flynn P J. Data Clustering: A Review[J]. ACM Computing Surveys, 1999, 31(3): 264-323.
  • 2MacQueen J B. Some Methods for Classification and Analysis of Multivariate Observations[C]//Proc. of the 5th Berkeley Symposium on Mathematical Statistics and Probability. Berkeley, USA: [s. n.], 1967: 281-297.
  • 3Vlassis N, Likas A. A Greedy EM Algorithm for Gaussian Mixture Learning[J]. Neural Processing Letters, 2002, 15(1): 77-87
  • 4Norris J R. Markov Chains[M]. Cambridge, UK: Cambridge University Press, 1997: 40-46.
  • 5Kullback S, Leibler R A. On Information and Sufficiency[J]. Annals of Mathematical Statistics, 1951, 22(1): 79-86.
  • 6Tishby N, Slonim N. Data Clustering by Markovian Relaxation and the Information Bottleneck Method[C]//Proc. of the Advances in Neural Information Processing Systems. Denver, CO, USA: [s. n.], 2000: 640-646.
  • 7Asuncion A. UCI Machine Learning Repository[EB/OL]. (2007- 05-11). http://www.ics.uci.edu/$\sim$mlearn/MLRepository.html.

同被引文献67

引证文献6

二级引证文献47

相关作者

内容加载中请稍等...

相关机构

内容加载中请稍等...

相关主题

内容加载中请稍等...

浏览历史

内容加载中请稍等...
;
使用帮助 返回顶部