
A Latent Variable Model for Cluster Ensemble (cited by 14)
Abstract: Cluster ensemble has become an active research focus in machine learning because it can protect private information, process data in a distributed manner, and reuse knowledge; moreover, noise and outliers have little effect on the final result. This paper makes two main contributions. First, it analyzes the advantages of treating each base clusterer as one attribute of the original data, and finds that cluster ensemble algorithms built on this representation are extensible and flexible. Second, on this basis, it builds a latent variable cluster ensemble (LVCE) probabilistic model and derives a Markov chain Monte Carlo (MCMC) algorithm for inference in it. Experimental results show that the MCMC algorithm of the LVCE model performs cluster ensemble well and also reflects the compactness of the resulting data clusters.
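The representation the abstract singles out — each base clusterer's output treated as one attribute of the original data — can be sketched in a few lines. This is an illustrative sketch only, not the paper's LVCE model: the function names are hypothetical, and the co-association consensus shown here is a standard cluster-ensemble device used to demonstrate the attribute matrix, whereas the paper's consensus comes from its probabilistic model.

```python
import numpy as np

def base_clusterings_as_attributes(labelings):
    """Stack r base clusterings of n points into an n-by-r matrix:
    column j holds the label each point received from base clusterer j
    (the 'base clustering as attribute' representation)."""
    return np.array(labelings).T

def co_association(attrs):
    """Consensus by co-association: entry (i, k) is the fraction of
    base clusterings that put points i and k in the same cluster."""
    n, r = attrs.shape
    C = np.zeros((n, n))
    for j in range(r):
        col = attrs[:, j]
        C += (col[:, None] == col[None, :]).astype(float)
    return C / r

# three base clusterings of 4 points (label ids are arbitrary per clusterer)
labelings = [[0, 0, 1, 1],
             [1, 1, 0, 0],
             [0, 0, 0, 1]]
A = base_clusterings_as_attributes(labelings)   # shape (4, 3)
C = co_association(A)
# points 0 and 1 are co-clustered in all 3 base clusterings;
# points 2 and 3 in 2 of the 3
```

Because the attribute matrix is decoupled from the base clusterers that produced it, any consensus procedure (co-association, a probabilistic model such as LVCE, etc.) can be swapped in on top of it, which is the extensibility the paper argues for.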
Source: Journal of Software (《软件学报》), indexed in EI, CSCD, and the Peking University Core list, 2009, No. 4, pp. 825-833 (9 pages).
Funding: Supported by the China Scholarship Council under Grant No. 2007U24068.
Keywords: cluster ensemble; latent variable; LVCE (latent variable cluster ensemble); MCMC (Markov chain Monte Carlo)
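The abstract states that inference in the LVCE model is carried out by MCMC but does not spell out the sampler, so the paper's exact derivation is not reproduced here. As a hedged illustration of what MCMC over the base-clustering attribute matrix can look like, the sketch below runs collapsed Gibbs sampling for a generic finite mixture of categoricals — one plausible latent-variable treatment of that representation, with the latent z_i playing the role of a consensus cluster. The function name, hyperparameters, and toy matrix `A` are all our own assumptions.

```python
import numpy as np

def gibbs_mixture_of_categoricals(A, K=2, alpha=1.0, beta=1.0,
                                  iters=200, seed=0):
    """Collapsed Gibbs sampling for a finite mixture of categoricals
    over an n-by-r base-clustering attribute matrix A. Returns a
    sample of the latent consensus-cluster labels z (length n)."""
    rng = np.random.default_rng(seed)
    n, r = A.shape
    L = A.max() + 1                  # assume a shared label range per attribute
    z = rng.integers(K, size=n)
    nk = np.bincount(z, minlength=K).astype(float)
    # counts[k, j, l]: points in consensus cluster k with label l on attribute j
    counts = np.zeros((K, r, L))
    for i in range(n):
        counts[z[i], np.arange(r), A[i]] += 1
    for _ in range(iters):
        for i in range(n):
            # remove point i from its current cluster's statistics
            nk[z[i]] -= 1
            counts[z[i], np.arange(r), A[i]] -= 1
            # conditional p(z_i = k | rest): prior term times
            # per-attribute Dirichlet-smoothed categorical likelihoods
            lik = np.prod(
                (counts[:, np.arange(r), A[i]] + beta) / (nk[:, None] + L * beta),
                axis=1)
            p = (nk + alpha) * lik
            z[i] = rng.choice(K, p=p / p.sum())
            nk[z[i]] += 1
            counts[z[i], np.arange(r), A[i]] += 1
    return z

# toy attribute matrix: rows are points, columns are base clusterings
A = np.array([[0, 1, 0],
              [0, 1, 0],
              [1, 0, 0],
              [1, 0, 1]])
z = gibbs_mixture_of_categoricals(A, K=2)
# z[i] is the sampled consensus cluster of point i
```

Each sweep resamples every z_i from its full conditional after removing point i from the sufficient statistics, which is the standard collapsed-Gibbs pattern for mixture models; the paper's LVCE sampler may differ in both model structure and update equations.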
