
Hierarchical Affinity Propagation Clustering for Large-scale Data Set
Cited by: 14
Abstract: Affinity Propagation (AP) clustering is fast and accurate and does not require the number of clusters to be set in advance, but it does not scale to large data sets. To address this, a Hierarchical Affinity Propagation (HAP) clustering algorithm is proposed. First, the data set is divided into several subsets on which AP runs efficiently, and the exemplars (cluster centers) of each subset are selected. Next, AP is run again on all subset exemplars to select global exemplars for the whole data set. Finally, every data point is assigned to a cluster according to its similarity to the global exemplars, which yields efficient clustering of large-scale data. Experiments on real and simulated data sets show that, compared with traditional AP and adaptive AP, HAP greatly reduces clustering time while maintaining good clustering quality.
Source: Computer Science (《计算机科学》), CSCD, Peking University Core Journal, 2014, No. 3, pp. 185-188, 192 (5 pages in total)
Funding: Supported by the Open Fund of the Key Laboratory of Information Assurance Technology (KJ-12-04)
Keywords: Data clustering; Affinity propagation; Hierarchical selecting; Clustering center
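The three steps described in the abstract can be sketched in plain Python as follows. This is a minimal illustration, not the authors' implementation: the abstract does not say how the data set is partitioned, so the random partition, the `subset_size` parameter, the negative squared Euclidean similarity, the median preference, and all function names here are assumptions. The inner routine is a textbook AP update (Frey and Dueck, 2007) without a convergence check.

```python
import random
from statistics import median


def sq_dist(p, q):
    """Squared Euclidean distance between two points given as tuples."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def affinity_propagation(points, damping=0.8, iters=100):
    """Plain AP; returns the indices (into `points`) of the chosen exemplars."""
    n = len(points)
    if n <= 2:  # too small to cluster: every point is its own exemplar
        return list(range(n))
    # Similarity = negative squared distance; preference = median similarity (assumed).
    S = [[-sq_dist(p, q) for q in points] for p in points]
    pref = median(S[i][k] for i in range(n) for k in range(n) if i != k)
    for i in range(n):
        S[i][i] = pref
    R = [[0.0] * n for _ in range(n)]  # responsibilities r(i, k)
    A = [[0.0] * n for _ in range(n)]  # availabilities a(i, k)
    for _ in range(iters):
        for i in range(n):
            row = [A[i][k] + S[i][k] for k in range(n)]
            best = max(range(n), key=row.__getitem__)
            first = row[best]
            second = max(v for k, v in enumerate(row) if k != best)
            for k in range(n):
                new = S[i][k] - (second if k == best else first)
                R[i][k] = damping * R[i][k] + (1 - damping) * new
        for k in range(n):
            col = sum(max(R[i][k], 0.0) for i in range(n) if i != k)
            for i in range(n):
                if i == k:
                    new = col
                else:
                    new = min(0.0, R[k][k] + col - max(R[i][k], 0.0))
                A[i][k] = damping * A[i][k] + (1 - damping) * new
    exemplars = [k for k in range(n) if A[k][k] + R[k][k] > 0]
    # Fallback so that at least one exemplar is always returned.
    return exemplars or [max(range(n), key=lambda k: A[k][k] + R[k][k])]


def hierarchical_ap(points, subset_size=40, seed=0):
    """Two-level AP; returns (global exemplar indices, label of each point)."""
    order = list(range(len(points)))
    random.Random(seed).shuffle(order)  # random partition (an assumption)
    # Step 1: run AP on each subset and collect the subset exemplars.
    local = []
    for start in range(0, len(order), subset_size):
        chunk = order[start:start + subset_size]
        found = affinity_propagation([points[i] for i in chunk])
        local.extend(chunk[e] for e in found)
    # Step 2: run AP again on the subset exemplars to get global exemplars.
    found = affinity_propagation([points[i] for i in local])
    exemplars = [local[e] for e in found]
    # Step 3: assign every point to its most similar global exemplar.
    labels = [
        min(range(len(exemplars)), key=lambda c: sq_dist(p, points[exemplars[c]]))
        for p in points
    ]
    return exemplars, labels
```

Calling `hierarchical_ap(points, subset_size=40)` on a list of coordinate tuples returns the indices of the global exemplars and one cluster label per point. Each AP run only ever sees one subset or the set of subset exemplars, so the quadratic similarity matrix stays small, which is the source of the time savings the abstract reports.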

References (16)

  • 1 Frey B J, Dueck D. Clustering by Passing Messages Between Data Points[J]. Science, 2007, 315(5814): 972-976.
  • 2 Wang Kaijun, Zhang Junying, Li Dan, Zhang Xinna, Guo Tao. Adaptive affinity propagation clustering[J]. Acta Automatica Sinica, 2007, 33(12): 1242-1246. Cited by: 144
  • 3 Wang C, Lai J, Suen C, et al. Multi-Exemplar Affinity Propagation[J]. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2013, 35(9): 2223-2237.
  • 4 Sakellariou A, Sanoudou D, Spyrou G. Combining multiple hypothesis testing and affinity propagation clustering leads to accurate, robust and sample size independent classification on gene expression data[J]. BMC Bioinformatics, 2012, 13(1): 270.
  • 5 Wang L, Zhang L. Color Image Segmentation Algorithm Based on Affinity Propagation Clustering[J]. Foundations of Intelligent Systems. Springer Berlin Heidelberg, 2012, 122: 731-739.
  • 6 Wang Kaijun, Li Jian, Zhang Junying, Tu Chongyang. Semi-supervised affinity propagation clustering[J]. Computer Engineering, 2007, 33(23): 197-198. Cited by: 29
  • 7 He Yan-cheng, Chen Qing-cai, Xiao-long, et al. An Adaptive Affinity Propagation Document Clustering[C] // Proceedings of the 7th International Conference on Informatics and Systems. Shenzhen, China, 2010: 1-7.
  • 8 Zhong Y, Zheng M, Wu J, et al. Search the Optimal Preference of Affinity Propagation Algorithm[C] // 2012 Fifth International Conference on Intelligent Computation Technology and Automation (ICICTA). IEEE, 2012: 304-307.
  • 9 Shang F, Jiao L C, Shi J, et al. Fast affinity propagation clustering: A multilevel approach[J]. Pattern Recognition, 2012, 45(1): 474-486.
  • 10 Zhang Zhen, Wang Binqiang, Yi Peng, Lan Julong. A hierarchically combined semi-supervised affinity propagation clustering algorithm[J]. Journal of Electronics & Information Technology, 2013, 35(3): 645-651. Cited by: 15


Co-citing documents: 165

Co-cited documents: 121

Citing documents: 14

Second-level citing documents: 239
