期刊文献+

针对多聚类中心大数据集的加速K-means聚类算法 被引量:28

Accelerate K-means for multi-center clustering of big datasets
下载PDF
导出
摘要 随着数据量、数据维度呈指数发展以及实际应用中聚类中心个数的增多,传统的K-means聚类算法已经不能满足实际应用中的时间和内存要求。针对该问题提出了一种基于动态类中心调整和Elkan三角判定思想的加速K-means聚类算法。实验结果证明,当数据规模达到10万条,聚类个数达到20个以上时,本算法相比Elkan算法具有更快的收敛速度和更低的内存开销。 The K-means algorithm is the most popular cluster algorithm, but for big dataset clustering with many clusters, it will take a lot of time to find all the clusters. This paper proposed a new acceleration method based on the thought of dynamical and immediate adjustment of the center K-means with triangle inequality. The triangle inequality was used to avoid redundant distance computations; But unlike Elkan' s algorithm, the centers were divided into outer-centers and inner-centers for each data point in tl^e first place, and only the tracks of the lower bounds to inner-centers were kept; On the other hand, by adjus- ting the data points cluster by cluster and updating the cluster center immediately right after finishing each cluster' s adjust- ment, the number of iteration was effectively reduced. The experiment results show that this algorithm runs much faster than Elkan' s algorithm with much less memory consumption when the cluster center number is larger than 20 and the dataset re- cords number is greater than 10 million, and the speedup becomes better when the k increases.
出处 《计算机应用研究》 CSCD 北大核心 2016年第2期413-416,共4页 Application Research of Computers
基金 国家科技支持计划资助项目(2012BAH15F05) 吉林省科技型中小企业技术创新基金资助项目(12C26212201399) 国家自然科学基金资助项目(612033161 51205389)
关键词 DIACK 加速K-means 聚类 三角定理 DIACK fast k-means clustering triangle inequality
  • 相关文献

参考文献23

  • 1MaeQueen J.Some methods for classification and analysis of multivariate observations[C]//Proc of the 5th Berkeley Symposium on Mathematical Statistics and Probability.1967:281-297.
  • 2Chinrungrueng C,Sequin C H.Optimal adaptive K-means algorithm with dynamic adjustment of learning rate[J].IEEE Neural Networks,1995,6(1):157-169.
  • 3Darken C ,Moody J.Fast adaptive K-means clustering:some empirical results[C]//International Joint Conference on Neural Networks.1990.
  • 4Farnstrom F,Lewis J,Elkan C.Scalability for clustering algorithms revisited[J].ACM SIGKDD Explorations Newsletter,2000,2(1):51-57.
  • 5Fraing G,Sohler C.A fast K-means implementation using corsets[J].International Journal of Computational Geometry & Applications,2008,18(6):605-625.
  • 6方毅,熊盛武.一种快速的K均值聚类算法[C]//2005中国模糊逻辑与计算智能联合学术会议论文集.合肥:中国科学技术大学出版社,2005.
  • 7Pelleg D,Moore A.Accelerating exact K-means algorithms with geometric reasoning[C]// Proc of the 5th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining.New York:ACM,1999.
  • 8Kanungo T,Mount D M,Netanyahu N S,et al.An efficient K-means clustering algorithm:analysis and implementation[J].IEEE Trans on Pattern Analysis and Machine Intelligence,2002,24(7):881-892.
  • 9Smellie A.Accelerated K-means clustering in metric spaces[J].Journal of Chemical Information and Modeling,2004,44(6):1929-1935.
  • 10Ding C.K-means clustering via principal component analysis[C]//Proc of the 21st International Conference on Machine Learning.New York:Carla Brodley,2004.

二级参考文献27

  • 1张雷,李人厚.人工免疫C-均值聚类算法[J].西安交通大学学报,2005,39(8):836-839. 被引量:17
  • 2张世勇.一种新的混合粒子群优化算法[J].重庆工商大学学报(自然科学版),2007,24(3):241-245. 被引量:6
  • 3MACQUEEN J. Some methods for classification and analysis of multivariate observations [ C]. In: Proceedings of the 5th Berkeley Symposium on Mathematics Statistic Problem, 1967. 281 -297.
  • 4SARKAR M, YEGNANARAYANA B, KHEMANI D. A clustering algorithm using an evolutionary programming - based approach [ J ]. Pattern Recognition Letters, 1997,18 (10) : 975 - 986.
  • 5KRISHNA K, MURTY M. Genetic K- means algorithm [J]. IEEE Trans on System, Man and Cybernetics: Part B, 1999, 29(3) :433 -439.
  • 6CLERC M. The swarm and the queen : towards a deterministic and adaptive particle swarm optimization [ C ]. In: Proceedings of the IEEE Congress on Evolutionary Computation, 1999. 1951 -1957.
  • 7Cheu E Y, Kwoh C K, Zhou Z. On the two-level hybrid clustering algorithm[C]//International Conference on Artificial Intelligence in Science and Technology. Berlin, Germany: Springer Verlag, 2004.
  • 8Wang H L. An unsupervised purchase-based customer clustering method for e-supply chain[C]//IEEE International Conference on Service Operations and Logistics, and Informatics: vol. 1. Piscataway, NJ, USA: IEEE, 2008: 686-688.
  • 9Chang H, Yeung D Y. Robust path-based spectral clustering[J]. Pattern Recognition, 2007, 41(1): 191-203.
  • 10Yu X P, Zhou D Y, Zhou Y. A new clustering algorithm based on distance and density[C]//International Conference on Services Systems and Services Management: vol.2. Piscataway, NJ, USA: IEEE, 2005: 1016-1021.

共引文献133

同被引文献231

引证文献28

二级引证文献163

相关作者

内容加载中请稍等...

相关机构

内容加载中请稍等...

相关主题

内容加载中请稍等...

浏览历史

内容加载中请稍等...
;
使用帮助 返回顶部