Abstract
With the exponential growth of Internet data, traditional clustering algorithms face many new problems and challenges. This paper presents an in-depth study of a Hadoop-based distributed K-means clustering algorithm and describes its design and implementation strategy. Experiments on five data sets of different sizes show that, compared with the traditional K-means clustering algorithm, the proposed algorithm performs better and can be applied effectively to the analysis and mining of massive data.
Source
Software (《软件》)
2018, Issue 1, pp. 35-38 (4 pages)
Funding
Mass Science and Technology Innovation project (群众性科技创新), grant no. 5229XT16000J