Abstract
Traditional centralized clustering operates on a data set stored at a single site, so it cannot handle clustering when the data is stored across multiple sites. Distributed clustering algorithms meet this need because they extract classification patterns directly from the distributed data. This paper surveys and analyzes distributed clustering algorithms. It first classifies the existing algorithms, then compares the basic ideas, advantages, and disadvantages of each class, and finally compares several distributed clustering algorithms on two data sets, Iris and Wine, using two metrics: clustering accuracy and clustering time.
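As a rough illustration of the kind of comparison the abstract describes, the sketch below shows one common distributed clustering pattern: each site clusters its own data, sends only centroids and cluster sizes to a coordinator, and the coordinator merges them with a weighted k-means. This is not one of the algorithms evaluated in the paper. The function names (`kmeans`, `distributed_kmeans`, `purity`), the toy data, and the use of purity as a stand-in for "clustering accuracy" are all assumptions made for illustration.

```python
import time
from collections import Counter


def dist2(a, b):
    """Squared Euclidean distance between two equal-length tuples."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def nearest(p, centroids):
    """Index of the centroid closest to point p."""
    return min(range(len(centroids)), key=lambda c: dist2(p, centroids[c]))


def kmeans(points, k, weights=None, iters=20):
    """Weighted Lloyd's k-means with deterministic farthest-first seeding."""
    weights = weights or [1.0] * len(points)
    centroids = [points[0]]
    while len(centroids) < k:
        centroids.append(
            max(points, key=lambda p: min(dist2(p, c) for c in centroids))
        )
    for _ in range(iters):
        labels = [nearest(p, centroids) for p in points]
        for c in range(k):
            members = [(p, w) for p, w, l in zip(points, weights, labels) if l == c]
            if members:  # keep the old centroid if a cluster goes empty
                total = sum(w for _, w in members)
                centroids[c] = tuple(
                    sum(p[d] * w for p, w in members) / total
                    for d in range(len(points[0]))
                )
    return centroids


def distributed_kmeans(sites, k):
    """Hypothetical scheme: sites send (centroid, size) summaries to a coordinator."""
    summaries, sizes = [], []
    for data in sites:
        local = kmeans(data, k)  # runs on the site's own data only
        counts = Counter(nearest(p, local) for p in data)
        for c, centroid in enumerate(local):
            if counts[c]:
                summaries.append(centroid)
                sizes.append(float(counts[c]))
    # The coordinator never sees raw points, only the weighted summaries.
    return kmeans(summaries, k, weights=sizes)


def purity(points, truth, centroids):
    """Share of points that belong to the majority class of their cluster."""
    by_cluster = {}
    for p, t in zip(points, truth):
        by_cluster.setdefault(nearest(p, centroids), Counter())[t] += 1
    return sum(max(c.values()) for c in by_cluster.values()) / len(points)


# Toy data (an assumption, not Iris or Wine): two sites, two well-separated classes.
site_a = [(0.0, 0.1), (0.2, 0.0), (0.1, 0.2), (10.0, 10.1), (10.2, 9.9), (9.9, 10.0)]
site_b = [(0.3, 0.1), (-0.1, 0.0), (9.8, 10.2), (10.1, 10.0)]
truth = [0, 0, 0, 1, 1, 1, 0, 0, 1, 1]

start = time.perf_counter()
global_centroids = distributed_kmeans([site_a, site_b], k=2)
elapsed = time.perf_counter() - start

print("purity:", purity(site_a + site_b, truth, global_centroids))
print("clustering time (s):", elapsed)
```

The same two measurements could be taken for a centralized run by calling `kmeans(site_a + site_b, 2)` directly, which gives a simple baseline to compare against.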
Source
《计算机应用研究》
CSCD
Peking University Core Journals (北大核心)
2013, No. 9, pp. 2561-2564 (4 pages)
Application Research of Computers
Funding
Supported by the Discipline Construction Fund of the Central University of Finance and Economics (中央财经大学)
Keywords
centralized clustering
distributed clustering
clustering accuracy
clustering time