
Algorithm review of distributed clustering problem in distributed environments (分布式环境中聚类问题算法研究综述)  Cited by: 13
Abstract: Traditional centralized clustering operates on a data set stored at a single site, so it cannot handle clustering when the data are stored across multiple sites. Distributed clustering algorithms extract classification patterns from data sets held in distributed storage and can therefore meet this need. This paper surveys and analyzes distributed clustering algorithms. It first classifies the existing algorithms, then compares the basic ideas, advantages, and disadvantages of each class, and finally compares several distributed clustering algorithms on two data sets, Iris and Wine, in terms of clustering accuracy and clustering time.
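Source: Application Research of Computers (《计算机应用研究》), indexed in CSCD and the Peking University Core Journals list, 2013, No. 9, pp. 2561-2564 (4 pages)
Funding: Project supported by the Discipline Construction Fund of the Central University of Finance and Economics
Keywords: centralized clustering; distributed clustering; clustering accuracy; clustering time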
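
The abstract describes the general setting (data held at several sites, clusters extracted without first gathering everything in one place) and the two evaluation metrics, clustering accuracy and clustering time. The following is a minimal sketch of that setting, not a reproduction of any algorithm the paper surveys. It assumes a simple scheme in which each site runs k-means locally and sends only its centroids and cluster sizes to a coordinator, which then clusters those summaries. The function names, the farthest-first seeding, the three hypothetical sites, and the synthetic two-group data (used in place of Iris and Wine) are all illustrative assumptions.

```python
import math
import random
import time


def nearest(point, centers):
    """Return the index of the center closest to point."""
    return min(range(len(centers)), key=lambda j: math.dist(point, centers[j]))


def farthest_first(points, k):
    """Deterministic seeding: start at points[0], then repeatedly add the
    point whose distance to its closest chosen center is largest."""
    centers = [points[0]]
    while len(centers) < k:
        centers.append(
            max(points, key=lambda p: min(math.dist(p, c) for c in centers))
        )
    return centers


def weighted_kmeans(points, weights, k, iters=10):
    """Lloyd's k-means in which each point carries a weight."""
    centers = farthest_first(points, k)
    dim = len(points[0])
    for _ in range(iters):
        sums = [[0.0] * dim for _ in range(k)]
        totals = [0.0] * k
        for p, w in zip(points, weights):
            j = nearest(p, centers)
            totals[j] += w
            for d in range(dim):
                sums[j][d] += w * p[d]
        # An empty cluster keeps its previous center.
        centers = [
            tuple(s / totals[j] for s in sums[j]) if totals[j] else centers[j]
            for j in range(k)
        ]
    return centers


def distributed_kmeans(sites, k):
    """Cluster each site locally, then merge only (centroid, count) pairs."""
    summaries, counts = [], []
    for data in sites:
        local = weighted_kmeans(data, [1.0] * len(data), k)
        sizes = [0] * k
        for p in data:
            sizes[nearest(p, local)] += 1
        for c, n in zip(local, sizes):
            if n:
                summaries.append(c)
                counts.append(float(n))
    # The coordinator never sees raw records, only weighted summaries.
    return weighted_kmeans(summaries, counts, k)


def accuracy(points, labels, centers):
    """Fraction of points that carry the majority label of their cluster."""
    votes = {}
    for p, y in zip(points, labels):
        by_label = votes.setdefault(nearest(p, centers), {})
        by_label[y] = by_label.get(y, 0) + 1
    return sum(max(v.values()) for v in votes.values()) / len(points)


# Synthetic stand-in data: two well-separated 2-D groups (not Iris or Wine).
rng = random.Random(42)
points, labels = [], []
for label, offset in enumerate((0.0, 10.0)):
    for _ in range(60):
        points.append((offset + rng.uniform(-1, 1), offset + rng.uniform(-1, 1)))
        labels.append(label)

# Scatter the records over three hypothetical sites.
order = list(range(len(points)))
rng.shuffle(order)
sites = [[points[i] for i in order[s::3]] for s in range(3)]

start = time.perf_counter()
global_centers = distributed_kmeans(sites, k=2)
elapsed = time.perf_counter() - start

acc = accuracy(points, labels, global_centers)
print(f"accuracy={acc:.3f} time={elapsed:.4f}s")
```

Because the two synthetic groups are far apart, this toy run separates them completely. On real data such as Iris or Wine, the accuracy would depend on the algorithm and on how the records are partitioned across sites, which is the kind of comparison the abstract reports.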


