A Clustering Algorithm for Data-Intensive Computing Environments (cited by: 3)
Abstract: Data in data-intensive computing environments are massive, distributed, heterogeneous, and rapidly changing. This paper analyzes the traditional Density-Based Distributed Clustering (DBDC) algorithm, a distributed clustering algorithm built on the density-based clustering algorithm DBSCAN, and uses the MapReduce programming model to propose a new distributed clustering algorithm, IDBDC. The algorithm processes massive, heterogeneous data in a local stage and a global stage, addressing data analysis and mining in environments with the characteristics above. The complexity of the algorithm is derived as O((n log₂ n)/p). Experiments show that the algorithm remains stable and scalable as the data volume and the number of nodes vary, and that it is more accurate than the original algorithm.
Source: Journal of University of Jinan (Science and Technology), 2013, No. 1, pp. 11-15 (5 pages). Indexed in: CAS, PKU Core.
Funding: Natural Science Foundation of Shandong Province (ZR2011FL013)
Keywords: data-intensive computing; distributed clustering; density-based distributed clustering (DBDC) algorithm
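
The abstract describes clustering in a local stage followed by a global stage on top of MapReduce, but this record does not include the algorithm itself. The sketch below is therefore only an illustration of that general local/global pattern, not the paper's IDBDC method. Everything in it is an assumption made for the example: the function names (`dbscan`, `map_local`, `reduce_global`, `cluster`), the use of cluster centroids as local representatives, the parameters `eps`, `min_pts`, and `eps_global`, and the toy data. The map and reduce phases are simulated in-process with plain Python rather than a real MapReduce runtime, and the neighbor search is a naive O(n²) scan.

```python
from math import dist


def dbscan(points, eps, min_pts):
    """Naive O(n^2) DBSCAN. Returns one label per point; -1 marks noise."""
    labels = [None] * len(points)
    cluster_id = -1
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        neighbors = [j for j in range(len(points)) if dist(points[i], points[j]) <= eps]
        if len(neighbors) < min_pts:
            labels[i] = -1  # provisionally noise; may become a border point later
            continue
        cluster_id += 1
        labels[i] = cluster_id
        queue = list(neighbors)
        while queue:
            j = queue.pop()
            if labels[j] == -1:
                labels[j] = cluster_id  # noise reached from a core point: border point
            if labels[j] is not None:
                continue  # already assigned, do not expand again
            labels[j] = cluster_id
            nbrs = [k for k in range(len(points)) if dist(points[j], points[k]) <= eps]
            if len(nbrs) >= min_pts:
                queue.extend(nbrs)  # j is a core point, keep growing the cluster
    return labels


def map_local(part_id, points, eps, min_pts):
    """Local stage (assumed): cluster one partition, emit one centroid per local cluster."""
    labels = dbscan(points, eps, min_pts)
    reps = []
    for c in sorted(set(labels) - {-1}):
        members = [p for p, lab in zip(points, labels) if lab == c]
        centroid = tuple(sum(axis) / len(members) for axis in zip(*members))
        reps.append(((part_id, c), centroid))
    return labels, reps


def reduce_global(reps, eps_global):
    """Global stage (assumed): merge local clusters whose centroids lie close together."""
    keys = [key for key, _ in reps]
    centroids = [centroid for _, centroid in reps]
    global_labels = dbscan(centroids, eps_global, min_pts=1)
    return dict(zip(keys, global_labels))


def cluster(partitions, eps, min_pts, eps_global):
    """Run the local stage per partition, then relabel with the global cluster ids."""
    local = [map_local(i, pts, eps, min_pts) for i, pts in enumerate(partitions)]
    mapping = reduce_global([r for _, reps in local for r in reps], eps_global)
    return [
        [mapping[(i, lab)] if lab != -1 else -1 for lab in labels]
        for i, (labels, _) in enumerate(local)
    ]


# Toy data: two blobs, each deliberately split across two partitions.
partitions = [
    [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)],
    [(1, 1), (0.5, 0.5), (0.2, 0.8), (11, 11), (10.5, 10.5), (10.2, 10.8)],
]
result = cluster(partitions, eps=1.5, min_pts=2, eps_global=2.0)
print(result)
```

In this toy run, each partition sees only half of each blob, and the global stage gives the two halves of a blob the same label. Centroids are a coarse choice of representative that suits compact clusters; a real implementation would need representatives that preserve cluster shape.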

