An Improved CURE Clustering Algorithm (cited by: 4)

An Improved Clustering Approach of CURE
Abstract: Cluster analysis is an important research direction in data mining. Several clustering algorithms have been designed for very large databases, and CURE, a hierarchical method, is a typical representative. This paper improves CURE in three respects. First, the new method still represents each cluster by several points but drops the step that shrinks the representative points. Second, by analyzing the statistical characteristics of the nearest-neighbor distance within a cluster (1-DIST), it derives a criterion for separating subclusters automatically, so the number of clusters need not be specified in advance. Third, it adds a grid-partitioning step on top of CURE's random sampling and partitioned clustering of the original data, which dampens the influence of noise and shortens clustering time. Tests on two-dimensional data sets show that the improved CURE correctly identifies most clusters and outperforms the original algorithm in speed and in its ability to discover arbitrarily shaped clusters.
Authors: Guo Jun (郭俊), Fan Yanguo (樊彦国)
Source: Inner Mongolia Petrochemical Industry (《内蒙古石油化工》), CAS, 2005, Issue 8, pp. 12-15 (4 pages)
Keywords: data mining; hierarchical clustering; representative objects; clustering algorithm; large-scale databases; cluster analysis; statistical characteristics; automatic separation; random sampling; raw data; CURE
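
The abstract names two ideas that a small sketch can make concrete: a grid step that suppresses noise, and a separation criterion derived from nearest-neighbor (1-DIST) statistics. The abstract does not give the paper's actual formulas, so everything below is an assumption made for illustration: the helper names `grid_filter` and `split_by_1dist`, the parameters `cell_size`, `min_count`, and `k`, and in particular the threshold `mean + k * std`, which is a common heuristic and not necessarily the authors' criterion. Sampling, partitioning, and representative points are omitted.

```python
import math
from collections import Counter

def grid_filter(points, cell_size, min_count):
    """Assumed grid step: drop points that fall in sparsely populated cells."""
    def cell(p):
        return (math.floor(p[0] / cell_size), math.floor(p[1] / cell_size))
    counts = Counter(cell(p) for p in points)
    return [p for p in points if counts[cell(p)] >= min_count]

def nearest_neighbor_distances(points):
    """Brute-force 1-DIST: distance from each point to its nearest other point."""
    return [
        min(math.dist(p, q) for j, q in enumerate(points) if j != i)
        for i, p in enumerate(points)
    ]

def split_by_1dist(points, k=3.0):
    """Assumed criterion: link two points when their distance is at most
    mean + k * std of the 1-DIST values, then label connected components.
    The number of clusters falls out of the data instead of being given."""
    d = nearest_neighbor_distances(points)
    mean = sum(d) / len(d)
    std = math.sqrt(sum((x - mean) ** 2 for x in d) / len(d))
    threshold = mean + k * std
    labels = [-1] * len(points)
    current = 0
    for start in range(len(points)):
        if labels[start] != -1:
            continue
        labels[start] = current
        stack = [start]
        while stack:  # depth-first search over the linked points
            i = stack.pop()
            for j in range(len(points)):
                if labels[j] == -1 and math.dist(points[i], points[j]) <= threshold:
                    labels[j] = current
                    stack.append(j)
        current += 1
    return labels

# Toy 2-D data: two 3x3 blocks of points plus one isolated noise point.
block_a = [(x, y) for x in range(3) for y in range(3)]
block_b = [(x, y) for x in range(20, 23) for y in range(20, 23)]
pts = block_a + block_b + [(10, 10)]

kept = grid_filter(pts, cell_size=5, min_count=3)
labels = split_by_1dist(kept)
print(len(kept), len(set(labels)))
```

On this toy input the grid step is expected to discard the isolated point, and the distance criterion then separates the two blocks without being told how many clusters to find. The pairwise search is quadratic in the number of points; a sketch at scale would need the sampling and partitioning that the abstract describes.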