
Clustering in Very Large Databases Based on Distance and Density (cited by: 14)

Abstract: Clustering in very large databases or data warehouses has many applications in areas such as spatial computation, web information collection, pattern recognition and economic analysis. It is a demanding task that challenges data mining researchers. Current clustering methods commonly suffer from three problems: 1) scanning the whole database incurs high I/O cost, and auxiliary structures such as the R*-tree are expensive to maintain; 2) the uncertain parameter k must be specified in advance, so the clustering can be refined only through repeated trial and error; 3) they lack efficiency in handling arbitrarily shaped clusters in very large data sets. In this paper, we present a new hybrid clustering algorithm that addresses these problems. The algorithm combines distance-based and density-based strategies and handles clusters of arbitrary shape effectively. It makes full use of statistical information gathered during mining to greatly reduce time complexity while maintaining good clustering quality. Furthermore, the algorithm can easily eliminate noise and identify outliers. An experimental evaluation on a spatial database compares this method with other popular clustering algorithms (CURE and DBSCAN). The results show that our algorithm outperforms them in efficiency and cost, and that its speedup grows even larger as the data size increases.
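The abstract does not spell out the algorithm itself. As a rough illustration of how density and distance can be combined using per-cell summary statistics gathered in one scan, here is a minimal sketch in the style of grid-based methods. It is an assumption-laden toy, not the authors' method: the function name `grid_density_cluster` and the parameters `cell_size` and `min_count` are hypothetical, points are assumed to be 2-D, "density" is simply a point count per grid cell, and "distance" is reduced to adjacency between dense cells.

```python
from collections import defaultdict
from math import floor


def grid_density_cluster(points, cell_size, min_count):
    """Hypothetical hybrid sketch (not the paper's algorithm).

    Density: a grid cell is dense if it holds at least min_count points.
    Distance: dense cells that touch (8-neighbourhood) join one cluster.
    Points in sparse cells are labelled -1 (noise/outliers).
    """
    # One scan over the data collects per-cell statistics (here: members).
    cells = defaultdict(list)
    for idx, (x, y) in enumerate(points):
        cells[(floor(x / cell_size), floor(y / cell_size))].append(idx)

    # Density test on the summarised cells, not on individual points.
    dense = {c for c, members in cells.items() if len(members) >= min_count}

    # Connect adjacent dense cells with an iterative depth-first search.
    cell_label = {}
    next_label = 0
    for start in dense:
        if start in cell_label:
            continue
        cell_label[start] = next_label
        stack = [start]
        while stack:
            cx, cy = stack.pop()
            for dx in (-1, 0, 1):
                for dy in (-1, 0, 1):
                    nb = (cx + dx, cy + dy)
                    if nb in dense and nb not in cell_label:
                        cell_label[nb] = next_label
                        stack.append(nb)
        next_label += 1

    # Propagate cell labels back to points; sparse-cell points stay -1.
    labels = [-1] * len(points)
    for cell, lab in cell_label.items():
        for idx in cells[cell]:
            labels[idx] = lab
    return labels


if __name__ == "__main__":
    pts = [(0.1, 0.1), (0.2, 0.3), (0.4, 0.2),   # cell (0, 0)
           (1.1, 0.2), (1.3, 0.1), (1.2, 0.4),   # cell (1, 0), adjacent
           (5.1, 5.1), (5.2, 5.3), (5.4, 5.2),   # cell (5, 5)
           (9.5, 0.5)]                           # isolated point
    print(grid_density_cluster(pts, cell_size=1.0, min_count=3))
```

In this toy run the first six points form one elongated cluster spanning two adjacent cells, the next three form a second cluster, and the isolated point is labelled -1. Working on cell counts rather than pairwise point distances is what keeps the cost close to a single scan, which is the general motivation the abstract gives for using statistical information.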
Source: Journal of Computer Science & Technology (计算机科学技术学报, English edition), 2003, Issue 1, pp. 67-76 (10 pages). Indexed in SCIE, EI and CSCD.
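Funding: National Key Basic Research and Development Program of China (973 Program); Specialized Research Fund for the Doctoral Program of Higher Education; Microsoft Research Fellowship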

References (12)

  • 1. Sheikholeslami G et al. WaveCluster: A multi-resolution clustering approach for very large spatial databases. In Proc. 24th Int. Conf. Very Large Data Bases, Gupta A, Shmueli O, Widom J (eds.), New York City: Morgan Kaufmann, 1998, pp.428-438.
  • 2. Zhang T, Ramakrishnan R, Livny M. BIRCH: An efficient data clustering method for very large databases. In Proc. 1996 ACM SIGMOD Int. Conf. Management of Data, Jagadish H V, Mumick I S (eds.), Quebec: ACM Press, 1996, pp.103-114.
  • 3. Guha S et al. CURE: An efficient clustering algorithm for large databases. In Proc. 1998 ACM SIGMOD Int. Conf. Management of Data, Haas L M, Tiwary A (eds.), Seattle: ACM Press, 1998, pp.73-84.
  • 4. Kaufman L et al. Finding Groups in Data: An Introduction to Cluster Analysis. John Wiley & Sons, 1990.
  • 5. Ng R T, Han J. Efficient and effective clustering methods for spatial data mining. In Proc. 20th Int. Conf. Very Large Data Bases (VLDB'94), Bocca J B, Jarke M, Zaniolo C (eds.), Santiago de Chile, Chile: Morgan Kaufmann, 1994, pp.144-155.
  • 6. Jain A K. Algorithms for Clustering Data. Prentice Hall, 1988.
  • 7. Ester M et al. A density-based algorithm for discovering clusters in large spatial databases with noise. In Proc. 2nd Int. Conf. Knowledge Discovery and Data Mining (KDD-96), Simoudis E, Han J, Fayyad U M (eds.), AAAI Press, 1996, pp.226-231.
  • 8. Ankerst M et al. OPTICS: Ordering points to identify the clustering structure. In Proc. 1999 ACM SIGMOD Int. Conf. Management of Data, Delis A, Faloutsos C, Ghandeharizadeh S (eds.), Philadelphia: ACM Press, 1999, pp.49-60.
  • 9. Agrawal R, Gehrke J, Gunopulos D et al. Automatic subspace clustering of high dimensional data for data mining applications. In Proc. 1998 ACM SIGMOD Int. Conf. Management of Data, Haas L M, Tiwary A (eds.), Seattle: ACM Press, 1998, pp.94-105.
  • 10. Wang W, Yang J, Muntz R. STING: A statistical information grid approach to spatial data mining. In Proc. 23rd Int. Conf. Very Large Data Bases, Jarke M, Carey M J, Dittrich K R, Lochovsky F H, Loucopoulos P, Jeusfeld M A (eds.), Athens, Greece: Morgan Kaufmann, 1997, pp.186-195.
