A Mean Approximation Approach to a Class of Grid-Based Clustering Algorithms (article in English) Cited by: 15
Abstract: As the datasets handled in clustering analysis grow rapidly in size, improving existing algorithms to achieve satisfactory efficiency has attracted increasing attention. This paper discusses a mean approximation approach for a class of density-based clustering algorithms that partition the data space into grids. The approach filters out and releases the data objects located in dense grid cells (hypercubes) and uses the gravity center of each such cell to approximate the influence of those objects on the surrounding data objects. Implementation strategies for mean approximation in clustering algorithms are presented, together with an error bound. Mean approximation reduces memory usage and substantially lowers computational complexity, with only a minor loss of clustering accuracy. Experimental results confirm that the approach performs satisfactorily.
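The idea described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the Gaussian influence function and the names `cell_size`, `min_count`, and `sigma` are assumptions chosen for illustration, since the abstract specifies neither the influence function nor the density threshold. Points in a dense grid cell are replaced by the cell's gravity center weighted by the point count, and points in sparse cells are kept as they are.

```python
import math
from collections import defaultdict


def gaussian_influence(x, y, sigma):
    """Assumed influence of point y on point x (Gaussian kernel)."""
    d2 = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-d2 / (2.0 * sigma ** 2))


def summarize_dense_cells(points, cell_size, min_count):
    """Bucket points into grid cells and summarize the dense ones.

    Returns (summaries, kept): summaries is a list of (centroid, count)
    pairs for cells holding at least min_count points; kept holds the
    points from all other cells, unchanged.
    """
    cells = defaultdict(list)
    for p in points:
        key = tuple(math.floor(c / cell_size) for c in p)
        cells[key].append(p)

    summaries, kept = [], []
    for members in cells.values():
        if len(members) >= min_count:
            dim = len(members[0])
            centroid = tuple(
                sum(m[i] for m in members) / len(members) for i in range(dim)
            )
            summaries.append((centroid, len(members)))
        else:
            kept.extend(members)
    return summaries, kept


def approx_density(x, summaries, kept, sigma):
    """Density at x, with each dense cell replaced by its weighted centroid."""
    dense = sum(n * gaussian_influence(x, c, sigma) for c, n in summaries)
    sparse = sum(gaussian_influence(x, p, sigma) for p in kept)
    return dense + sparse


def exact_density(x, points, sigma):
    """Reference density at x computed from every individual point."""
    return sum(gaussian_influence(x, p, sigma) for p in points)
```

After summarization, only one centroid and one count need to be stored per dense cell, which is where the memory and computation savings in this sketch come from. Comparing `approx_density` with `exact_density` at a query point gives a rough feel for the approximation error under these assumptions.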
Source: Journal of Software (《软件学报》), indexed in EI, CSCD, and the Peking University Core Journals list; 2003, No. 7, pp. 1267-1274 (8 pages)
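Funding: National Natural Science Foundation of China; Natural Science Foundation of the Jiangsu Provincial Department of Education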
Keywords: clustering; grid; density-based; mean approximation; error evaluation