
General Overview on Clustering Algorithms (聚类算法综述)

Cited by: 98
Abstract: Data mining techniques can discover latent, valuable knowledge in large volumes of data, giving new significance to the massive data accumulated in the information age. With the rapid development of data mining, grid clustering, one of its important components, has been widely applied in fields such as pattern recognition, data analysis, image processing, and market research, and research on grid clustering algorithms has become a highly active topic in the field. This paper introduces the theory of data mining and analyzes grid clustering algorithms in depth. Based on an analysis of traditional grid clustering algorithms, it proposes several improved grid clustering algorithms that achieve better clustering quality and efficiency than the traditional ones. Based on an analysis of traditional multi-density clustering algorithms, it further proposes a grid-based clustering algorithm for multi-density (GDD) [1]. GDD is a multi-stage clustering method that integrates grid-based clustering, a descending density threshold, and border-point extraction: it extracts clusters of different densities in successive stages as the density threshold decreases, and manual intervention is applied to the clustering results. The results show that GDD not only clusters a dataset correctly but also detects the outliers in it, overcoming the limitation of traditional algorithms, which can either cluster or find outliers but not both and cannot reliably identify outliers and noise. GDD achieves higher precision than the traditional shared nearest neighbor (SNN) algorithm. It works well on uniform-density datasets and on most multi-density datasets, can discover clusters of arbitrary shape, and is insensitive to noise and to the input order of the data; however, its results on a small portion of multi-density datasets are unsatisfactory [1].
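The abstract describes GDD only at a high level (grid cells, a descending density threshold, multi-stage extraction, border points). The following is a minimal Python sketch of that general idea under illustrative assumptions; it is not the author's GDD. The function name `grid_multi_density_clusters`, the parameters `cell_size`, `thresholds` and `min_points`, 8-connectivity between cells, and the rule that attaches leftover cells to an adjacent cluster as border cells are all hypothetical choices, and the paper's manual intervention step is not modelled.

```python
from collections import defaultdict, deque
from math import floor

Point = tuple[float, float]
Cell = tuple[int, int]


def grid_multi_density_clusters(
    points: list[Point],
    cell_size: float,
    thresholds: list[int],
    min_points: int = 1,
) -> tuple[dict[Cell, int], set[Cell]]:
    """Hypothetical sketch: grid clustering with descending density thresholds.

    Returns (labels, outliers): labels maps grid cells to cluster ids,
    outliers holds non-empty cells that were not assigned to any cluster.
    """
    # 1. Quantize each point into a square grid cell and count points per cell.
    counts: dict[Cell, int] = defaultdict(int)
    for x, y in points:
        counts[(floor(x / cell_size), floor(y / cell_size))] += 1

    def neighbours(cell: Cell):
        # The 8 cells surrounding `cell` (8-connectivity, an assumption).
        cx, cy = cell
        for dx in (-1, 0, 1):
            for dy in (-1, 0, 1):
                if dx or dy:
                    yield (cx + dx, cy + dy)

    labels: dict[Cell, int] = {}
    next_id = 0
    # 2. Multi-stage extraction: the densest clusters first, then sparser ones.
    for t in sorted(thresholds, reverse=True):
        dense = {c for c, n in counts.items() if n >= t and c not in labels}
        seen: set[Cell] = set()
        for start in dense:
            if start in seen:
                continue
            # Breadth-first search over connected, dense, still-unlabelled cells.
            seen.add(start)
            component, queue = [start], deque([start])
            while queue:
                for nb in neighbours(queue.popleft()):
                    if nb in dense and nb not in seen:
                        seen.add(nb)
                        component.append(nb)
                        queue.append(nb)
            # Accept the component only if it holds enough points; otherwise it
            # stays unlabelled and a lower threshold may reconsider it.
            if sum(counts[c] for c in component) >= min_points:
                for c in component:
                    labels[c] = next_id
                next_id += 1

    # 3. Crude stand-in for border extraction: a leftover cell touching a
    #    cluster cell joins that cluster; every other leftover cell is an outlier.
    core = dict(labels)  # snapshot so border cells do not chain into each other
    outliers: set[Cell] = set()
    for c in counts:
        if c in core:
            continue
        touching = [core[nb] for nb in neighbours(c) if nb in core]
        if touching:
            labels[c] = min(touching)
        else:
            outliers.add(c)
    return labels, outliers


if __name__ == "__main__":
    # A dense cluster with a sparse fringe cell, a sparse cluster, one outlier.
    pts = [(0.5, 0.5)] * 10 + [(1.5, 0.5)] * 10 + [(2.5, 0.5)] \
        + [(10.5, 10.5)] * 3 + [(11.5, 10.5)] * 3 + [(30.5, 30.5)]
    labels, outliers = grid_multi_density_clusters(pts, 1.0, [8, 3], min_points=5)
    print(sorted(labels.items()), outliers)
```

In the usage example, the first stage (threshold 8) extracts only the dense pair of cells; the second stage (threshold 3) then picks up the sparse pair as a separate cluster, which a single global threshold of 8 would have discarded as noise. The lone fringe cell next to the dense cluster is attached to it as a border cell, and the isolated cell is reported as an outlier.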
Author: 伍育红 (Wu Yuhong)
Source: Computer Science (《计算机科学》), CSCD, Peking University Core Journal, 2015, Issue S1, pp. 491-499, 524 (10 pages)
Keywords: grid clustering; density threshold descending; multi-stage clustering

References (6)

  • [1] Ng R, Han J. Efficient and effective clustering methods for spatial data mining. Proceedings of the 1994 International Conference on Very Large Data Bases (VLDB'94), 1994.
  • [2] Zhang T, Ramakrishnan R, Livny M. BIRCH: an efficient data clustering method for very large databases. Proceedings of the ACM SIGMOD International Conference on Management of Data, 1996.
  • [3] Chen M S. Data mining: an overview from a database perspective. IEEE Transactions on Knowledge and Data Engineering, 1996.
  • [4] Tan Pang-Ning, Steinbach M. Introduction to Data Mining. 2010.
  • [5] Chen Y, Tu L. Density-based clustering for real-time stream data. Proceedings of the 13th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, 2009.
  • [6] Spivak G. Victory in Limbo: Imagism. 2010.
