Similarity-based Grid Clustering Algorithm

Cited by: 12
Abstract: This paper presents a Similarity-based Grid Clustering Algorithm (SGCA). SGCA first uses a grid to remove some of the outliers or noise in the dataset, then applies a border-point threshold function to extract the border points of each cluster, and finally clusters the data with a similarity-based method. The algorithm needs only a single scan of the dataset. Experiments show that SGCA scales well, handles clusters of arbitrary shape and size, and identifies outliers or noise effectively. It is suitable for synthetic datasets and also gives good clustering results on high-dimensional datasets. A grid-core technique is also introduced to further improve the time complexity of SGCA.
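The abstract does not give the border-point threshold function or the similarity measure, so the following is only a minimal sketch of the general grid idea it describes: bin the points into cells in one pass, treat sparse cells as noise, and group what remains. The function name `grid_cluster`, the parameters `cell_size` and `min_pts`, the density rule, and the merging of adjacent dense cells (used here in place of the paper's similarity step) are all assumptions made for illustration, not the SGCA procedure itself.

```python
from collections import defaultdict, deque
from itertools import product


def grid_cluster(points, cell_size, min_pts):
    """Hypothetical grid clustering: join adjacent dense cells, mark the rest as noise."""
    # Single pass over the data: assign every point index to its grid cell.
    cells = defaultdict(list)
    for idx, p in enumerate(points):
        key = tuple(int(x // cell_size) for x in p)
        cells[key].append(idx)

    # Assumed noise rule: a cell holding fewer than min_pts points is discarded.
    dense = {k for k, members in cells.items() if len(members) >= min_pts}

    labels = [-1] * len(points)  # -1 marks outliers / noise
    dim = len(points[0]) if points else 0
    # All neighbouring cell offsets except the zero offset (3^dim - 1 of them).
    offsets = [o for o in product((-1, 0, 1), repeat=dim) if any(o)]

    cluster_id = 0
    seen = set()
    for start in sorted(dense):
        if start in seen:
            continue
        # Breadth-first search over neighbouring dense cells.
        # This adjacency merge stands in for the paper's similarity step.
        queue = deque([start])
        seen.add(start)
        while queue:
            cell = queue.popleft()
            for idx in cells[cell]:
                labels[idx] = cluster_id
            for off in offsets:
                nb = tuple(c + d for c, d in zip(cell, off))
                if nb in dense and nb not in seen:
                    seen.add(nb)
                    queue.append(nb)
        cluster_id += 1
    return labels


points = [(0.1, 0.1), (0.2, 0.3), (0.4, 0.2), (1.1, 0.2), (1.3, 0.4),
          (5.0, 5.0), (9.1, 9.2), (9.3, 9.4), (9.5, 9.1)]
print(grid_cluster(points, cell_size=1.0, min_pts=2))
```

In this toy input the isolated point at (5.0, 5.0) falls alone in its cell and is labeled as noise, while the two adjacent dense cells near the origin are merged into one cluster. Because the neighbour enumeration grows as 3^d, the sketch is only practical in low dimensions; it says nothing about how SGCA itself handles high-dimensional data.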
Source: Computer Engineering and Applications (《计算机工程与应用》), CSCD, Peking University Core Journal; 2007, No. 7, pp. 198-201 (4 pages)
Funding: Natural Science Foundation of Henan Province of China (Grant No. 021105110)
Keywords: grid; similarity; threshold function; cores