Abstract
A grid-based shared nearest neighbor clustering algorithm (GNN) is presented. GNN uses a grid technique to remove some of the outliers or noise points in the dataset, applies a density-threshold method to determine the density threshold of the grid cells, clusters with the shared nearest neighbor method, and uses grid center points to improve clustering efficiency. GNN scans the dataset only once and can discover clusters of arbitrary shape and size. Experimental results show that GNN detects outliers and noise effectively, scales well, produces good-quality clusters, and achieves clearly better accuracy and efficiency than the shared nearest neighbor (SNN) algorithm.
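The abstract only names the stages of GNN (grid-based noise removal, a density threshold on grid cells, grid center points, shared-nearest-neighbor clustering). The Python sketch below strings those stages together under stated assumptions so the pipeline is easier to picture. It is not the authors' algorithm. The function name `grid_snn_cluster` and the parameters `cell_size`, `min_pts` (a fixed cell-density threshold), `k` and `min_shared` are hypothetical. The linking rule is likewise an assumption: two centers are joined if each lies in the other's k-nearest-neighbor list and they share at least `min_shared` neighbors, which is a Jarvis-Patrick-style rule. The paper's actual density-threshold method and SNN definition are not given here.

```python
import math
from collections import defaultdict


def grid_snn_cluster(points, cell_size, min_pts, k, min_shared):
    """Hypothetical grid + shared-nearest-neighbor clustering sketch (2-D).

    Returns one label per point; -1 marks points in cells treated as noise.
    All parameters and the linking rule are illustrative assumptions.
    """
    # 1. One pass over the input: bucket each point into a grid cell and
    #    accumulate per-cell coordinate sums for the center computation.
    cells = defaultdict(list)
    sums = defaultdict(lambda: [0.0, 0.0])
    for idx, (x, y) in enumerate(points):
        key = (math.floor(x / cell_size), math.floor(y / cell_size))
        cells[key].append(idx)
        sums[key][0] += x
        sums[key][1] += y

    # 2. Density threshold (assumed fixed): sparse cells are treated as noise.
    dense = [key for key, members in cells.items() if len(members) >= min_pts]

    # 3. Represent each dense cell by its center point (mean of its members).
    centers = [(sums[key][0] / len(cells[key]), sums[key][1] / len(cells[key]))
               for key in dense]

    # 4. k nearest neighbors among the centers (brute force, fine for a sketch).
    n = len(centers)
    knn = []
    for i in range(n):
        others = sorted((j for j in range(n) if j != i),
                        key=lambda j: math.dist(centers[i], centers[j]))
        knn.append(set(others[:k]))

    # 5. Assumed SNN linking: join mutual k-NN centers that share at least
    #    min_shared neighbors; union-find merges linked centers into clusters.
    parent = list(range(n))

    def find(a):
        while parent[a] != a:
            parent[a] = parent[parent[a]]  # path halving
            a = parent[a]
        return a

    for i in range(n):
        for j in knn[i]:
            if i in knn[j] and len(knn[i] & knn[j]) >= min_shared:
                parent[find(i)] = find(j)

    # 6. Every point inherits the cluster label of its cell's center.
    labels = [-1] * len(points)
    roots = {}
    for ci, key in enumerate(dense):
        label = roots.setdefault(find(ci), len(roots))
        for idx in cells[key]:
            labels[idx] = label
    return labels


# Toy data: two 2x2 blocks of populated cells plus one isolated point.
pts = []
for cx, cy in [(0, 0), (1, 0), (0, 1), (1, 1),
               (10, 10), (11, 10), (10, 11), (11, 11)]:
    for d in (0.25, 0.5, 0.75):
        pts.append((cx + d, cy + d))
pts.append((5.5, 5.5))  # lone point in its own cell

labels = grid_snn_cluster(pts, cell_size=1.0, min_pts=2, k=3, min_shared=2)
print(labels)
```

In this toy run the two blocks of cells come out as two clusters and the isolated point falls in a cell below the density threshold, so it is labeled -1. Clustering cell centers instead of raw points is what reduces the neighbor-search cost in this sketch. The brute-force k-NN step is quadratic in the number of dense cells.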
Source
Journal of Computer Applications (《计算机应用》)
CSCD
Peking University Core Journal (北大核心)
2006, Issue 7, pp. 1673-1675 (3 pages)
Funding
Supported by the Natural Science Foundation of Henan Province (021105110)
Keywords
grid-based
shared nearest neighbor
center