Abstract
Clustering is an important research direction in data mining, and many clustering algorithms suited to large-scale, high-dimensional databases have been proposed. Density-based clustering is one of the typical approaches. Building on DBSCAN, this paper presents GDCABD, a grid dynamic clustering algorithm based on density. The new algorithm applies the grid principle to density-based clustering and adopts a dynamic parameter method: it automatically adjusts parameters as needed according to the data distribution, which effectively reduces DBSCAN's sensitivity to its initial parameters. As a result, it improves both the efficiency and the quality of clustering while lowering the algorithm's I/O cost. The algorithm can discover clusters of arbitrary shape and can also accurately identify the prominent clusters in a data set.
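The abstract describes the approach only at a high level. As a rough illustration of the general idea of grid-based density clustering, the Python sketch below buckets points into square cells, marks a cell as dense when its point count reaches a threshold derived from the data, and merges neighbouring dense cells into clusters. The function name `grid_density_cluster`, the parameters `cell_size` and `density_factor`, and the rule of setting the threshold relative to the mean count of non-empty cells are all hypothetical stand-ins for the paper's dynamic parameter method, which the abstract does not specify; this is not GDCABD itself.

```python
from collections import defaultdict, deque
from statistics import mean


def grid_density_cluster(points, cell_size, density_factor=1.0):
    """Hypothetical grid-based density clustering sketch (not the paper's GDCABD).

    points: sequence of (x, y) pairs.
    cell_size: side length of each square grid cell (assumed parameter).
    density_factor: a cell counts as dense when it holds at least
        density_factor * (mean count over non-empty cells) points; this
        data-derived threshold is an assumed stand-in for a dynamic parameter.
    Returns a dict mapping each point index to a cluster id, or -1 for noise.
    """
    if not points:
        return {}

    # Step 1: bucket every point into its grid cell in a single pass.
    cells = defaultdict(list)
    for i, (x, y) in enumerate(points):
        cells[(int(x // cell_size), int(y // cell_size))].append(i)

    # Step 2: derive the density threshold from the observed cell counts
    # instead of fixing it before looking at the data.
    threshold = density_factor * mean(len(m) for m in cells.values())
    dense = {c for c, members in cells.items() if len(members) >= threshold}

    # Step 3: merge neighbouring dense cells (8-neighbourhood) with BFS;
    # points in non-dense cells keep the noise label -1.
    labels = {i: -1 for i in range(len(points))}
    visited = set()
    cluster_id = 0
    for start in sorted(dense):  # sorted so cluster ids are deterministic
        if start in visited:
            continue
        visited.add(start)
        queue = deque([start])
        while queue:
            cx, cy = queue.popleft()
            for i in cells[(cx, cy)]:
                labels[i] = cluster_id
            for dx in (-1, 0, 1):
                for dy in (-1, 0, 1):
                    nb = (cx + dx, cy + dy)
                    if nb in dense and nb not in visited:
                        visited.add(nb)
                        queue.append(nb)
        cluster_id += 1
    return labels


if __name__ == "__main__":
    demo = [(0.1, 0.1), (0.2, 0.3), (0.4, 0.2), (0.3, 0.4),
            (5.1, 5.2), (5.3, 5.1), (5.2, 5.4), (5.4, 5.3),
            (9.9, 0.1)]
    print(grid_density_cluster(demo, cell_size=1.0))
    # prints {0: 0, 1: 0, 2: 0, 3: 0, 4: 1, 5: 1, 6: 1, 7: 1, 8: -1}
```

In the demo, the two tight groups become clusters 0 and 1 and the isolated point is labelled noise. Because the merge step compares cells rather than individual points, its work grows with the number of occupied cells instead of with pairwise point distances; this is the usual motivation for grid-based variants of DBSCAN, though the abstract does not give the paper's own cost analysis.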
Source
Journal of Anhui University (Natural Science Edition)
CAS
Peking University Core Journal
2007, No. 1, pp. 31-34 (4 pages)
Keywords
clustering algorithm
density
grid
dynamic