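Abstract
In earlier work, the authors proposed GRPC, a grid-based clustering algorithm with referential parameters. GRPC approaches clustering from the user's point of view and, as far as possible, spares users from setting clustering parameters blindly. This paper extends GRPC in two respects, high dimensionality and scalability: clustering in a high-dimensional data space is decomposed into clustering in two-dimensional data spaces, and random sampling is used to handle large data sets. Simulation experiments show that the algorithm can effectively cluster fairly large data sets in data spaces of three or more dimensions.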
By calculating density threshold data, a set of effective referential parameters is derived and offered to the user, and a new clustering algorithm called GRPC is presented. With the help of these referential parameters, the algorithm can not only cluster general data but also separate high-density clusters from low-density clusters. This addresses the low cluster quality of traditional grid clustering algorithms, which usually ignore the distribution of the data when partitioning the grid. Experimental results show that the new algorithm effectively distinguishes outliers and noise from clusters and discovers clusters of arbitrary shape, with good clustering quality.
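The abstract does not spell out the GRPC procedure, so the following is only a minimal sketch of generic grid-based density clustering in two dimensions, not the authors' algorithm. The function name `grid_cluster` and the parameters `cell_size` and `density_threshold` are assumptions chosen for illustration; in GRPC the threshold would presumably be picked with the help of the referential parameters.

```python
from collections import deque


def grid_cluster(points, cell_size, density_threshold):
    """Generic grid-density clustering of 2-D points (illustrative, not GRPC).

    Returns one label per point; -1 marks points in sparse cells (noise).
    """
    def cell_of(x, y):
        # Map a point to the integer coordinates of its grid cell.
        return (int(x // cell_size), int(y // cell_size))

    # Count how many points fall into each grid cell.
    counts = {}
    for x, y in points:
        c = cell_of(x, y)
        counts[c] = counts.get(c, 0) + 1

    # A cell is "dense" when its count reaches the (assumed) threshold.
    dense = {c for c, n in counts.items() if n >= density_threshold}

    # Merge touching dense cells (8-neighbourhood) into clusters with BFS.
    labels = {}
    next_label = 0
    for start in sorted(dense):
        if start in labels:
            continue
        labels[start] = next_label
        queue = deque([start])
        while queue:
            cx, cy = queue.popleft()
            for dx in (-1, 0, 1):
                for dy in (-1, 0, 1):
                    nb = (cx + dx, cy + dy)
                    if nb in dense and nb not in labels:
                        labels[nb] = next_label
                        queue.append(nb)
        next_label += 1

    # Points in non-dense cells are reported as noise (-1).
    return [labels.get(cell_of(x, y), -1) for x, y in points]


if __name__ == "__main__":
    pts = [(0.1, 0.1), (0.2, 0.2), (0.3, 0.3),
           (1.1, 0.1), (1.2, 0.2), (1.3, 0.3),
           (5.1, 5.1), (5.2, 5.2), (5.3, 5.3),
           (9.5, 0.5)]
    print(grid_cluster(pts, cell_size=1.0, density_threshold=3))
```

For a large data set, one plausible way to mirror the sampling step mentioned in the abstract is to draw a subset with the standard-library call `random.sample(points, k)` before clustering; how GRPC itself samples and how it combines two-dimensional results into a high-dimensional clustering is not described here.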
Source
《湖南大学学报(自然科学版)》
EI
CAS
CSCD
PKU Core (Peking University core journal list)
2009, No. 2, pp. 48-52 (5 pages)
Journal of Hunan University: Natural Sciences
Funding
Supported by the National Natural Science Foundation of China (10572048, 50677069)
Keywords
grid
grid clustering
density threshold
clustering algorithm
data mining