Abstract
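Data clustering is widely used in data mining, pattern recognition, image processing, data compression, and other fields. DBSCAN is a density-based spatial clustering algorithm: on spatial data it is fast, handles noise points effectively, and can discover clusters of arbitrary shape. However, because it operates directly on the database, it needs a large amount of memory and incurs heavy I/O overhead when the data volume is large. In addition, clustering quality is poor when the data density and the distances between clusters are uneven. To address these problems, this paper analyzes the shortcomings of DBSCAN and proposes a DBSCAN algorithm based on data partitioning. Test results show that the new algorithm not only speeds up clustering but also improves clustering quality.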
Clustering is a useful technique in many fields, including data mining, pattern recognition, image processing, data compression, and other business applications. DBSCAN is a density-based clustering algorithm that can efficiently discover clusters of arbitrary shape and handle noise effectively. However, because it operates directly on the entire database, it requires a large amount of memory and incurs high I/O costs when dealing with large-scale databases. Furthermore, clustering quality degrades when cluster density and the distances between clusters are not uniform. In this paper, an improved DBSCAN algorithm based on data partitioning is presented. Experimental results show that the new algorithm is superior to the original DBSCAN in efficiency.
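For reference, the standard DBSCAN procedure described above can be sketched as follows. This is a minimal Python sketch of the classic algorithm, not the partition-based variant this paper proposes. It assumes 2-D points and Euclidean distance, and it counts a point as its own neighbor when comparing against `min_pts`. The names `region_query`, `dbscan`, `eps`, and `min_pts` are illustrative. The brute-force region query scans the whole dataset for every point, so the sketch is O(n²); it stands in for the spatial index (an R*-tree) used by the original DBSCAN.

```python
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, float]
NOISE = -1
UNVISITED = -2


def region_query(points: Sequence[Point], i: int, eps: float) -> List[int]:
    """Indices of all points within distance eps of points[i], including i itself."""
    return [j for j, q in enumerate(points) if math.dist(points[i], q) <= eps]


def dbscan(points: Sequence[Point], eps: float, min_pts: int) -> List[int]:
    """Return one label per point: a cluster number 0, 1, ... or NOISE (-1)."""
    labels = [UNVISITED] * len(points)
    cluster_id = 0
    for i in range(len(points)):
        if labels[i] != UNVISITED:
            continue
        neighbors = region_query(points, i, eps)
        if len(neighbors) < min_pts:
            labels[i] = NOISE  # may later be relabeled as a border point
            continue
        # points[i] is a core point: start a new cluster and expand it.
        labels[i] = cluster_id
        seeds = list(neighbors)
        k = 0
        while k < len(seeds):
            j = seeds[k]
            k += 1
            if labels[j] == NOISE:
                labels[j] = cluster_id  # border point: joins, but is not expanded
            if labels[j] != UNVISITED:
                continue
            labels[j] = cluster_id
            j_neighbors = region_query(points, j, eps)
            if len(j_neighbors) >= min_pts:  # j is also a core point
                seeds.extend(j_neighbors)
        cluster_id += 1
    return labels
```

For example, `dbscan([(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (1.0, 1.0), (10.0, 10.0), (10.0, 11.0), (11.0, 10.0), (50.0, 50.0)], eps=1.5, min_pts=3)` returns `[0, 0, 0, 0, 1, 1, 1, -1]`: two small clusters and one noise point.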
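The abstract does not spell out how the data are partitioned, so the following is only a hypothetical illustration of the general idea, not the authors' algorithm. Points are split into `n_strips` vertical strips by x-coordinate. DBSCAN runs inside each strip on the strip's own points plus an `eps`-wide halo of neighboring points, and local clusters that share a core point are merged with a union-find structure. The strip layout, the halo, the merge rule, and the name `partitioned_dbscan` are all assumptions of this sketch. A real implementation would load one partition at a time to save memory and I/O, whereas this one keeps everything in memory for brevity. It also uses a single `eps` for every partition, so it does not address uneven density.

```python
import math
from typing import Dict, List, Sequence, Tuple

Point = Tuple[float, float]
ClusterId = Tuple[int, int]  # (strip index, unique local cluster number)


def partitioned_dbscan(pts: Sequence[Point], eps: float, min_pts: int, n_strips: int) -> List[int]:
    """Hypothetical strip-partitioned DBSCAN; returns cluster numbers, -1 for noise."""
    n = len(pts)
    if n == 0:
        return []
    xs = [p[0] for p in pts]
    lo, hi = min(xs), max(xs)
    width = (hi - lo) / n_strips or 1.0
    # Each point is "owned" by exactly one vertical strip along the x-axis.
    owner = [min(int((x - lo) / width), n_strips - 1) for x in xs]

    parent: Dict[ClusterId, ClusterId] = {}  # union-find over local clusters

    def find(c: ClusterId) -> ClusterId:
        while parent[c] != c:
            parent[c] = parent[parent[c]]  # path halving
            c = parent[c]
        return c

    is_core = [False] * n
    memberships: List[List[ClusterId]] = [[] for _ in range(n)]

    for s in range(n_strips):
        owned = [i for i in range(n) if owner[i] == s]
        if not owned:
            continue
        a, b = min(xs[i] for i in owned), max(xs[i] for i in owned)
        # Owned points plus an eps-wide halo, so every owned point sees its
        # complete eps-neighborhood inside this partition.
        idx = [i for i in range(n) if a - eps <= xs[i] <= b + eps]

        def neighbors(i: int) -> List[int]:
            return [j for j in idx if math.dist(pts[i], pts[j]) <= eps]

        local: Dict[int, ClusterId] = {}
        for i in idx:
            if i in local:
                continue
            nb = neighbors(i)
            if len(nb) < min_pts:
                continue  # not core here; may still join a cluster as a border point
            cid = (s, len(parent))
            parent[cid] = cid
            local[i] = cid
            is_core[i] = True  # a local core point is also a global core point
            seeds = list(nb)
            while seeds:
                j = seeds.pop()
                if j in local:
                    continue
                local[j] = cid
                nb_j = neighbors(j)
                if len(nb_j) >= min_pts:
                    is_core[j] = True
                    seeds.extend(nb_j)
        for i, cid in local.items():
            memberships[i].append(cid)

    # Merge local clusters from different strips that share a core point.
    for i in range(n):
        if is_core[i]:
            for cid in memberships[i][1:]:
                parent[find(cid)] = find(memberships[i][0])

    # Renumber merged clusters 0, 1, ... in order of first appearance.
    labels: List[int] = []
    root_to_label: Dict[ClusterId, int] = {}
    for i in range(n):
        if not memberships[i]:
            labels.append(-1)
            continue
        root = find(memberships[i][0])
        root_to_label.setdefault(root, len(root_to_label))
        labels.append(root_to_label[root])
    return labels
```

Because every owned point sees its full `eps`-neighborhood inside its own strip, and a point that is core within a partition is also core globally, the merged clusters should match those of a single global DBSCAN run, up to cluster numbering and the choice of cluster for border points reachable from more than one cluster. For example, `partitioned_dbscan([(float(x), 0.0) for x in range(7)] + [(3.0, 10.0)], eps=1.1, min_pts=3, n_strips=2)` splits the line of points across two strips but still returns one cluster plus one noise point.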
Source
Journal of Computer Research and Development (《计算机研究与发展》)
EI
CSCD
Peking University Core Journal (北大核心)
2000, No. 10, pp. 1153-1159 (7 pages)
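Funding
National Natural Science Foundation of China (Grant No. 69743001)
Doctoral Program Foundation of the State Education Commission of China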
Keywords
spatial database
data mining
clustering
data partitioning
DBSCAN algorithm