Abstract
Clustering is an important research direction in data mining, and DBSCAN is a density-based clustering algorithm. It groups regions of sufficiently high density into clusters and can discover clusters of arbitrary shape in spatial databases that contain noise. Analysis of DBSCAN reveals the following problem: when the data are unevenly distributed, clustering quality degrades because the same global parameters are applied to all of the data. To address this weakness, an approach based on data partitioning is proposed: each local dataset is clustered separately with its own parameter values, and the local clustering results are then merged. Experimental results show that the improved algorithm is effective and feasible.
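The partition-then-cluster idea in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not say how the data are partitioned, how the local parameters are chosen, or how the local results are merged. The sketch assumes a fixed cut along the x axis, hand-picked (eps, min_pts) pairs, and a merge step reduced to giving each local cluster a globally unique label. The names `dbscan`, `partitioned_dbscan`, `split_x` and `params` are hypothetical.

```python
import math

NOISE = -1


def dbscan(points, eps, min_pts):
    """Plain DBSCAN; returns one label per point, NOISE for outliers."""
    labels = [None] * len(points)

    def neighbors(i):
        # Brute-force range query; the point itself counts as a neighbor.
        return [j for j, q in enumerate(points) if math.dist(points[i], q) <= eps]

    cluster = 0
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        seeds = neighbors(i)
        if len(seeds) < min_pts:
            labels[i] = NOISE
            continue
        labels[i] = cluster
        queue = [j for j in seeds if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == NOISE:
                labels[j] = cluster  # border point: joins but is not expanded
            if labels[j] is not None:
                continue
            labels[j] = cluster
            nbrs = neighbors(j)
            if len(nbrs) >= min_pts:
                queue.extend(nbrs)  # core point: keep growing the cluster
        cluster += 1
    return labels


def partitioned_dbscan(points, split_x, params):
    """Cluster each x-axis partition with its own (eps, min_pts).

    split_x: sorted cut positions (assumed partitioning rule).
    params:  one (eps, min_pts) pair per partition, left to right.
    """
    parts = [[] for _ in params]
    for idx, p in enumerate(points):
        parts[sum(p[0] >= s for s in split_x)].append(idx)

    labels = [NOISE] * len(points)
    offset = 0
    for idxs, (eps, min_pts) in zip(parts, params):
        local = dbscan([points[i] for i in idxs], eps, min_pts)
        for i, lab in zip(idxs, local):
            if lab != NOISE:
                labels[i] = lab + offset  # simplified merge: relabel only
        offset += max(local, default=-1) + 1
    return labels


# Toy data: a dense grid, a sparse grid, and one isolated point.
dense = [(0.1 * i, 0.1 * j) for i in range(4) for j in range(4)]
sparse = [(10.0 + i, float(j)) for i in range(4) for j in range(4)]
points = dense + sparse + [(3.0, 3.0)]

labels = partitioned_dbscan(points, split_x=[5.0], params=[(0.15, 3), (1.5, 3)])
global_labels = dbscan(points, 0.15, 3)  # one global setting tuned to the dense grid
```

On this toy data, the single global setting tuned to the dense grid marks every point of the sparse grid as noise, while the partitioned run recovers both grids as separate clusters. A fuller merge step would also need to join clusters that straddle a partition boundary, which this sketch does not attempt.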
Source
《计算机工程与设计》
CSCD
PKU Core Journal
2005, Issue 9, pp. 2319-2321 (3 pages)
Computer Engineering and Design
Funding
Applied Basic Research Fund Project of the Chongqing Science and Technology Commission (20037986)