Abstract
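Traditional density-based clustering algorithms cannot identify and cluster multiple clusters of different densities. To address this, the varied-density clustering algorithm VDBSCAN is proposed. For datasets whose density is not uniform, it can effectively identify and simultaneously cluster clusters of different densities, avoiding both merged and missed clusters. The basic idea of VDBSCAN is to use the k-dist plot and DK analysis to automatically select a set of Eps values for the different density levels in the dataset, and to call the DBSCAN algorithm with each of them. Different Eps values find clusters of different densities. Experiments on four two-dimensional datasets verify the effectiveness of VDBSCAN, showing that it can effectively cluster datasets of uneven density and that the automatic selection of the parameter Eps is also effective and robust.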
Density-based clustering is widely used because its clusters are easy to interpret and it places no restriction on cluster shape. However, existing density-based algorithms have trouble finding all the meaningful clusters in datasets with varied densities. This paper introduces a new algorithm, VDBSCAN, for analyzing varied-density datasets. The basic idea of VDBSCAN is that, before applying the traditional DBSCAN algorithm, a k-dist plot and DK (difference between the k-dists of neighboring points) analysis are used to select several values of the parameter Eps, one for each density level. With different values of Eps, clusters of varied densities can be found simultaneously. Finally, four synthetic two-dimensional datasets are used for demonstration, and the experiments show that VDBSCAN clusters datasets of uneven density effectively.
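The idea summarized in the abstract (sort the k-dist values, look at the differences DK between neighboring values, take one Eps per density level, then run DBSCAN once per Eps) can be illustrated with a minimal pure-Python sketch. It is not the paper's implementation: the function names (`k_dist`, `select_eps`, `dbscan`, `vdbscan`), the jump criterion controlled by the hypothetical parameter `rel_jump`, the choice `MinPts = k`, and the rule of processing the densest level first and removing already clustered points are all assumptions made for illustration.

```python
import math

def k_dist(points, k):
    """Distance from each point to its k-th nearest neighbour (brute force)."""
    out = []
    for i, p in enumerate(points):
        d = sorted(math.dist(p, q) for j, q in enumerate(points) if j != i)
        out.append(d[k - 1])
    return out

def select_eps(points, k, rel_jump=1.0):
    """Pick one Eps per density level from the sorted k-dist values.

    Assumption: a new density level starts where DK (the difference between
    neighbouring sorted k-dists) exceeds rel_jump times the current k-dist.
    The last level is always kept, so trailing noise would need extra care.
    """
    kd = sorted(k_dist(points, k))
    dk = [b - a for a, b in zip(kd, kd[1:])]
    eps_values = [kd[i] for i, d in enumerate(dk) if d > rel_jump * kd[i]]
    eps_values.append(kd[-1])
    return eps_values  # ascending: densest level first

def dbscan(points, eps, min_pts, active):
    """Plain DBSCAN restricted to the indices in `active`; returns index sets."""
    def neighbours(i):
        return [j for j in active if math.dist(points[i], points[j]) <= eps]

    visited, clusters = set(), []
    for i in sorted(active):
        if i in visited:
            continue
        visited.add(i)
        seeds = neighbours(i)
        if len(seeds) < min_pts:
            continue  # not a core point; may still become a border point
        cluster, queue = {i}, list(seeds)
        while queue:
            j = queue.pop()
            if j not in visited:
                visited.add(j)
                reachable = neighbours(j)
                if len(reachable) >= min_pts:
                    queue.extend(reachable)  # j is a core point: keep expanding
            if not any(j in c for c in clusters):
                cluster.add(j)
        clusters.append(cluster)
    return clusters

def vdbscan(points, k=4, rel_jump=1.0):
    """Run DBSCAN once per selected Eps; label -1 means unclustered."""
    labels = [-1] * len(points)
    active = set(range(len(points)))
    next_id = 0
    for eps in select_eps(points, k, rel_jump):
        for cluster in dbscan(points, eps, k, active):  # assumption: MinPts = k
            for j in cluster:
                labels[j] = next_id
            next_id += 1
            active -= cluster  # clustered points are skipped at later levels
    return labels

# Toy data: a dense 5x5 grid (spacing 0.1) and a sparse 5x5 grid (spacing 1.0).
dense = [(0.1 * i, 0.1 * j) for i in range(5) for j in range(5)]
sparse = [(10.0 + i, 10.0 + j) for i in range(5) for j in range(5)]
points = dense + sparse
eps_levels = select_eps(points, k=4)
labels = vdbscan(points, k=4)
print(eps_levels, sorted(set(labels)))
```

On this toy input a single DBSCAN run has to pick one Eps for both grids, whereas the sketch derives a small Eps for the dense grid and a larger one for the sparse grid. The brute-force neighbour search is quadratic per DBSCAN run and is only meant to keep the example self-contained.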
Source
《计算机工程与应用》
CSCD
Peking University Core Journal
2009, No. 11, pp. 137-141, 153 (6 pages)
Computer Engineering and Applications
Keywords
Varied Density Based Clustering Algorithm (VDBSCAN)
density-based clustering
Density-Based Spatial Clustering of Applications with Noise (DBSCAN)
data mining