Abstract
Traditional partition-based clustering algorithms require the number of clusters to be specified manually. Because they partition the data rigidly, they may also split large or elongated clusters and produce incorrect clustering results. Clustering by density peaks is a density-based clustering algorithm proposed in recent years. It does not require the number of clusters in advance and can detect non-spherical clusters. This paper introduces the density-peak idea into partition-based clustering and proposes a fast clustering algorithm based on density peaks and partitioning (DDBSCAN). The algorithm works in three steps. It first obtains a set of core objects (density peaks) that describe the "skeleton" of the clusters. It then assigns each surrounding point to its nearest core object. Finally, it merges clusters by examining the density at the partition boundaries. Experiments show that the algorithm adapts effectively to datasets containing clusters of arbitrary shape and varying size, and that it converges faster than traditional density-based clustering algorithms.
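The abstract names three steps but gives no formulas. The Python sketch below illustrates that pipeline under stated assumptions and is not the authors' DDBSCAN. Local density ρ is a cutoff count within a hypothetical radius `dc`. δ is the distance to the nearest denser point, as in density-peak clustering. A point is taken as a core object when no denser point lies within a hypothetical radius `core_radius`. Two partitions are merged (via union-find) when the densest pair of points straddling their boundary reaches an assumed fraction `merge_ratio` of the smaller peak density. The function name and all three parameters are hypothetical.

```python
import math
from itertools import combinations


def density_peak_partition_clustering(points, dc, core_radius, merge_ratio=0.5):
    """Sketch of a density-peak + partition + boundary-merge pipeline.

    Hypothetical function and parameters (not taken from the paper):
      dc: cutoff radius for local density and for detecting boundary pairs
      core_radius: a point is a core object if no denser point lies within it
      merge_ratio: merge two partitions if their boundary density reaches
                   this fraction of the smaller of their two peak densities
    Returns one integer cluster label per point.
    """
    n = len(points)
    dist = [[math.dist(p, q) for q in points] for p in points]

    # Step 1a: local density rho_i = number of other points closer than dc.
    rho = [sum(1 for j in range(n) if j != i and dist[i][j] < dc) for i in range(n)]

    # Equal densities are tie-broken by index, so exactly one point has no denser point.
    def denser(j, i):
        return rho[j] > rho[i] or (rho[j] == rho[i] and j < i)

    # Step 1b: delta_i = distance to the nearest denser point (inf for the densest).
    delta = []
    for i in range(n):
        higher = [dist[i][j] for j in range(n) if denser(j, i)]
        delta.append(min(higher) if higher else math.inf)

    # Step 1c: core objects (density peaks) form the cluster "skeleton" (assumed rule).
    cores = [i for i in range(n) if delta[i] > core_radius]

    # Step 2: assign every point to its nearest core object.
    owner = [min(cores, key=lambda c: dist[i][c]) for i in range(n)]

    # Step 3: merge partitions whose shared boundary is dense, using union-find.
    parent = {c: c for c in cores}

    def find(c):
        while parent[c] != c:
            parent[c] = parent[parent[c]]  # path halving
            c = parent[c]
        return c

    members = {c: [i for i in range(n) if owner[i] == c] for c in cores}
    for a, b in combinations(cores, 2):
        # Boundary pairs: one point from each partition, closer than dc.
        pairs = [(i, j) for i in members[a] for j in members[b] if dist[i][j] < dc]
        if not pairs:
            continue
        # Assumed boundary density: the densest straddling pair (mean of its two rho values).
        boundary = max((rho[i] + rho[j]) / 2 for i, j in pairs)
        if boundary >= merge_ratio * min(rho[a], rho[b]):
            parent[find(a)] = find(b)

    # Relabel merged groups as 0, 1, 2, ... in order of first appearance.
    labels = {}
    return [labels.setdefault(find(owner[i]), len(labels)) for i in range(n)]


if __name__ == "__main__":
    # One elongated cluster: two dense segments (around x = 1 and x = 7) joined by a sparser run.
    line = [(x, 0.0) for x in (0, 0.5, 1, 1.5, 2, 3, 4, 5, 6, 6.5, 7, 7.5, 8)]
    print(density_peak_partition_clustering(line, dc=1.1, core_radius=2.0, merge_ratio=0.5))
    # -> [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0]  (the two peaks are merged)
    print(density_peak_partition_clustering(line, dc=1.1, core_radius=2.0, merge_ratio=0.6))
    # -> [0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1]  (boundary judged too sparse)
```

In this example, the points form a single elongated shape. The settings yield two core objects (x = 1 and x = 7), so step 2 alone would split the shape in two. The boundary-density test in step 3 is what reunites the two halves, which mirrors the abstract's motivation. Computing the full distance matrix costs O(n²) memory, so the sketch only suits small illustrative inputs.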
Authors
JU Shu-cun (琚书存)
CHENG Wen-jie (程文杰)
XU Jian-peng (徐建鹏)
XU Xiang (徐祥)
XU Yang (徐阳)
Affiliations: Rural Comprehensive Economic Information Center of Anhui Province, Hefei 230001, China; Anhui Agrometeorological Center, Hefei 230001, China
Source
Computer and Modernization (《计算机与现代化》)
2018, No. 8, pp. 16-20 (5 pages)
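Funding
National Key Technology R&D Program of China (2014BAD10B05-02)
National Spark Program of China (2014GA710001)
Key Science and Technology Research Program of Anhui Province (1804A07020124)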
Keywords
clustering by density peaks
core object
partition-based clustering
boundary density
arbitrary shape