Abstract
Clustering by Fast Search and Find of Density Peaks (CFSFDP) is a recent density-based clustering algorithm that identifies cluster centers by locating density peaks; it is fast and simple to implement. Its accuracy, however, depends on how the density of the dataset is estimated and on a manually chosen cutoff distance (dc). To address this, an improved algorithm based on kernel density estimation, K-CFSFDP, is proposed. The algorithm uses nonparametric kernel density estimation to analyze the distribution of the data points and select dc adaptively, then searches for the density peaks and takes the peak points as the initial cluster centers. Simulation results on four typical datasets show that K-CFSFDP achieves higher accuracy and greater robustness than CFSFDP, K-means, and DBSCAN.
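The density-peak idea summarized in the abstract can be illustrated with a minimal sketch. It is not the authors' K-CFSFDP implementation: the abstract does not state how dc is selected adaptively, so the bandwidth default below (a low percentile of the pairwise distances) is a placeholder assumption, and the function name `density_peaks` and its parameters are hypothetical. The sketch follows the general CFSFDP recipe: estimate a density for each point with a Gaussian kernel, compute each point's distance to its nearest denser point, take the points where both are large as centers, and let every other point inherit the label of its nearest denser neighbor.

```python
import numpy as np


def density_peaks(X, n_clusters, bandwidth=None):
    """Density-peak clustering with a Gaussian kernel density (illustrative sketch)."""
    X = np.asarray(X, dtype=float)
    n = X.shape[0]

    # Pairwise Euclidean distances between all points.
    diff = X[:, None, :] - X[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=-1))

    if bandwidth is None:
        # ASSUMPTION: placeholder heuristic, not the paper's adaptive rule.
        # Use the 2nd percentile of the pairwise distances as the kernel width.
        bandwidth = np.percentile(dist[np.triu_indices(n, k=1)], 2.0)
        bandwidth = max(bandwidth, np.finfo(float).eps)  # avoid division by zero

    # Gaussian kernel density for each point (the self term adds a constant 1).
    rho = np.exp(-(dist / bandwidth) ** 2).sum(axis=1)

    # delta[i]: distance from point i to its nearest point of higher density.
    order = np.argsort(-rho)  # indices sorted by decreasing density
    delta = np.zeros(n)
    nearest_higher = np.full(n, -1)
    delta[order[0]] = dist[order[0]].max()  # convention for the densest point
    for rank in range(1, n):
        i = order[rank]
        higher = order[:rank]
        j = higher[np.argmin(dist[i, higher])]
        delta[i] = dist[i, j]
        nearest_higher[i] = j

    # Centers are the points with the largest rho * delta.
    gamma = rho * delta
    gamma[order[0]] = np.inf  # the global density maximum is always a center
    centers = np.argsort(-gamma)[:n_clusters]

    # Every remaining point takes the label of its nearest denser neighbor.
    labels = np.full(n, -1)
    labels[centers] = np.arange(n_clusters)
    for i in order:
        if labels[i] == -1:
            labels[i] = labels[nearest_higher[i]]
    return labels, centers, rho, delta


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    X = np.vstack([rng.normal(0.0, 0.3, (50, 2)), rng.normal(5.0, 0.3, (50, 2))])
    labels, centers, rho, delta = density_peaks(X, n_clusters=2)
    print("center indices:", centers)
```

Passing `bandwidth` explicitly replaces the percentile heuristic, which is the point where a data-driven selection rule such as the one the paper proposes would plug in.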
Authors
DONG Xiao-jun (董晓君)
CHENG Chun-ling (程春玲)
College of Computer, Nanjing University of Posts and Telecommunications, Nanjing 210003, China
Source
Computer Science (《计算机科学》)
CSCD
Peking University Core Journal (北大核心)
2018, No. 11, pp. 244-248 (5 pages)
Keywords
Clustering
Kernel density estimation
Density peak
Cluster center