Abstract
The Density Peak Clustering (DPC) algorithm performs poorly on data whose density distribution varies widely. Its clustering results depend on the local density and the relative distance, and cluster centers must be selected manually, which reduces the accuracy and stability of the algorithm. To address these problems, a density peak algorithm based on weighted shared nearest neighbors and an accumulated sequence, DPC-WSNN, is proposed. The local density is redefined using weighted shared nearest neighbors, which avoids the effect of an improperly chosen cutoff distance on the clustering result and handles datasets whose clusters are unevenly distributed. Starting from the decision values of the original DPC algorithm, a set of accumulated sequences is generated, and the mean of the accumulated sequence is taken as the critical point separating cluster centers from non-centers, so that cluster centers are selected automatically. DPC-WSNN is tested and evaluated on synthetic datasets and on real datasets from UCI, and compared with FKNN-DPC, DPC, DBSCAN, and other algorithms. The results show that DPC-WSNN achieves better clustering performance, with higher clustering accuracy and stronger robustness.
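For orientation, the baseline quantities that the abstract refers to can be sketched as follows. This is a minimal sketch of the original DPC decision value only (cutoff-kernel local density, relative distance, and their product), assuming Euclidean distance and a hypothetical function name `dpc_decision_values`. The abstract does not give the weighted shared nearest neighbor density or the accumulated-sequence rule, so neither is reproduced here; the final step instead picks the top-2 decision values by hand, which is the manual selection that DPC-WSNN is meant to replace.

```python
import numpy as np

def dpc_decision_values(X, dc):
    """Baseline DPC quantities (not the DPC-WSNN redefinitions).

    rho   : number of other points closer than the cutoff distance dc
    delta : distance to the nearest point that ranks higher in density
    gamma : decision value rho * delta
    """
    # Pairwise Euclidean distances (assumed metric).
    d = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
    rho = (d < dc).sum(axis=1) - 1          # subtract 1 to exclude the point itself
    order = np.argsort(-rho, kind="stable")  # density ranking; ties broken by index
    delta = np.empty(len(X))
    delta[order[0]] = d[order[0]].max()      # top-ranked point: use its largest distance
    for k in range(1, len(X)):
        i = order[k]
        delta[i] = d[i, order[:k]].min()     # nearest point ranked above i
    return rho, delta, rho * delta

# Toy data (illustrative only): two tight cross-shaped groups far apart.
cross = np.array([[0, 0], [0.1, 0], [-0.1, 0], [0, 0.1], [0, -0.1]])
X = np.vstack([cross, cross + [10, 0]])
rho, delta, gamma = dpc_decision_values(X, dc=0.12)

# Manual selection: take the two largest decision values as cluster centers.
centers = np.argsort(-gamma)[:2]
```

On this toy input the two group midpoints (indices 0 and 5) have the largest decision values. The result depends directly on the chosen `dc` and on the hand-picked number of centers, which are the two sensitivities the abstract names.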
Authors
WANG Fuyin (王芙银); ZHANG Desheng (张德生); XIAO Yanting (肖燕婷)
School of Sciences, Xi'an University of Technology, Xi'an 710054, China
Source
Computer Engineering (《计算机工程》)
Indexed in: CAS; CSCD; PKU Core (北大核心)
2022, No. 4, pp. 61-69 (9 pages)
Funding
Young Scientists Fund of the National Natural Science Foundation of China (11801438).
Keywords
Density Peak Clustering (DPC) algorithm
local density
weighted shared nearest neighbor
accumulated sequence
cluster center