Abstract
Density peaks clustering (DPC) is a density-based clustering algorithm that can intuitively determine the number of clusters, identify clusters of arbitrary shape, and automatically detect and exclude outliers. However, DPC still has shortcomings. On the one hand, it considers only the global distribution, so it performs poorly on datasets whose clusters differ greatly in density. On the other hand, its point allocation strategy is prone to a "domino effect". To address these issues, this study proposes RKNN-DPC, a DPC algorithm based on representative points and K-nearest neighbors (KNN). First, a KNN density is constructed, and representative points are introduced to describe the global distribution of samples, yielding a new local density. Then, a weighted KNN allocation strategy that uses the KNN information of samples is proposed to alleviate the domino effect. Finally, RKNN-DPC is compared with five clustering algorithms on artificial and real datasets. The experimental results show that RKNN-DPC identifies cluster centers more accurately and obtains better clustering results.
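For orientation, the baseline DPC procedure that the abstract refers to can be sketched as follows. This is a minimal illustration of standard DPC only, not the RKNN-DPC method: the abstract does not define the KNN density, the representative points, or the weighted KNN allocation, so none of them are reproduced here. The function names, the Gaussian kernel, the cutoff distance `d_c`, and the fallback for an unselected global peak are assumptions made for this example.

```python
import math


def dpc_decision_values(points, d_c):
    """Return (rho, delta, parent) for baseline DPC.

    rho[i]    Gaussian-kernel local density of point i (cutoff d_c, assumed given)
    delta[i]  distance from i to its nearest point of higher density
    parent[i] index of that point, or -1 for the global density peak
    """
    n = len(points)
    dist = [[math.dist(p, q) for q in points] for p in points]
    rho = [
        sum(math.exp(-((dist[i][j] / d_c) ** 2)) for j in range(n) if j != i)
        for i in range(n)
    ]
    delta, parent = [0.0] * n, [-1] * n
    for i in range(n):
        # Ties in density are broken by index so the ordering is strict.
        higher = [j for j in range(n) if (rho[j], -j) > (rho[i], -i)]
        if higher:
            parent[i] = min(higher, key=lambda j: dist[i][j])
            delta[i] = dist[i][parent[i]]
        else:
            delta[i] = max(dist[i])  # usual convention for the global peak
    return rho, delta, parent


def dpc_cluster(points, d_c, n_clusters):
    """Pick centers by largest rho * delta, then propagate labels."""
    rho, delta, parent = dpc_decision_values(points, d_c)
    n = len(points)
    centers = sorted(range(n), key=lambda i: rho[i] * delta[i], reverse=True)
    centers = centers[:n_clusters]
    labels = [-1] * n
    for label, c in enumerate(centers):
        labels[c] = label
    # Visit points from dense to sparse so every parent is labeled first.
    for i in sorted(range(n), key=lambda i: (-rho[i], i)):
        if labels[i] != -1:
            continue
        if parent[i] == -1:
            # Assumed fallback: global peak not chosen as a center.
            src = min(centers, key=lambda c: math.dist(points[i], points[c]))
        else:
            src = parent[i]
        labels[i] = labels[src]  # each point inherits its parent's label
    return labels, centers


if __name__ == "__main__":
    pts = [(0, 0), (0, 1), (1, 0), (1, 1), (0.5, 0.5),
           (10, 10), (10, 11), (11, 10), (11, 11), (10.5, 10.5)]
    labels, centers = dpc_cluster(pts, d_c=1.0, n_clusters=2)
    print(labels, centers)
```

The final loop shows where the domino effect comes from: every non-center point simply inherits the label of its nearest higher-density point, so one wrong assignment is passed on to every point that depends on it. The abstract states that RKNN-DPC replaces this step with a weighted KNN allocation strategy to alleviate that effect.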
Authors
张清华
周靖鹏
代永杨
王国胤
ZHANG Qing-Hua; ZHOU Jing-Peng; DAI Yong-Yang; WANG Guo-Yin (Key Laboratory of Tourism Multisource Data Perception and Decision, Ministry of Culture and Tourism (Chongqing University of Posts and Telecommunications), Chongqing 400065, China; Chongqing Key Laboratory of Computational Intelligence (Chongqing University of Posts and Telecommunications), Chongqing 400065, China)
Source
《软件学报》
EI
CSCD
Peking University Core Journals (北大核心)
2023, No. 12, pp. 5629-5648 (20 pages)
Journal of Software
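Funding
National Key Research and Development Program of China (2020YFC2003502)
National Natural Science Foundation of China (61876201)
Natural Science Foundation of Chongqing (cstc2019jcyj-cxttX0002, cstc2021ycjh-bgzxm0013)
Key Cooperation Project of Chongqing Municipal Education Commission (HZ2021008)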
Keywords
cluster analysis
density peaks clustering (DPC)
representative point
K-nearest neighbors (KNN)