Density Peaks Clustering Algorithm Based on Representative Points and K-nearest Neighbors
Abstract: Density peaks clustering (DPC) is a density-based clustering algorithm that can intuitively determine the number of clusters, identify clusters of arbitrary shape, and automatically detect and exclude outliers. However, DPC has two shortcomings. First, it considers only the global distribution, so it clusters poorly on datasets whose clusters differ greatly in density. Second, its point allocation strategy is prone to a "domino effect". To address these problems, this study proposes RKNN-DPC, a DPC algorithm based on representative points and K-nearest neighbors (KNN). First, a KNN density is constructed, and representative points are introduced to describe the global distribution of the samples, which together yield a new local density. Then, the KNN information of the samples is used to build a weighted KNN allocation strategy that mitigates the domino effect. Finally, RKNN-DPC is compared with five clustering algorithms on artificial and real datasets. The experimental results show that RKNN-DPC identifies cluster centers more accurately and obtains better clustering results.
Authors: ZHANG Qing-Hua; ZHOU Jing-Peng; DAI Yong-Yang; WANG Guo-Yin (Key Laboratory of Tourism Multisource Data Perception and Decision, Ministry of Culture and Tourism (Chongqing University of Posts and Telecommunications), Chongqing 400065, China; Chongqing Key Laboratory of Computational Intelligence (Chongqing University of Posts and Telecommunications), Chongqing 400065, China)
Source: Journal of Software (《软件学报》; indexed in EI, CSCD, and the Peking University Core Journals list), 2023, Issue 12, pp. 5629-5648 (20 pages)
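Funding: National Key Research and Development Program of China (2020YFC2003502); National Natural Science Foundation of China (61876201); Natural Science Foundation of Chongqing (cstc2019jcyj-cxttX0002, cstc2021ycjh-bgzxm0013); Key Cooperation Project of the Chongqing Municipal Education Commission (HZ2021008).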
Keywords: cluster analysis; density peaks clustering (DPC); representative point; K-nearest neighbors (KNN)
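
For readers unfamiliar with the baseline that the abstract criticizes, the following is a minimal sketch of standard DPC, not of RKNN-DPC itself, whose exact density and allocation formulas are not given on this page. The function names `dpc` and `knn_density`, the Gaussian kernel, and the parameters `d_c`, `n_clusters`, and `k` are illustrative assumptions. The `knn_density` helper shows one KNN-based density commonly seen in KNN variants of DPC; it may differ from the KNN density the authors construct.

```python
import numpy as np


def pairwise_distances(points):
    """Euclidean distance matrix for an (n, d) array."""
    return np.linalg.norm(points[:, None, :] - points[None, :, :], axis=-1)


def dpc(points, d_c, n_clusters):
    """Baseline density peaks clustering (illustrative, not RKNN-DPC)."""
    n = len(points)
    dist = pairwise_distances(points)

    # Local density with a Gaussian kernel and a single global cutoff d_c.
    # Subtract 1 to remove each point's contribution to its own density.
    rho = np.exp(-(dist / d_c) ** 2).sum(axis=1) - 1.0

    # Visit points from densest to sparsest.
    order = np.argsort(-rho, kind="stable")
    delta = np.zeros(n)           # distance to nearest higher-density point
    parent = np.full(n, -1)       # index of that higher-density point
    for rank, i in enumerate(order):
        if rank == 0:
            # The densest point has no higher-density neighbor.
            delta[i] = dist[i].max()
            continue
        higher = order[:rank]
        j = higher[np.argmin(dist[i, higher])]
        delta[i] = dist[i, j]
        parent[i] = j

    # Cluster centers: points that are both dense and far from denser points.
    centers = np.argsort(-(rho * delta))[:n_clusters]
    labels = np.full(n, -1)
    labels[centers] = np.arange(n_clusters)

    # Each remaining point copies the label of its parent. One wrong parent
    # therefore mislabels everything downstream of it, which is the
    # "domino effect" mentioned in the abstract.
    for i in order:
        if labels[i] == -1:
            labels[i] = labels[parent[i]]
    return labels, centers


def knn_density(points, k):
    """One common KNN-based density (an assumption, not the paper's formula):
    exp of the negative mean squared distance to the k nearest neighbors."""
    dist = pairwise_distances(points)
    # Column 0 of each sorted row is the point itself (distance 0); skip it.
    knn_dist = np.sort(dist, axis=1)[:, 1:k + 1]
    return np.exp(-(knn_dist ** 2).mean(axis=1))
```

Calling `dpc(points, d_c=0.5, n_clusters=2)` on an `(n, 2)` NumPy array returns one label per point and the indices of the chosen centers. Because `rho` depends on one global `d_c`, sparse clusters receive uniformly low densities, which is the first shortcoming the abstract describes; a KNN-based density adapts the neighborhood size to local sparsity instead.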