Abstract
Density peaks clustering (DPC) is a clustering algorithm with a simple principle and high efficiency, but it has several problems: its density is not defined in a unified way, cluster centers are easily selected incorrectly, and sample allocation can produce a "domino" phenomenon. To address these problems, a density peaks clustering algorithm based on mutual K-nearest neighbors (MKDPC) is proposed. First, an improved local density is defined based on the mutual K-nearest neighbors of each sample. This unifies the density definition of the DPC algorithm and effectively avoids incorrect cluster center selection on variable-density datasets. Second, the shared mutual K-nearest neighbors of two samples and the similarity between them are defined based on mutual K-nearest neighbors, and a multi-step sample allocation strategy is then proposed, which effectively overcomes the "domino" phenomenon during sample allocation. Experiments are carried out on synthetic and real datasets, and MKDPC is compared with four other algorithms; the results verify the effectiveness of the proposed MKDPC algorithm.
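The abstract does not give the paper's formulas, so the following is only a minimal sketch of the two neighborhood notions it names, under the usual definitions: sample j is a mutual K-nearest neighbor of sample i when each lies among the other's K nearest neighbors, and the shared mutual K-nearest neighbors of two samples are the intersection of their mutual neighbor sets. The function names `mutual_knn` and `shared_mutual_knn`, the Euclidean metric, and the toy data are assumptions made for illustration. The MKDPC local density, similarity, and allocation steps are not reproduced here.

```python
import numpy as np

def mutual_knn(points, k):
    """Return a list of sets; entry i holds the mutual k-nearest neighbors of sample i."""
    points = np.asarray(points, dtype=float)
    n = len(points)
    # Pairwise Euclidean distances (the metric is an assumption of this sketch).
    diff = points[:, None, :] - points[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=-1))
    np.fill_diagonal(dist, np.inf)  # a sample is never its own neighbor
    # Indices of the k nearest neighbors of each sample.
    knn = [set(np.argsort(dist[i], kind="stable")[:k].tolist()) for i in range(n)]
    # j is a mutual neighbor of i when each appears in the other's k-NN set.
    return [{j for j in knn[i] if i in knn[j]} for i in range(n)]

def shared_mutual_knn(mknn, i, j):
    """Samples that are mutual k-nearest neighbors of both i and j."""
    return mknn[i] & mknn[j]

# Toy data: three nearby samples and one distant outlier.
points = [[0.0, 0.0], [0.0, 1.0], [1.0, 0.0], [10.0, 10.0]]
mknn = mutual_knn(points, k=2)
print(mknn)                            # the outlier ends up with no mutual neighbors
print(shared_mutual_knn(mknn, 0, 1))   # neighbors common to samples 0 and 1
```

In this toy example the outlier has K nearest neighbors like every other sample, but none of them lists it in return, so its mutual neighbor set is empty. This asymmetry is what makes mutual neighborhoods a plausible basis for a density that behaves consistently across regions of different density.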
Authors
ZHAO Zhizhong (赵志忠)
CHEN Sugen (陈素根)
School of Mathematics and Physics, Anqing Normal University, Anqing 246133, China
Source
Journal of Anqing Normal University (Natural Science Edition) (《安庆师范大学学报(自然科学版)》)
2024, No. 2, pp. 41-46 (6 pages)
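Funding
National Natural Science Foundation of China (61702012)
Natural Science Foundation of Anhui Province (2008085MF193)
Key Project of Natural Science Research of Anhui Provincial Universities (2022AH051053)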
Keywords
density peaks clustering
mutual K-nearest neighbor
local density
allocation strategy