Abstract
Density peak clustering (DPC) is a new and efficient density-based clustering algorithm, but it has three problems: the cutoff distance is difficult to determine, the definition of local density is oversimplified, and the cluster assignment strategy has poor fault tolerance. To address these problems, a density peak clustering algorithm that combines shared nearest neighbors and shared inverse nearest neighbors is proposed. First, the shared nearest neighbors and shared inverse nearest neighbors of the samples are used to construct a new similarity measure. Then, the local density is redefined so that no cutoff distance has to be selected. Finally, a new assignment strategy is proposed. Experiments on synthetic and UCI data sets compare the algorithm with SNNDPC, DPC, FKNN-DPC, AP, OPTICS, DBSCAN and K-means. The results show that the improved density peak clustering algorithm produces better clustering results overall than the other algorithms, and that it overcomes the chain of misassignments that the DPC assignment strategy can produce.
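The abstract does not state the paper's formulas, so the following is only a minimal sketch of the underlying ideas, under explicit assumptions. It shows what the k-nearest-neighbor set, the inverse (reverse) neighbor set, shared nearest neighbors and shared inverse nearest neighbors are, and how a local density can be built from pairwise similarities without a cutoff distance d_c. The similarity |SNN(i,j)| + |SRNN(i,j)| and the "sum of the k largest similarities" density are hypothetical placeholders, not the authors' definitions; all function names are illustrative, and the new assignment strategy is not modeled.

```python
import math
from itertools import combinations


def knn_sets(points, k):
    """For each point, the set of indices of its k nearest neighbors (Euclidean)."""
    n = len(points)
    result = []
    for i in range(n):
        # Sort the other points by distance; ties are broken by index for determinism.
        others = sorted((math.dist(points[i], points[j]), j) for j in range(n) if j != i)
        result.append({j for _, j in others[:k]})
    return result


def reverse_knn_sets(knn):
    """Inverse neighbors: j is an inverse neighbor of i when i is in j's kNN set."""
    rnn = [set() for _ in knn]
    for j, neighbors in enumerate(knn):
        for i in neighbors:
            rnn[i].add(j)
    return rnn


def similarity_matrix(points, k):
    """HYPOTHETICAL similarity: |shared kNN| + |shared inverse neighbors|.

    This is an illustrative combination, not the formula from the paper.
    """
    knn = knn_sets(points, k)
    rnn = reverse_knn_sets(knn)
    n = len(points)
    sim = [[0] * n for _ in range(n)]
    for i, j in combinations(range(n), 2):
        s = len(knn[i] & knn[j]) + len(rnn[i] & rnn[j])
        sim[i][j] = sim[j][i] = s
    return sim


def local_density(sim, k):
    """Cutoff-free density (ASSUMED form): sum of each point's k largest similarities."""
    density = []
    for i, row in enumerate(sim):
        others = sorted((v for j, v in enumerate(row) if j != i), reverse=True)
        density.append(sum(others[:k]))
    return density


if __name__ == "__main__":
    # Two well-separated square clusters of four points each.
    pts = [(0, 0), (0, 1), (1, 0), (1, 1), (10, 10), (10, 11), (11, 10), (11, 11)]
    s = similarity_matrix(pts, k=3)
    print(local_density(s, k=3))  # → [12, 12, 12, 12, 12, 12, 12, 12]
```

In this toy example every point's three nearest neighbors are the other members of its own cluster, so points in the same cluster share two nearest neighbors and two inverse neighbors (similarity 4), while points in different clusters share none (similarity 0). The only parameter is the neighborhood size k; no cutoff distance is chosen, which is the property the abstract highlights. The brute-force neighbor search is O(n² log n) and is meant for illustration only.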
Authors
ZHOU Huan-huan (周欢欢), ZHANG Zheng (张征), ZHANG Qi (张琦)
School of Mathematics & Information, China West Normal University, Nanchong, Sichuan 637009, China
Source
Journal of China West Normal University (Natural Sciences), 2022, No. 1, pp. 108-115 (8 pages)
Funding
Fundamental Research Funds of China West Normal University (Project No. 19B045).
Keywords
shared nearest neighbor
shared inverse nearest neighbor
density peak clustering algorithm
similarity
local density