Abstract
Studies show that density peak clustering (DPC) struggles to select correct cluster centers for non-spherical clusters and for clusters of non-uniform density. In addition, DPC's assignment method suffers from a domino effect: if the highest-density point in a region is assigned incorrectly, every point in that region ends up in the same wrong cluster. To address these two shortcomings, this paper proposes a density peak clustering algorithm with neighbor relationship constraints and cluster center diffusion (DPC-NCCD). First, k-nearest neighbors and second-order k-nearest neighbors are introduced to redefine local density, which avoids errors in selecting density peaks on datasets of non-uniform density and ensures that cluster centers are chosen correctly. Second, the remaining samples are assigned by a three-stage strategy in which each stage expands the clusters step by step under a different neighbor-relationship constraint. This strategy alleviates the domino effect and improves accuracy on manifold datasets. Tests on synthetic and real datasets show that the algorithm achieves good clustering performance on manifold datasets of non-uniform density.
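To make the domino effect concrete, the following is a minimal sketch of baseline DPC, not of DPC-NCCD. The abstract does not give the paper's density formula or its three-stage rules, so `knn_density` (the exponential of the negative mean distance to the k nearest neighbors) is a hypothetical stand-in, and all function and parameter names are assumptions made for illustration. The final loop shows where the domino effect comes from: each point copies the label of its nearest higher-density neighbor, so one wrong link is inherited by every point downstream of it.

```python
import math


def pairwise_distances(points):
    """Full Euclidean distance matrix as nested lists."""
    n = len(points)
    return [[math.dist(points[i], points[j]) for j in range(n)] for i in range(n)]


def knn_density(dist, k):
    """Hypothetical kNN density: exp(-mean distance to the k nearest neighbors).

    This is an illustrative assumption, not the definition used in the paper.
    """
    rho = []
    for i, row in enumerate(dist):
        nearest = sorted(d for j, d in enumerate(row) if j != i)[:k]
        rho.append(math.exp(-sum(nearest) / len(nearest)))
    return rho


def dpc(points, k=3, n_clusters=2):
    """Baseline DPC: pick centers by rho * delta, then assign in one pass."""
    n = len(points)
    dist = pairwise_distances(points)
    rho = knn_density(dist, k)

    # Indices sorted from densest to sparsest; rank breaks density ties.
    order = sorted(range(n), key=lambda i: -rho[i])
    rank = {idx: r for r, idx in enumerate(order)}

    delta = [0.0] * n   # distance to the nearest higher-density point
    parent = [-1] * n   # index of that point (-1 for the densest point)
    for r, i in enumerate(order):
        if r == 0:
            delta[i] = max(dist[i])  # convention for the global density peak
            continue
        j = min(order[:r], key=lambda h: dist[i][h])
        delta[i] = dist[i][j]
        parent[i] = j

    # Centers have both high density and a large distance to any denser point.
    centers = sorted(range(n), key=lambda i: (-(rho[i] * delta[i]), rank[i]))
    centers = centers[:n_clusters]

    labels = [-1] * n
    for cluster_id, c in enumerate(centers):
        labels[c] = cluster_id

    # Single-pass assignment in density order: every non-center point inherits
    # the label of its parent, so an error at a dense point propagates.
    for i in order:
        if labels[i] == -1:
            labels[i] = labels[parent[i]]
    return labels, centers


# Two small, well-separated groups of five points each.
points = [(0, 0), (0, 1), (1, 0), (1, 1), (0.5, 0.5),
          (10, 10), (10, 11), (11, 10), (11, 11), (10.5, 10.5)]
labels, centers = dpc(points, k=3, n_clusters=2)
```

On this toy input the two middle points (indices 4 and 9) are selected as centers and each group receives a single label. A staged strategy such as the one the abstract describes would replace the final loop with assignments gated by neighbor-relationship conditions; those conditions are not specified here and are not reproduced.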
Authors
杨重阳
徐华
张紫丹
YANG Chongyang; XU Hua; ZHANG Zidan (College of Artificial Intelligence and Computer, Jiangnan University, Wuxi 214122, China)
Source
《小型微型计算机系统》
CSCD
Peking University Core Journals (北大核心)
2024, No. 12, pp. 2830-2837 (8 pages)
Journal of Chinese Computer Systems
Funding
Supported by the National Natural Science Foundation of China (62106088).
Keywords
clustering algorithm
density peak
k-nearest neighbors
second-order k-nearest neighbors