Abstract
Density peak clustering (DPC) is a simple and efficient unsupervised clustering algorithm that quickly finds cluster centers to complete clustering. However, DPC defines local density through a cutoff distance and so ignores the spatial distribution of the sample points; it selects cluster centers from a decision graph, which is highly subjective; and it assigns the remaining points with a single allocation strategy, which tends to propagate errors from one point to the next. This paper therefore proposes a density peak clustering algorithm with an optimized adaptive cluster center strategy (ADPC). Shared nearest neighbors are used to define the similarity between two points, and local density is redefined so that it reflects the spatial distribution of the samples. The "inflection point" of γ, the product of the sample density ρ and the relative distance δ, is determined from the slope difference between adjacent points, and a power-function transformation is applied to γ to sharpen the distinction between potential cluster centers and non-center points. A decision function identifies the potential cluster centers, and the true cluster centers are then determined adaptively from the mean distance between the potential centers. The allocation strategy for non-center points is also optimized. Experiments on UCI and synthetic datasets show that the algorithm selects cluster centers adaptively and accurately, and improves clustering performance to some extent.
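For readers unfamiliar with the quantities named in the abstract, the following is a minimal sketch of the baseline DPC definitions only: ρ as a cutoff-distance neighbor count, δ as the distance to the nearest point of higher density, and γ = ρδ. It does not reproduce ADPC: the shared-nearest-neighbor density, the power transformation, and the decision function are not specified in this record. The helper `count_before_knee` is a hypothetical illustration of a slope-difference rule on the sorted γ values, and the function names, the toy data, and the value of `d_c` are all assumptions.

```python
import math

def dpc_quantities(points, d_c):
    """Baseline DPC quantities (rho, delta, gamma) with a hard cutoff d_c."""
    n = len(points)
    dist = [[math.dist(p, q) for q in points] for p in points]
    # Local density: number of other points strictly closer than d_c.
    rho = [sum(1 for j in range(n) if j != i and dist[i][j] < d_c)
           for i in range(n)]
    # Visit points from densest to sparsest; ties keep their index order.
    order = sorted(range(n), key=lambda i: -rho[i])
    delta = [0.0] * n
    for rank, i in enumerate(order):
        if rank == 0:
            # The densest point has no denser neighbor: use its largest distance.
            delta[i] = max(dist[i])
        else:
            # Distance to the nearest point visited earlier (density >= own).
            delta[i] = min(dist[i][j] for j in order[:rank])
    gamma = [r * d for r, d in zip(rho, delta)]
    return rho, delta, gamma

def count_before_knee(gamma):
    """Hypothetical knee rule: number of points before the largest slope change.

    Assumes at least three gamma values. This is an illustrative stand-in,
    not the rule defined in the paper.
    """
    g = sorted(gamma, reverse=True)
    slopes = [g[k + 1] - g[k] for k in range(len(g) - 1)]
    diffs = [slopes[k + 1] - slopes[k] for k in range(len(slopes) - 1)]
    return max(range(len(diffs)), key=lambda k: diffs[k]) + 1

# Toy data (an assumption): two square clusters, each with a central point.
points = [(0, 0), (0, 1), (1, 0), (1, 1), (0.5, 0.5),
          (10, 10), (10, 11), (11, 10), (11, 11), (10.5, 10.5)]
rho, delta, gamma = dpc_quantities(points, d_c=0.8)
n_centers = count_before_knee(gamma)
centers = sorted(range(len(points)), key=lambda i: -gamma[i])[:n_centers]
```

On this toy data the two central points have both high density and a large relative distance, so their γ values stand far above the rest. That gap is what a decision graph shows visually and what an automatic knee rule tries to locate.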
Authors
徐童童
解滨
张喜梅
张春昊
XU Tongtong; XIE Bin; ZHANG Ximei; ZHANG Chunhao (College of Computer and Cyber Security, Hebei Normal University, Shijiazhuang 050024, China; Hebei Provincial Key Laboratory of Network and Information Security, Hebei Normal University, Shijiazhuang 050024, China; Hebei Provincial Engineering Research Center for Supply Chain Big Data Analytics and Data Security, Hebei Normal University, Shijiazhuang 050024, China)
Source
《计算机工程与应用》
CSCD
Peking University Core Journal
2023, Issue 21, pp. 91-101 (11 pages)
Computer Engineering and Applications
Funding
National Natural Science Foundation of China (62076088)
Technology Innovation Fund of Hebei Normal University (L2020K09)
Keywords
density peak clustering
shared nearest neighbors
slope difference
adaptive
decision function