Density Peak Clustering Algorithm Optimized by Adaptive Clustering Centers Strategy (cited by: 1)
Abstract: Density peak clustering (DPC) is a simple and efficient unsupervised clustering algorithm that quickly finds cluster centers and completes the clustering. However, DPC has three weaknesses: it defines local density through a cutoff distance, which ignores the spatial distribution of the sample points; it selects cluster centers from a decision graph, which is highly subjective; and it assigns sample points with a single allocation strategy, which easily causes knock-on errors. To address these problems, a density peak clustering algorithm optimized by an adaptive clustering centers strategy (ADPC) is proposed. Shared nearest neighbors are used to define the similarity measure between two points, and local density is redefined so that it reflects the spatial distribution of the samples. Let γ denote the product of the sample density ρ and the relative distance δ. The "inflection point" of the γ values is located from the difference in slope between adjacent points, and a power-function transformation is applied to γ to sharpen the separation between potential cluster centers and non-center points. A decision function then identifies the potential cluster centers, and the true cluster centers are determined adaptively from the mean distance between the potential centers. The allocation strategy for non-center points is also optimized. In experiments on UCI and synthetic datasets, the algorithm selects cluster centers adaptively and accurately, and improves clustering performance to some extent.
Authors: XU Tongtong; XIE Bin; ZHANG Ximei; ZHANG Chunhao (College of Computer and Cyber Security, Hebei Normal University, Shijiazhuang 050024, China; Hebei Provincial Key Laboratory of Network and Information Security, Hebei Normal University, Shijiazhuang 050024, China; Hebei Provincial Engineering Research Center for Supply Chain Big Data Analytics and Data Security, Hebei Normal University, Shijiazhuang 050024, China)
Source: Computer Engineering and Applications (《计算机工程与应用》), indexed in CSCD and the Peking University Core Journals list, 2023, No. 21, pp. 91-101 (11 pages)
Funding: National Natural Science Foundation of China (62076088); Technology Innovation Fund of Hebei Normal University (L2020K09).
Keywords: density peak clustering; shared neighbors; slope difference; adaptive; decision function
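The abstract builds on three quantities from baseline DPC: the local density ρ, the relative distance δ, and their product γ. The sketch below computes them in the classic cutoff-distance form that the abstract criticizes. It is not the ADPC method: the shared-nearest-neighbor density, the slope-difference step, and the power transformation are not specified in this record and are not reproduced here. The function name `dpc_quantities`, the parameter `d_c`, and the toy data are assumptions made for illustration only.

```python
import math

def dpc_quantities(points, d_c):
    """Return (rho, delta, gamma) for baseline cutoff-kernel DPC.

    points: list of equal-length coordinate tuples (hypothetical input).
    d_c:    cutoff distance, chosen by the user (hypothetical parameter).
    """
    n = len(points)
    dist = [[math.dist(p, q) for q in points] for p in points]

    # rho_i: number of other points strictly closer than the cutoff d_c
    rho = [sum(1 for j in range(n) if j != i and dist[i][j] < d_c)
           for i in range(n)]

    delta = []
    for i in range(n):
        # distances from point i to every point of strictly higher density
        higher = [dist[i][j] for j in range(n) if rho[j] > rho[i]]
        # a point with no denser neighbor gets its largest distance instead
        delta.append(min(higher) if higher else max(dist[i]))

    # gamma_i = rho_i * delta_i; large values suggest cluster centers
    gamma = [r * d for r, d in zip(rho, delta)]
    return rho, delta, gamma

# Toy data: two well-separated squares, each with a point at its middle.
points = [(0, 0), (0, 1), (1, 0), (1, 1), (0.5, 0.5),
          (10, 10), (10, 11), (11, 10), (11, 11), (10.5, 10.5)]
rho, delta, gamma = dpc_quantities(points, d_c=1.0)

# Rank points by gamma and take the two largest as candidate centers.
centers = sorted(range(len(points)), key=lambda i: gamma[i], reverse=True)[:2]
```

On this toy data the two middle points (indices 4 and 9) have by far the largest γ. In baseline DPC a user picks such points by eye from the decision graph, and `d_c` must also be chosen by hand; these are the manual steps the abstract says ADPC replaces with an adaptive procedure.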