Abstract
The density peak clustering (DPC) algorithm offers high accuracy, automatic detection of the number of clusters, and identification of the number of cluster centers. However, because DPC measures the proximity between sample points with the Euclidean distance, it cannot effectively extract manifold structure information from high-dimensional, complex data. To address this shortcoming, an improved density peak clustering algorithm is designed that takes the geometric characteristics and manifold structure of the data points into account and replaces the Euclidean distance with the geodesic distance. Numerical simulation results show that the improved algorithm can effectively cluster data distributed along manifolds.
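The idea in the abstract, running density peak clustering on geodesic rather than Euclidean distances, can be illustrated with a minimal sketch. This is not the paper's implementation: the record does not describe how the geodesic distance is computed or how shared neighbors are used. The sketch assumes geodesic distances are approximated by shortest paths on a k-nearest-neighbor graph, uses a Gaussian-kernel density, and takes the number of clusters as an input. The function names and the parameters `k`, `dc`, and `n_clusters` are hypothetical.

```python
import heapq
import math


def geodesic_distances(points, k):
    """Approximate geodesic distances as shortest paths on a symmetric k-NN graph
    (an assumed construction, not taken from the paper)."""
    n = len(points)
    graph = [{} for _ in range(n)]
    for i in range(n):
        others = sorted((math.dist(points[i], points[j]), j) for j in range(n) if j != i)
        for w, j in others[:k]:
            graph[i][j] = w
            graph[j][i] = w
    dist = [[math.inf] * n for _ in range(n)]
    for s in range(n):  # Dijkstra from every source; unreachable pairs stay at inf
        dist[s][s] = 0.0
        heap = [(0.0, s)]
        while heap:
            d, u = heapq.heappop(heap)
            if d > dist[s][u]:
                continue
            for v, w in graph[u].items():
                if d + w < dist[s][v]:
                    dist[s][v] = d + w
                    heapq.heappush(heap, (d + w, v))
    return dist


def density_peaks(dist, dc, n_clusters):
    """Density peak clustering on a precomputed distance matrix."""
    n = len(dist)
    # rho: Gaussian-kernel local density with cutoff scale dc (assumed kernel).
    rho = [sum(math.exp(-(dist[i][j] / dc) * (dist[i][j] / dc))
               for j in range(n) if j != i) for i in range(n)]
    order = sorted(range(n), key=lambda i: -rho[i])  # decreasing density
    delta = [0.0] * n   # distance to the nearest point of higher density
    parent = [-1] * n   # that nearest higher-density point
    top = order[0]
    delta[top] = max(d for d in dist[top] if d < math.inf)
    for rank in range(1, n):
        i = order[rank]
        j = min(order[:rank], key=lambda h: dist[i][h])
        delta[i], parent[i] = dist[i][j], j
    # gamma = rho * delta ranks candidate centers; unreachable points rank first.
    gamma = [math.inf if delta[i] == math.inf else rho[i] * delta[i] for i in range(n)]
    rest = sorted(order[1:], key=lambda i: -gamma[i])
    centers = [top] + rest[: n_clusters - 1]
    labels = [-1] * n
    for c, i in enumerate(centers):
        labels[i] = c
    for i in order:  # each remaining point inherits its parent's label
        if labels[i] == -1:
            labels[i] = labels[parent[i]]
    return labels, centers


def ring(radius, n):
    """n evenly spaced points on a circle of the given radius."""
    return [(radius * math.cos(2 * math.pi * t / n),
             radius * math.sin(2 * math.pi * t / n)) for t in range(n)]


# Two concentric rings: a simple manifold-shaped data set.
points = ring(1.0, 40) + ring(3.0, 40)
labels, centers = density_peaks(geodesic_distances(points, k=2), dc=0.5, n_clusters=2)
```

On this toy data the k-NN graph has one connected component per ring, so points on different rings are infinitely far apart in geodesic terms and each ring receives its own label. For real data, `k` and `dc` would need tuning, and a library shortest-path routine would be preferable to the all-pairs Dijkstra loop used here.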
Author
陈羽
Chen Yu (College of Big Data and Statistics, Anhui University, Hefei, Anhui 230601, China)
Source
Journal of Yili Normal University: Natural Science Edition (《伊犁师范大学学报(自然科学版)》)
2023, No. 1, pp. 56-65 (10 pages)
Keywords
density peak clustering algorithm
geodesic distance
shared neighbor
manifold structure information
high-dimensional clustering