Abstract
Manifold data consist of arc-shaped clusters in which samples belonging to the same cluster are relatively far apart. The density peaks clustering (DPC) algorithm is simple and efficient, but it performs poorly on manifold data for two reasons: its two density measures (the Gaussian kernel and the cutoff kernel) can each lose information to a different degree, and its assignment strategy considers only distance and density, which lowers clustering accuracy. To address these issues, a DPC algorithm based on weighted natural nearest neighbors for manifold data (DPC-WNNN) is proposed. When defining local density, DPC-WNNN combines the local and global information of each sample and introduces weighted natural nearest neighbors and reverse nearest neighbors to compensate for the information lost by the Gaussian or cutoff kernel. When assigning samples, it computes sample similarity from shared nearest neighbors and shared reverse nearest neighbors, which compensates for the spatial information missing from the original DPC assignment strategy. DPC-WNNN was compared with seven similar algorithms on manifold datasets and real-world datasets; the results show that it finds cluster centers more effectively and assigns samples accurately, indicating good clustering performance.
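For orientation, the baseline DPC procedure that the abstract criticizes can be sketched as follows. This is a minimal illustration of standard DPC only, not of DPC-WNNN, whose weighting and similarity formulas are not given in this record. The function name `dpc_baseline` and the parameters `dc` (cutoff distance) and `n_clusters` are assumptions chosen for the example.

```python
import numpy as np

def dpc_baseline(X, dc, n_clusters, kernel="gaussian"):
    """Sketch of baseline DPC (not DPC-WNNN); names here are illustrative."""
    n = len(X)
    # Pairwise Euclidean distances between all samples.
    d = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
    if kernel == "cutoff":
        # Cutoff kernel: count neighbors closer than dc (minus the point itself).
        rho = (d < dc).sum(axis=1) - 1.0
    else:
        # Gaussian kernel: smooth density (minus the self term exp(0) = 1).
        rho = np.exp(-(d / dc) ** 2).sum(axis=1) - 1.0
    order = np.argsort(-rho, kind="stable")  # indices by decreasing density
    delta = np.zeros(n)
    parent = np.full(n, -1)
    # The densest sample gets the largest distance in its row by convention.
    delta[order[0]] = d[order[0]].max()
    for pos in range(1, n):
        i = order[pos]
        higher = order[:pos]  # samples ranked as denser than i
        j = higher[np.argmin(d[i, higher])]
        delta[i] = d[i, j]    # distance to the nearest denser sample
        parent[i] = j
    # Centers: largest rho * delta; the densest sample is always a center.
    gamma = rho * delta
    gamma[order[0]] = np.inf
    centers = np.argsort(-gamma, kind="stable")[:n_clusters]
    labels = np.full(n, -1)
    labels[centers] = np.arange(n_clusters)
    # Assignment uses only density rank and distance: each remaining sample
    # inherits the label of its nearest denser neighbor.
    for i in order:
        if labels[i] == -1:
            labels[i] = labels[parent[i]]
    return rho, delta, labels
```

The final loop shows the limitation named in the abstract: a sample's label depends only on density rank and distance to one denser neighbor, with no neighborhood-structure information, so one wrong link can propagate along an elongated cluster.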
Authors
赵嘉
马清
陈蔚昌
肖人彬
崔志华
潘正祥
ZHAO Jia; MA Qing; CHEN Wei-chang; XIAO Ren-bin; CUI Zhi-hua; PAN Jeng-shyang (Key Laboratory of IoT Perception and Collaborative Computing for Smart City of Nanchang, School of Information Engineering, Nanchang Institute of Technology, Nanchang 330000, China; School of Artificial Intelligence and Automation, Huazhong University of Science and Technology, Wuhan 430074, China; School of Computer Science and Technology, Taiyuan University of Science and Technology, Taiyuan 030024, China; College of Computer Science and Engineering, Shandong University of Science and Technology, Qingdao 266590, Shandong, China)
Source
《兰州大学学报(自然科学版)》
CAS
CSCD
PKU Core Journal (北大核心)
2024, No. 5, pp. 652-660, 669 (10 pages in total)
Journal of Lanzhou University (Natural Sciences)
Funding
National Natural Science Foundation of China (52069014, 62466037).
Keywords
density peak
clustering
manifold data
natural neighbor