Abstract
Density-based clustering is a classical approach to cluster analysis that can find non-spherical clusters without specifying the number of clusters in advance. Real-world datasets, however, often exhibit unclear cluster boundaries, uneven data density, and complex data distributions, and relatively few existing methods address all three problems at once. Taking inspiration from the natural phenomenon of erosion, this paper proposes the erosion clustering (EC) algorithm. First, EC combines a dynamic density estimation method with an erosion strategy that identifies and removes the data on cluster boundaries layer by layer, revealing the latent core region of each cluster. Next, a clustering method based on the mutual reachability graph groups the core regions. Finally, an allocation scheme based on local density peaks assigns the eroded boundary data to clusters. Experimental results on 12 benchmark datasets show that, compared with seven competing algorithms, EC improves clustering performance by an average of 96%, 53%, and 36% in adjusted Rand index, adjusted mutual information, and F1 score, respectively.
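The three stages named in the abstract (erode, cluster the cores, allocate the eroded data) can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not give the density estimator, the erosion rule, or the allocation rule, so the sketch assumes a k-nearest-neighbor density, removal of a fixed fraction of the lowest-density points per layer, connected components of a mutual k-nearest-neighbor graph in place of the mutual reachability graph, and a nearest-labeled-point rule in place of the local-density-peak allocation. The names `erosion_sketch`, `k`, `layers`, and `frac` are hypothetical.

```python
import numpy as np

def knn(X, k):
    """Return (indices of the k nearest neighbors, full distance matrix)."""
    d = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=2)
    np.fill_diagonal(d, np.inf)           # a point is not its own neighbor
    return np.argsort(d, axis=1)[:, :k], d

def knn_density(X, k):
    """Assumed estimator: inverse of the mean distance to the k neighbors."""
    nn, d = knn(X, k)
    rows = np.arange(len(X))[:, None]
    return 1.0 / (d[rows, nn].mean(axis=1) + 1e-12)

def erosion_sketch(X, k=5, layers=2, frac=0.1):
    n = len(X)
    alive = np.ones(n, dtype=bool)

    # Stage 1: erosion. Density is re-estimated on the survivors in every
    # layer (our reading of "dynamic"), then the sparsest points are removed.
    for _ in range(layers):
        idx = np.flatnonzero(alive)
        density = knn_density(X[idx], k)
        n_drop = int(frac * len(idx))
        alive[idx[np.argsort(density)[:n_drop]]] = False

    # Stage 2: cluster the cores as connected components of the mutual
    # k-nearest-neighbor graph (edge u-v only if each lists the other).
    core = np.flatnonzero(alive)
    nn = knn(X[core], k)[0].tolist()
    nn_sets = [set(row) for row in nn]
    labels = np.full(n, -1)
    cluster = 0
    for start in range(len(core)):
        if labels[core[start]] != -1:
            continue
        labels[core[start]] = cluster
        stack = [start]
        while stack:
            u = stack.pop()
            for v in nn[u]:
                if u in nn_sets[v] and labels[core[v]] == -1:
                    labels[core[v]] = cluster
                    stack.append(v)
        cluster += 1

    # Stage 3: allocation. Eroded points are visited from dense to sparse
    # and inherit the label of their nearest already-labeled point.
    d_all = knn(X, k)[1]
    dens_all = knn_density(X, k)
    eroded = np.flatnonzero(~alive)
    for i in eroded[np.argsort(-dens_all[eroded])]:
        labeled = np.flatnonzero(labels != -1)
        labels[i] = labels[labeled[np.argmin(d_all[i, labeled])]]
    return labels
```

Calling `erosion_sketch(X)` on an `(n, d)` array returns one integer label per row. Because a mutual k-nearest-neighbor graph can fragment, this sketch may split a single cluster into several components; the paper's mutual reachability graph may well behave differently.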
Authors
DU Ming-jing
WU Fu-yu
LI Yu-rui
DONG Yong-quan
School of Computer Science and Technology, Jiangsu Normal University, Xuzhou, Jiangsu 221116, China; Jiangsu Key Laboratory of Educational Intelligent Technology, Jiangsu Normal University, Xuzhou, Jiangsu 221116, China
Source
Acta Electronica Sinica
EI
CAS
CSCD
PKU Core Journals (北大核心)
2024, No. 10, pp. 3459-3471 (13 pages)
Funding
National Natural Science Foundation of China (No. 62006104, No. 61872168).
Keywords
density-based clustering
cluster analysis
density estimation
local density peak
mutual k-nearest neighbor
erosion strategy