Abstract
Density peaks clustering (DPC) is a density-based clustering algorithm that is widely used because it is simple and efficient. However, DPC tends to split a single high-density cluster into several clusters and is highly prone to chained assignment errors. To address these problems, a DPC algorithm based on weighted kernel density estimation and micro-cluster merging (WEMCM-DPC) is proposed. The algorithm redefines local density using kernel density estimation and weighted K-nearest neighbors, which narrows the local density gap between high-density and sparse clusters and makes cluster center identification more accurate. It also introduces a new similarity measure between micro-clusters that reduces the influence of overly sparse or overly dense samples on other samples, provides a basis for micro-cluster merging, and mitigates the chained assignment errors of DPC, making the clustering results more accurate. Experiments on datasets with uneven density distributions and on real-world datasets show that WEMCM-DPC produces better clustering results than DPC and four improved algorithms.
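The abstract does not state the paper's density formula, so the following is only a rough sketch of the general idea, not the WEMCM-DPC definition. It assumes a Gaussian kernel evaluated over each sample's K nearest neighbors, with a per-sample bandwidth equal to the mean K-nearest-neighbor distance (an illustrative choice that shrinks the density gap between dense and sparse regions), followed by the standard DPC relative distance. The function names `knn_kernel_density` and `dpc_delta` and the parameter `k` are hypothetical.

```python
import numpy as np


def knn_kernel_density(X, k=5):
    """Gaussian-kernel local density over each sample's k nearest neighbors.

    Assumption: this weighting is illustrative only; it is NOT the
    density definition used in the WEMCM-DPC paper.
    """
    # Pairwise Euclidean distances, shape (n, n).
    d = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
    # Sort each row; column 0 is the sample itself (distance 0), so skip it.
    knn_d = np.sort(d, axis=1)[:, 1:k + 1]
    # Per-sample bandwidth: mean k-NN distance (small epsilon avoids division by zero).
    h = knn_d.mean(axis=1, keepdims=True) + 1e-12
    rho = np.exp(-(knn_d / h) ** 2).sum(axis=1)
    return rho, d


def dpc_delta(rho, d):
    """Standard DPC relative distance: distance to the nearest denser sample."""
    order = np.argsort(-rho)  # sample indices by decreasing density
    delta = np.zeros(len(rho))
    # The densest sample has no denser neighbor; use its largest distance instead.
    delta[order[0]] = d[order[0]].max()
    for pos in range(1, len(order)):
        i = order[pos]
        delta[i] = d[i, order[:pos]].min()
    return delta


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # One dense blob and one sparse blob (synthetic data for illustration).
    X = np.vstack([rng.normal(0, 0.1, (30, 2)), rng.normal(5, 1.0, (30, 2))])
    rho, d = knn_kernel_density(X, k=5)
    delta = dpc_delta(rho, d)
    # In DPC, samples with large rho * delta are cluster-center candidates.
    print(np.argsort(-(rho * delta))[:2])
```

In plain DPC, samples with both high density and large relative distance are taken as cluster centers; the micro-cluster similarity measure and merging step described in the abstract are not sketched here because the abstract gives no detail on them.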
Authors
LI Zhigang
LYU Li
TAN Dekun
KANG Ping
FAN Tanghuai
School of Information Engineering, Nanchang Institute of Technology, Nanchang 330099, China; Nanchang Key Laboratory of IoT Perception and Collaborative Computing for Smart City, Nanchang Institute of Technology, Nanchang 330099, China
Source
Information and Control (《信息与控制》)
CSCD
PKU Core Journal (北大核心)
2024, Issue 3, pp. 302-314 (13 pages)
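Funding
National Natural Science Foundation of China (62066030)
Science and Technology Project of the Jiangxi Provincial Department of Education (GJJ201915, GJJ2201803)
Key Research and Development Program of Jiangxi Province (20192BBE50076, 20203BBGL73225)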
Keywords
density peaks
clustering
kernel density estimation
K-nearest neighbor
micro-cluster merging