Abstract
To address the tendency of the density peaks clustering (DPC) algorithm to misassign points on data sets with complex structure, this paper proposes a density peaks clustering algorithm with an optimized allocation strategy (ODPC). The algorithm first introduces the parameter product γ, which widens the range from which cluster centers are selected. It then applies an improved allocation strategy that reassigns data points according to the similarity index MS, further refining the assignment of points within clusters. Finally, a dc nearest-neighbor method is used to improve the identification of noise points. Experiments on synthetic data sets and real UCI data sets show that ODPC improves the accuracy of point assignment on complex manifold data sets while also improving noise recognition, and that it achieves better clustering results than the DPC (clustering by fast search and find of density peaks), DenPEHC (density peak based efficient hierarchical clustering), and GDPC (density peaks clustering algorithm with grid-division strategy) algorithms.
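The abstract does not define γ, so the following is only a rough sketch of how such a quantity is commonly computed in the original DPC algorithm, on the assumption that γ is the product of the local density ρ and the distance δ to the nearest denser point. The function names `dpc_gamma` and `top_centers`, the cutoff-kernel density, and the sample points are illustrative assumptions; the MS-based reallocation and the dc nearest-neighbor noise step of ODPC are not reproduced here.

```python
import math


def dpc_gamma(points, dc):
    """Return (rho, delta, gamma) per point, using standard DPC definitions.

    Assumption: gamma = rho * delta, as in the original DPC decision graph.
    """
    n = len(points)
    dist = [[math.dist(p, q) for q in points] for p in points]

    # Local density (cutoff kernel): number of other points closer than dc.
    rho = [sum(1 for j in range(n) if j != i and dist[i][j] < dc)
           for i in range(n)]

    delta = []
    for i in range(n):
        # Distances to all points of strictly higher density.
        higher = [dist[i][j] for j in range(n) if rho[j] > rho[i]]
        # A point with no denser neighbor takes its largest distance instead.
        # Note: points tied for the highest density all fall into this case.
        delta.append(min(higher) if higher else max(dist[i]))

    gamma = [r * d for r, d in zip(rho, delta)]
    return rho, delta, gamma


def top_centers(gamma, k):
    """Indices of the k points with the largest gamma (candidate centers)."""
    return sorted(range(len(gamma)), key=lambda i: gamma[i], reverse=True)[:k]


if __name__ == "__main__":
    # Two small, well-separated groups (made-up data for illustration).
    pts = [(0, 0), (0, 1), (1, 0), (0, -1), (-1, 0),
           (10, 10), (10, 11), (11, 10), (10, 9)]
    rho, delta, gamma = dpc_gamma(pts, dc=1.2)
    print(top_centers(gamma, 2))
```

Ranking points by γ rather than thresholding ρ and δ separately is one simple way to pick candidate centers; how ODPC actually uses γ to widen that selection is described in the paper itself, not in this sketch.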
Authors
DING Zhicheng (丁志成)
GE Hongwei (葛洪伟)
Affiliations: Jiangsu Provincial Engineering Laboratory of Pattern Recognition and Computational Intelligence, Jiangnan University, Wuxi, Jiangsu 214122, China; School of Internet of Things Engineering, Jiangnan University, Wuxi, Jiangsu 214122, China
Source
Journal of Frontiers of Computer Science and Technology (《计算机科学与探索》)
Indexed in: CSCD; Peking University Core Journals (北大核心)
2020, Issue 5, pp. 792-802 (11 pages)
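Funding
Postgraduate Research and Innovation Program of Jiangsu Province Colleges and Universities, No. KYLX16_0781
Priority Academic Program Development of Jiangsu Higher Education Institutions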