期刊文献+

基于边界剥离思想的全局中心聚类算法

Border-peeling Inspired Globally Central Clustering Algorithm
下载PDF
导出
摘要 全局中心聚类算法如k-means、谱聚类在类簇分布出现重叠粘连现象时往往容易陷入局部最优且参数难以设定,极大地限制了全局中心聚类算法在实际应用中的效果。为解决此问题,提出了一种基于边界剥离思想的全局中心聚类算法。首先,设计了一步边界剥离法,根据样本点间的反向k近邻关系定义了一种局部距离加权密度,并利用密度经验分布函数一阶差分最大处的密度值作为阈值将数据集分为边界集与核心集。其次,嵌入传统的全局中心聚类算法对核心集进行聚类,得益于核心集的簇间重叠问题已明显改善,嵌入算法将更容易收敛到真实的簇中心。最后,提出一种边界吸引算法,从已被归类的核心集样本点出发,借助已有的反向k近邻关系迭代融合边界集中的样本点以完成对整个数据集的聚类。相较于目前以迭代方式进行的边界剥离算法,所提算法在计算效率上具有明显优势,不需要额外设定复杂的终止条件而直接通过阈值进行边界划分,并且全局性方法在数据局部密度存在差异的情形下具备更强的鲁棒性。在实验阶段,采用3个合成数据集以及6个真实数据集从算法性能、参数敏感性、时间消耗多个方面进行评估,实验结果进一步验证了此算法的有效性与实用性。 The globally central clustering algorithms,such as k-means and spectral clustering,often suffer from the problem of local optima and difficulty in parameter setting with overlapping and adhesive clusters in the data distribution,which might greatly limits the effectiveness of globally central clustering algorithms in practical applications.To address this issue,a border-peeling inspired globally central clustering algorithm was proposed.Firstly,a one-step border peeling method was designed,which defines a locally distance-weighted density according to the reverse k-nearest neighbor relationships between sample points.The density value at the maximal point of the first-order difference of the density empirical distribution function was utilized as the threshold to divide the dataset into boundary and core sets.Then,the traditional globally central clustering algorithms were embedded to cluster the core set.Benefiting from the significant improvement in the overlapping of the core set,the embedding algorithms could converge to the true cluster centers easily.Finally,a boundary attraction algorithm was proposed,which could progressively amalgamate sample points from the boundary set,utilizing existing reverse k-nearest neighbor relationships,and commencing from the already categorized core set sample points.Compared with the currently iterative border peeling algorithms,the proposed algorithm had significant advantages in computational efficiency.There was no additional complex termination conditions but only direct performs boundary partitioning using a threshold.Furthermore,the global approach also exhibited stronger robustness local data densities were different.In the experimental phase,three synthetic datasets and six real-world datasets were used to evaluate the algorithm′s performance,parameter sensitivity,and time consumption,further validating the efficacy and practicality of this algorithm.
作者 程明畅 敖兰 刘浏 CHENG Mingchang;AO Lan;LIU Liu(V.C.&V.R.Key Lab of Sichuan Provence,Sichuan Normal University,Chengdu 610066,China;School of Mathematical Sci-ences,Sichuan Normal University,Chengdu 610066,China;Geomathematics Key Laboratory of Sichuan Province,Chengdu Uni-versity of Technology,Chengdu 610059,China;College of Mathematics and Physics,Chengdu University of Technology,Chengdu 610059,China)
出处 《郑州大学学报(工学版)》 CAS 北大核心 2024年第5期86-94,共9页 Journal of Zhengzhou University(Engineering Science)
基金 国家自然科学基金资助项目(12075162) 数学地质四川省重点实验室开放基金资助(scsxdz2023-4) 四川师范大学学科建设专项(XKZX2021-04)。
关键词 全局中心聚类算法 边界剥离 簇重叠 反向k近邻 经验分布 globally central clustering algorithm border peeling overlapping reverse k-nearest neighbors empirical distribution
  • 相关文献

参考文献2

二级参考文献10

共引文献4

相关作者

内容加载中请稍等...

相关机构

内容加载中请稍等...

相关主题

内容加载中请稍等...

浏览历史

内容加载中请稍等...
;
使用帮助 返回顶部