Parallel density clustering algorithm based on MapReduce and optimized cuckoo algorithm
Abstract In parallel density clustering, boundary points between clusters of different densities are assigned ambiguously, and noise in the data degrades clustering performance and leaves the results prone to local optima. To address these problems, MCS-KDBSCAN, a parallel density clustering algorithm based on MapReduce and an optimized cuckoo algorithm, is proposed. First, the algorithm adopts the KDBSCAN (K-means DBSCAN) strategy, which draws on the nearest-neighbor and reverse-nearest-neighbor ideas in K-means: by computing the influence space of each data point, it redefines the cluster expansion condition of the density-based spatial clustering of applications with noise (DBSCAN) algorithm, which avoids the ambiguous assignment of boundary points between clusters of different densities. Second, building on the nearest-neighbor idea in KDBSCAN, an iterative noise-point handling strategy is proposed to reduce the effect of noise points on clustering performance. Third, MCS (majorization cuckoo search), an improvement on the traditional cuckoo algorithm, is proposed: it decays the weight of the nest-discovery probability as the number of search iterations grows, which speeds up convergence and addresses the problem of clustering results being trapped in local optima. Finally, the parallel density clustering strategy MCS-KDBSCAN is built on MapReduce; parallelizing the density clustering computation reduces the communication burden of passing local optimal solutions in the parallel clustering algorithm and improves its performance. Experiments show that the proposed MCS-KDBSCAN algorithm compares favorably in both clustering accuracy and running time.
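The abstract states only that MCS decays the weight of the nest-discovery probability as iterations increase; it does not give the schedule. The following is a minimal Python sketch of that idea under stated assumptions: the linear decay in `discovery_probability`, the bounds `pa_max` and `pa_min`, the step scale 0.01, and the function names are all hypothetical and are not taken from the paper. The Lévy step uses Mantegna's algorithm, a common choice in standard cuckoo search.

```python
import math
import random


def discovery_probability(t, max_iter, pa_max=0.5, pa_min=0.05):
    """Assumed schedule: decay the discovery probability linearly from pa_max to pa_min."""
    frac = t / max(1, max_iter - 1)
    return pa_max - (pa_max - pa_min) * frac


def levy_step(rng, beta=1.5):
    """Draw one Levy-distributed step using Mantegna's algorithm."""
    sigma = (math.gamma(1 + beta) * math.sin(math.pi * beta / 2)
             / (math.gamma((1 + beta) / 2) * beta * 2 ** ((beta - 1) / 2))) ** (1 / beta)
    u = rng.gauss(0, sigma)
    v = rng.gauss(0, 1)
    return u / (abs(v) + 1e-12) ** (1 / beta)


def cuckoo_search(objective, dim, bounds, n_nests=15, max_iter=200, seed=0):
    """Minimize `objective` with cuckoo search and a decaying discovery probability."""
    rng = random.Random(seed)
    lo, hi = bounds

    def clip(x):
        return min(hi, max(lo, x))

    def random_nest():
        return [rng.uniform(lo, hi) for _ in range(dim)]

    nests = [random_nest() for _ in range(n_nests)]
    fitness = [objective(n) for n in nests]
    history = []  # best fitness after each iteration

    for t in range(max_iter):
        best = nests[min(range(n_nests), key=fitness.__getitem__)]
        # Levy-flight move relative to the current best nest; keep only improvements.
        for i in range(n_nests):
            cand = [clip(x + 0.01 * levy_step(rng) * (x - b))
                    for x, b in zip(nests[i], best)]
            f = objective(cand)
            if f < fitness[i]:
                nests[i], fitness[i] = cand, f
        # Abandon non-best nests with a probability that shrinks over time.
        pa = discovery_probability(t, max_iter)
        best_idx = min(range(n_nests), key=fitness.__getitem__)
        for i in range(n_nests):
            if i != best_idx and rng.random() < pa:
                nests[i] = random_nest()
                fitness[i] = objective(nests[i])
        history.append(min(fitness))

    best_idx = min(range(n_nests), key=fitness.__getitem__)
    return nests[best_idx], fitness[best_idx], history


if __name__ == "__main__":
    sphere = lambda x: sum(v * v for v in x)
    best, value, _ = cuckoo_search(sphere, dim=2, bounds=(-5.0, 5.0))
    print(best, value)
```

A high discovery probability early on replaces many nests and favors exploration; a low one later keeps most nests and favors refinement around good solutions. How the paper couples this search to the clustering step is not described in the abstract, so the sketch optimizes a plain test function instead.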
Authors MAO Yi-min; GU Sen-qing (College of Information Engineering, Jiangxi University of Science and Technology, Ganzhou 341000, China)
Source Journal of Jilin University: Engineering and Technology Edition (indexed in EI, CAS, CSCD, Peking University Core), 2023, No. 10, pp. 2909-2916 (8 pages)
Funding National Key Research and Development Program of China (2018YFC1504705); National Natural Science Foundation of China (41562019).
Keywords density clustering; optimized cuckoo algorithm; density-based spatial clustering of applications with noise (DBSCAN); MapReduce; noise resistance