
Spark-based parallel density peak clustering algorithm
Cited by: 2
Abstract: The FSDP clustering algorithm must traverse the entire data set to compute the local density and minimum distance of each data object, which makes its overall time complexity high. To address this, the paper proposes SFSDP, a Spark-based parallel FSDP clustering algorithm. First, spatial grid partitioning divides the data set into multiple partitions of relatively balanced size. Then, an improved FSDP clustering algorithm clusters the data within each partition in parallel. Finally, the local clusters of the individual partitions are merged to produce the global clusters. Experimental results show that, compared with FSDP, SFSDP can cluster large-scale data sets effectively and performs well in both accuracy and scalability.
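The two per-object quantities named in the abstract, local density and minimum distance, can be illustrated with a minimal Python sketch. It assumes the standard density peak definitions: a cutoff-kernel density, and the distance to the nearest object of higher density. The function name, the cutoff parameter `dc`, and the sample points are illustrative assumptions rather than details from the paper. The paper's improved FSDP variant, its grid partitioning, and its Spark merge step are not reproduced here.

```python
import math


def density_peak_quantities(points, dc):
    """Return (rho, delta) per point, using a cutoff kernel of radius dc.

    Hypothetical helper for illustration; dc is an assumed parameter.
    """
    n = len(points)
    # Pairwise Euclidean distances: this is the full scan over the data set
    # that makes the serial algorithm quadratic in the number of objects.
    dist = [[math.dist(p, q) for q in points] for p in points]

    # Local density: number of other points strictly closer than dc.
    rho = [sum(1 for j in range(n) if j != i and dist[i][j] < dc)
           for i in range(n)]

    # Minimum distance to any point of strictly higher density.
    # A point with no denser neighbour takes its largest distance instead.
    delta = []
    for i in range(n):
        higher = [dist[i][j] for j in range(n) if rho[j] > rho[i]]
        delta.append(min(higher) if higher else max(dist[i]))
    return rho, delta


# Three nearby points and one distant point (made-up sample data).
points = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (10.0, 10.0)]
rho, delta = density_peak_quantities(points, dc=1.2)
```

Objects with both high `rho` and high `delta` are the candidate cluster centres. Because every object is compared with every other, restricting the computation to one partition at a time is what would let the work be spread across Spark executors.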
Authors: Sun Weipeng; Wu Xisheng; Meng Bin (School of IoT Engineering, Jiangnan University, Wuxi, Jiangsu 214122, China; Software Engineering Center, China Ship Scientific Research Center, Wuxi, Jiangsu 214082, China)
Source: Application Research of Computers (《计算机应用研究》), indexed in CSCD and the Peking University Core Journals list; 2020, No. 1, pp. 163-166, 171 (5 pages)
Funding: National Natural Science Foundation of China (61672265); Youth Innovation Fund of the 702 Research Institute (J775).
Keywords: clustering; density peak; space division; parallel; Spark
