Abstract
This paper studies parallel graph clustering algorithms. A new parallel graph clustering algorithm based on Spark is proposed. Because the top operation in Spark consumes a large amount of memory, a new algorithm is proposed to replace it, which effectively reduces memory consumption. The bottom-up hierarchical clustering algorithm is improved to increase clustering speed. A graph data filtering method based on the characteristics of graph data is proposed to reduce the running time and memory footprint of the algorithm, and the reason for its effectiveness is explained. Simulation results indicate that the proposed algorithm outperforms the other parallel graph clustering algorithms it was compared against.
Authors
Liu Dongjiang (刘东江)
Li Jianhui (黎建辉)
Computer Network Information Center, Chinese Academy of Sciences, Beijing 100190, China; University of Chinese Academy of Sciences, Beijing 100190, China
Source
Journal of System Simulation (《系统仿真学报》)
CAS
CSCD
Peking University Core Journals (北大核心)
2020, Issue 6, pp. 1038-1050 (13 pages)
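Funding
National Key Research and Development Program of China (2016YFB1000600)
Strategic Priority Research Program of the Chinese Academy of Sciences (XDA06010307)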
Keywords
graph clustering
graph data
Spark
algorithm
parallelization