摘要
随着网络规模的快速增长,传统社区发现算法难以处理大规模网络数据和满足复杂网络的可扩展分析需求.本文提出一种适用于大规模复杂网络的重叠社区发现算法PHLink.该算法根据复杂网络的无标度特性将节点建立连边的原因进行分析和归类,用以识别网络中具有重叠性的社区结构,并采用MapReduce计算框架对网络进行分割和冗余存储,减弱了图计算的耦合性,解决了社区发现算法的分布式计算问题.通过真实网络测试,PHLink算法可以大幅度降低边计算的复杂度,对于无标度特性明显的复杂网络提取0.1%的枢纽节点即可节省94%以上的计算量,较传统算法具有较高的稳定性和准确性,并且在Hadoop平台有良好的加速性和伸缩性,可以处理千万级连边规模的大规模复杂网络.
With the increasing size of complex networks, traditional detection methods are difficult to scale to larger networks. This paper proposes a parallel hierarchical link( PHLink) algorithm to discover overlapping communities. By studying the power-law degree distribution of complex networks, we distinguish the motivations of why two nodes intend to establish a connection so that the complexity for community detection can be reduced. PHLink is implemented using distributed computing for large-scale networks via graph segmentation and redundant storage. Experiments validate that PHLink can accurately discover network communities and the overlaps between them. For scale-free networks,removing 0.1% of the hub nodes reduces the computation time by 94%. Meanwhile,PHLink demonstrates good speed-up and scale-up on Hadoop,which can scale to very large complex networks with millions of edges.
作者
滕飞
戴荣杰
任晓春
TENG Fei;DAI Rongjie;REN Xiaochun(School of Information Science and Technology, Southwest Jiaotong University, Chengdu 611756, China;State Key Laboratory of Rail Transit Engineering Informatization(FSDI),Xi’an 710043,China)
出处
《西南交通大学学报》
EI
CSCD
北大核心
2019年第1期211-218,共8页
Journal of Southwest Jiaotong University
基金
国家自然科学基金资助项目(61573292
61572407)
四川省软科学研究计划资助项目(2016ZR0034)
关键词
复杂网络
重叠社区
社区发现
无标度
并行算法
HADOOP
complex network
overlapping community
community detection
scale-free
parallel algorithm
Hadoop