Abstract
Existing community search algorithms struggle to find communities that satisfy a given complex attribute condition in a network. Meanwhile, as networks keep growing, single-machine serial community search algorithms can no longer process large-scale network data effectively. To address the clique community search problem under complex attribute conditions, this paper proposes a Spark-based search algorithm. Built on the Spark parallel computing framework, the algorithm combines the structural features and content attributes of the graph and chooses among different search strategies according to the complex attribute condition, which is defined by a Boolean expression. During the search, the search cost and extension cost of the attributes are used for local optimization to speed up the process. Experimental results show that, compared with structure-first and attribute-first community search algorithms, the proposed algorithm preserves search accuracy and improves search efficiency across different attribute conditions, network scales, and numbers of nodes.
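To make the problem setting concrete, the following is a minimal single-machine sketch of an attribute-first style search, the kind of baseline the abstract compares against. It is not the paper's Spark algorithm. It assumes that a community is a maximal clique of at least `k` nodes containing the query node, and that the attribute condition is a Boolean predicate over each node's attribute set. All names (`maximal_cliques`, `attribute_first_search`, the toy graph and attributes) are hypothetical.

```python
from typing import Callable, Dict, FrozenSet, List, Set

Graph = Dict[str, Set[str]]   # node -> set of neighbouring nodes
Attrs = Dict[str, Set[str]]   # node -> set of attribute keywords


def maximal_cliques(graph: Graph, candidates: Set[str]) -> List[FrozenSet[str]]:
    """Bron-Kerbosch (no pivoting) on the subgraph induced by `candidates`."""
    found: List[FrozenSet[str]] = []

    def expand(r: Set[str], p: Set[str], x: Set[str]) -> None:
        if not p and not x:
            found.append(frozenset(r))   # r cannot be extended: maximal
            return
        for v in list(p):
            nbrs = graph[v] & candidates
            expand(r | {v}, p & nbrs, x & nbrs)
            p = p - {v}                  # v is fully explored
            x = x | {v}

    expand(set(), set(candidates), set())
    return found


def attribute_first_search(
    graph: Graph,
    attrs: Attrs,
    query: str,
    condition: Callable[[Set[str]], bool],
    k: int,
) -> List[FrozenSet[str]]:
    """Filter nodes by the attribute condition first, then search for cliques."""
    keep = {v for v in graph if condition(attrs.get(v, set()))}
    if query not in keep:
        return []
    # Any clique containing the query lies inside its qualifying neighbourhood.
    local = (graph[query] & keep) | {query}
    return [c for c in maximal_cliques(graph, local) if len(c) >= k]


if __name__ == "__main__":
    graph = {
        "a": {"b", "c", "d"},
        "b": {"a", "c", "d"},
        "c": {"a", "b", "d"},
        "d": {"a", "b", "c", "e"},
        "e": {"d"},
    }
    attrs = {
        "a": {"db", "ml"}, "b": {"db"}, "c": {"ml"},
        "d": {"db", "ml"}, "e": {"db"},
    }
    # Boolean condition: db OR (ml AND spark)
    cond = lambda s: "db" in s or ("ml" in s and "spark" in s)
    result = attribute_first_search(graph, attrs, "a", cond, k=3)
    print([sorted(c) for c in result])   # [['a', 'b', 'd']]
```

In this toy graph, node `c` belongs to the structural 4-clique but fails the condition, so the returned community shrinks to `a`, `b`, `d`. Which of filtering and clique expansion should come first depends on how selective the condition is, which is the kind of trade-off the abstract's cost-based strategy selection is aimed at.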
Authors
SHE Xin (佘鑫)
HE Zhenying (何震瀛)
SHE Xin; HE Zhenying (Software School, Fudan University, Shanghai 200441, China; School of Computer Science, Fudan University, Shanghai 200441, China)
Source
Computer Engineering (《计算机工程》)
CAS
CSCD
PKU Core (北大核心)
2021, Issue 12, pp. 54-61, 70 (9 pages in total)
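Funding
National Key Research and Development Program of China, project "Research on Supporting Technologies and Equipment for Precision Public Legal Services" (2018YFC0830900).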