期刊文献+

面向大规模二部图的分布式Tip分解算法 被引量:1

Distributed Algorithm for Tip Decomposition on Large Bipartite Graphs
下载PDF
导出
摘要 Tip分解作为图数据管理领域的热点研究问题,已被广泛应用于文档聚类和垃圾邮件组检测等实际场景中.随着图数据规模的爆炸式增长,单机内存已无法满足其存储需求,亟需研究分布式环境下Tip分解技术.现有分布式图计算系统的通信模式无法适用于二部图,为此,首先提出一种基于中继的通信模式,以实现分布式环境下处理二部图时消息的有效传递;其次,提出分布式butterfly计数算法(DBC)和tip分解算法(DTD),特别地,为解决处理大规模二部图时DBC面临的内存溢出问题,提出了一种可控的并行顶点激活策略;最后,引入基于顶点优先级的消息剪枝策略和消息有效性剪枝策略,通过减少冗余通信和计算开销,进一步提高算法效率.实验平台部署于国家超算中心高性能分布式集群上,在多个真实数据集上的实验结果验证了所提算法的有效性和高效性. Tip decomposition has a pivotal role in mining cohesive subgraph in bipartite graphs.It is a popular research topic with wide applications in document clustering,spam group detection,and analysis of affiliation networks.With the explosive growth of bipartite graph data scale in these scenarios,it is necessary to use distributed method to realize its effective storage.For this reason,this work studies the problem of tip decomposition on a bipartite graph in the distributed environment for the first time.Firstly,a new relay based communication mode is proposed to realize effective message transmission when the given bipartite graph is decomposed in distributed environment.Secondly,the distributed butterfly counting algorithm(DBC)and tip decomposition algorithm(DTD)are designed.In particular,a controllable parallel vertex activation strategy is proposed to solve the problem of memory overflow when DBC decomposes large-scale bipartite graphs.Finally,the message pruning strategy based on vertex priority and message validity pruning strategy are introduced to further improve the efficiency of the algorithm by reducing redundant communication and computing overhead.The experiment is deployed on the high performance distributed computing platform of National Supercomputing Center.The effectiveness and efficiency of the proposed algorithms are verified by experiments on several real datasets.
作者 周旭 翁同峰 杨志邦 李博仁 张吉 李肯立 ZHOU Xu;WENG Tong-Feng;YANG Zhi-Bang;LI Bo-Ren;ZHANG Ji;LI Ken-Li(College of Computer Science and Electronic Engineering,Hunan University,Changsha 410082,China;National Supercomputing Center in Changsha,Changsha 410082,China;Zhejiang Lab,Hangzhou 311100,China)
出处 《软件学报》 EI CSCD 北大核心 2022年第3期1043-1056,共14页 Journal of Software
基金 国家自然科学基金(61772182,61802032,69189338,62172146,62172157) 之江实验室开放课题(2021KD0AB02) 国防科技大学信息系统工程重点实验室基金。
关键词 二部图 butterfly计数 分布式系统 tip分解 bipartite graph butterfly counting distributed system tip decomposition
  • 相关文献

参考文献4

二级参考文献83

  • 1Amazon SimpleDB. http://aws, amazon, com/simpledb/, 2011-8-10.
  • 2Connor Alexander G, Chrysanthis Panos K, Labrinidis Alexandros. Key key-value stores for efficiently processing graph data in the cloud//Proceedings of the GDM. Hannover, Germany, 2011:88-93.
  • 3Lordanov Borislav. HyperGraphDB: A generalized graph database//Proceedings of the IWGD. JiuZhai Valley, China, 2010:25-36.
  • 4Eifrem Emil. NOSQL: Scaling to size and scaling to complexity, http://blogs, neotechnology, com/emil/2009/11/ nosql-scaling tosize-and-scaling-to-complexity, html, 2009- 1-15.
  • 5Wu Sai, Jiang Da-Wei, Ooi Beng Chin et al. Efficient B-tree based indexing for cloud data proeessing//Proeeedings of the VLDB. Singapore, 2010: 1207-1218.
  • 6Wang Jin-Bao, Wu Sai, Gao Hong et al. Indexing multi dimensional data in a cloud system//Proceedings of the SIGMOD. Indianapolis, Indiana, USA, 2010: 591-602.
  • 7Tsatsanifos George, Sacharidis Dimitris, Sellis Timos et al. MIDAS: Multi-attribute indexing for distributed architecture systems//Proceedings of the SSTD. Minneapolis, MN, USA, 2011:168-185.
  • 8Aguilera M K, Golab W, Shah M A. A practical scalable distributed B-tree//Proceedings of the VLDB. Auckland, New Zealand, 2008: 598-609.
  • 9Zhang Xiang-Yu, Ai Jing, Wang Zhong-Yuan, Lu Jia-Heng et al. An efficient multi-dimensional index for cloud data management//Proceedings of the CloudDB. Hong Kong, China, 2009:17-24.
  • 10InfiniteGraph, the Distributed Graph Database. http:// www. infinitegraph, com/, 2011 -7 -29.

共引文献103

同被引文献4

引证文献1

相关作者

内容加载中请稍等...

相关机构

内容加载中请稍等...

相关主题

内容加载中请稍等...

浏览历史

内容加载中请稍等...
;
使用帮助 返回顶部