期刊文献+

分布式多维大图迭代计算性能优化方法

Optimization Methods for Distributed Iterative Computing Performance over Multi-Dimensional Large Graph
下载PDF
导出
摘要 大规模图的复杂挖掘算法通常需要高频迭代分析,而在计算与存储方面扩展性良好的分布式计算是提高处理效率的有效方案.然而,图顶点之间存在自由分布的边关系,会在分布式计算任务之间产生大量消息,由此在迭代过程中产生的巨大通信开销严重制约性能收益.已有工作在传统消息推送框架下采用合并和备份等技术降低通信代价,但主要面向结构简单、易优化的单维消息类算法,并不适用于结构复杂的多维消息类算法,也难以与当前最先进的消息按需拉取框架兼容.因此提出一种新型轻量级顶点备份机制,通过备份顶点的按需同步以及本地消息的按需生成,可完美继承拉取框架在容错和内存管控等方面的系统优势,同时显著降低通信代价.此外,通过考虑通信收益与负载偏斜代价,可计算最优阈值以提高整体性能.最后在大量真实数据集上验证了相关技术的有效性. Complex mining algorithms over large-scale graphs usually require highly frequent iterative analysis,while distributed computing with scalable computing power and storage capacity is a preferred solution for efficiency.However,the edges freely connecting vertices generate a lot of messages across distributed computing tasks and hence incurs communication costs during iterations,which heavily limits the performance improvement of distributed computing.To alleviate the negative impact of communication bottleneck,existing research basically employs combining and replicating techniques on top of the traditional pushing framework design.However,they mainly focus on easy-to-be-optimized single-dimensional-message algorithms with simple message data structure,and are not suitable for other important multi-dimensional-message algorithms with complex structure.Also,they cannot be seamlessly integrated into the-state-of-the-art pulling framework where messages are generated on demand.We thereby propose a lightweight vertex replication mechanism to synchronize replicated vertices and generate messages based on such replications on demand.The mechanism can work well under the pulling framework with inherent advantages in terms of fault-tolerance and memory consumption,and also greatly optimize the communication costs.Moreover,by considering the communication benefits and the costs incurred by possible workload imbalance,it can select an optimal replication threshold for best performance.Finally,extensive experiments over various real-world graphs validate the effectiveness of the lightweight vertex replication framework and the threshold analysis model.
作者 杜玉洁 王志刚 王宁 刘芯亦 衣军成 聂婕 魏志强 谷峪 于戈 Du Yujie;Wang Zhigang;Wang Ning;Liu Xinyi;Yi Juncheng;Nie Jie;Wei Zhiqiang;Gu Yu;Yu Ge(School of Computer Science and Technology,Ocean University of China,Qingdao,Shandong 266100;Key Lab of Cryptologic Technology and Information Security(Shandong University),Ministry of Education,Qingdao,Shandong 266237;Qingdao Big Data Center,Qingdao,Shandong 266071;School of Computer Science and Engineering,Northeastern University,Shenyang 110819)
出处 《计算机研究与发展》 EI CSCD 北大核心 2023年第3期654-675,共22页 Journal of Computer Research and Development
基金 国家自然科学基金项目(61902366,61902365,62072083) 中央高校基本科研业务费专项资金(202042008) 山东大学密码技术与信息安全教育部重点实验室开放课题 中国博士后科学基金项目(2020T130623) 青岛市自主创新重大专项(20-3-2-12-xx) 中国海洋大学计算机系研究生专业发展基金项目(CSZS2021003)。
关键词 分布式图迭代计算 多维消息图算法 通信优化 顶点备份 负载不均衡 distributed graph iterative computing graph algorithms with multi-dimensional messages communication optimization vertex replication workload imbalance
  • 相关文献

参考文献5

二级参考文献63

  • 1李先通,李建中,高宏.一种高效频繁子图挖掘算法[J].软件学报,2007,18(10):2469-2480. 被引量:35
  • 2Dean J, Ghemawat S. MapReduce: Simplified data processing on large clusters [J]. Communications of the ACM, 2008, 51(1) 107-113.
  • 3怀特汤姆.Hadoop权威指南.周敏奇,王晓玲,金澈清,等译.2版.北京:清华大学出版社,2011.
  • 4Pace M F. Hama vs MapReduce [EB/OL]. [2012-06-25]. http://arxiv, org/abs/1203. 2081.
  • 5Kyrola A, Blelloch G, Guestrin C. GraphChi: Large-scale graph computation on just a PC [C] //Proc o the 10th USENIX Conf on Operating Systems Design and Implementation. Berkeley: USENIX Association, 2012: 31- 46.
  • 6Grzegorz M, Austern H M, et al. Pregel: A system for large-scale graph processing [C]//Proc of the ACM SIGMOD 2010 Int Conf on Management of Data. New York: ACM, 2010: 135-146.
  • 7Chen Ruishan, Weng Xuetian, He Bingshen, et al. Large graph processing in the cloud [C] /[Proc o 2010 ACM SIGMOD Int Conf on Management of Data. New York: ACM, 2010:1123-1126.
  • 8Sangwon S, Edward J Y. HAMA An efficient matrix computation with the MapReduce framework [C] //Proc of the 2nd IEEE Int Conf on Cloud Computing Technology and Science. Los Alamitos, CA: IEEE Computer Society, 2010: 721-726.
  • 9The Apache Software Foundation. Introduction to Giraph [EB/OL]. [2013-06-25]. http//giraph, apache, org/intro. html.
  • 10冯国栋,肖仰华.大图的分布式存储[J].中国计算机学会通讯,2012,8(11):11-8.

共引文献30

相关作者

内容加载中请稍等...

相关机构

内容加载中请稍等...

相关主题

内容加载中请稍等...

浏览历史

内容加载中请稍等...
;
使用帮助 返回顶部