摘要
作为大数据处理的一种重要模式,图处理被广泛地应用在机器学习、数据统计和数据挖掘等场景中。在企业级应用中,多种类型的大数据处理框架通常会部署在同一个分布式集群中,其运行环境是开放、共享的,这时图处理需要考虑运算资源动态变化的问题。为了能适应这种动态性,更加充分地利用开放共享环境的资源,图处理框架应该具备弹性伸缩能力。通过调研,发现现有的图处理框架尚未完全实现弹性伸缩。为此,介绍了一种支持弹性伸缩的分布式并行图处理框架SPar Ta G。首先基于任务并行模型定义了图处理任务集及任务模型;其次基于任务迁移机制设计并实现了可动态伸缩的图处理框架;最后设计了一个基于负载均衡的调度算法,实现了动态伸缩的图处理过程。实验结果说明,SPar Ta G的性能与当前流行的开源图处理框架Giraph相近,且具有较好的弹性伸缩能力。
As an important pattern in big data processing, graph processing has been widely used in many kinds of scenarios, such as machine learning, data statistics and data mining, etc. when running enterprise-level applications, various kinds of big-data processing frameworks are usually deployed in the same distributed cluster, so the runtime environment is open and shared. As a result, graph processing should consider the dynamic changes of computing resources.In order to adapt to this dynamics and make good use of computing resources, graph processing framework should have the ability of elastic scaling. However, current graph processing frameworks have not fully realized elastic scaling yet as far as this paper knows. This paper introduces the design and implementation of an elastic scalable parallel graph processing framework, SPar Ta G. SPar Ta G firstly defines the task set and task model in graph processing problem; then designs an elastic scalable framework based on task migration mechanism; and proposes a load- balancing based scheduling algorithm at last. Experiments show that SPar Ta G achieves performance parity with the currently popular open-source Giraph system, and it can run graph job well in an elastic scalable manner.
出处
《计算机科学与探索》
CSCD
北大核心
2016年第7期901-914,共14页
Journal of Frontiers of Computer Science and Technology
基金
国家自然科学基金 Nos.61272154
61121063
国家重点基础研究发展计划(973计划) No.2011CB302604
百度云服务开放平台示范项目~~
关键词
图处理
分布式并行计算
弹性伸缩
任务迁移
graph processing
distributed parallel computing
elastic scaling
task migration