Abstract
MapReduce workflows are optimised mainly by optimising the MapReduce stack. To address the problem of MapReduce workflow optimisation, we first present the related concepts and then introduce the cost-based optimisation process for MapReduce workflows. Next, we illustrate through examples the dataflow dependencies and resource dependencies within a MapReduce workflow. Building on this, we propose three MapReduce workflow optimisers and evaluate them end to end. Finally, we measure the optimisation overhead of the workflow optimisers experimentally and compare the three optimisers.
Source
Computer Applications and Software (《计算机应用与软件》)
CSCD
2015, Issue 10, pp. 54-58, 85 (6 pages)