
An Analytical Model and Its Applications for Minimizing the Total Makespan of Multiple MapReduce Jobs
Abstract: As large-scale MapReduce clusters are widely adopted for big-data processing, and especially when multiple jobs share the same Hadoop cluster, a key challenge is to minimize the cluster's working time and improve the service efficiency of MapReduce jobs. A batch of MapReduce jobs can be modeled as a single scheduling problem, and we observe that the order in which the jobs are executed has a significant impact on their total makespan. The goal of this paper is to design a framework for an automatic job scheduler and to propose an analytical model that minimizes the total makespan of a batch of MapReduce jobs. We propose a better scheduling strategy and implementation under which the scheduling system satisfies the conditions of the classical Johnson algorithm, so that the algorithm can be used to obtain the optimal total makespan in linear time. For the problem of balancing jobs across two or more resource pools, the new strategy yields an exact linear-time solution, improving on known approximate, simulation-based approaches. The analytical results can be applied to improve system response time, energy efficiency, and load balancing in Hadoop cluster pools, and corresponding numerical examples validate these observations.
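The ordering rule the abstract refers to can be illustrated with a minimal sketch of the classical Johnson rule for a two-stage flow shop. This is not the paper's scheduler. It assumes each job is summarized by a hypothetical pair `(map_time, reduce_time)`, that the map stage and the reduce stage each behave as a single resource, and that one job's reduce stage may overlap with the next job's map stage. The function names `johnson_order` and `makespan` and the sample durations are illustrative only, and the sketch uses a comparison sort for simplicity.

```python
def johnson_order(jobs):
    """Return job indices in Johnson order.

    jobs: list of (map_time, reduce_time) pairs (hypothetical job profiles).
    Jobs whose map stage is no longer than their reduce stage go first,
    in increasing map time; the remaining jobs go last, in decreasing
    reduce time.
    """
    front = sorted(
        (i for i, (m, r) in enumerate(jobs) if m <= r),
        key=lambda i: jobs[i][0],
    )
    back = sorted(
        (i for i, (m, r) in enumerate(jobs) if m > r),
        key=lambda i: jobs[i][1],
        reverse=True,
    )
    return front + back


def makespan(jobs, order):
    """Completion time of the last reduce stage for a given job order.

    Assumes maps run back to back on one resource, and each reduce starts
    once both its own map and the previous reduce have finished.
    """
    map_end = 0
    reduce_end = 0
    for i in order:
        m, r = jobs[i]
        map_end += m
        reduce_end = max(reduce_end, map_end) + r
    return reduce_end


if __name__ == "__main__":
    # Illustrative durations only, not data from the paper.
    jobs = [(4, 5), (1, 4), (30, 4), (6, 30), (2, 3)]
    order = johnson_order(jobs)
    print("Johnson order:", order)
    print("Makespan in Johnson order:", makespan(jobs, order))
    print("Makespan in submission order:", makespan(jobs, range(len(jobs))))
```

Under these assumptions, comparing the two printed makespans shows how much the execution order alone can change the total completion time of the batch.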
Source: Computer Engineering & Science (计算机工程与科学), CSCD, Peking University Core Journal, 2014, No. 4, pp. 571-578 (8 pages)
Funding: National Natural Science Foundation of China (61150110486, 61272528); Central Universities Fund (ID-ZYGX2013J073); 2013 CCF-Tencent Research Fund
Keywords: Hadoop; MapReduce; batch workloads; optimized schedule; minimized makespan

