Dynamic Scheduling Algorithm for Reduce Tasks in Hadoop (cited by: 1)
Abstract: The MapReduce parallel distributed programming model on the Hadoop platform provides storage and computing services by combining inexpensive nodes into a cluster, which lowers cluster cost. Hadoop can be configured to start Reduce tasks once a fixed percentage of Map tasks has completed, but starting Reduce tasks too early leaves Reduce resources waiting idle for long periods. This paper proposes DRS, a dynamic scheduling algorithm for Reduce tasks. DRS computes the Reduce start time from the number and size of the Map tasks in a job, and revises that start time while the job runs according to how the Map tasks are actually scheduled, so that Reduce resources are used more efficiently. Experiments show that DRS shortens the shuffle phase by 7.3% compared with the fixed-percentage method and by 43.6% compared with the system default parameter.
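The abstract does not give the DRS formulas, so the following is only a minimal sketch of the two policies it contrasts, under stated assumptions. The fixed-percentage rule is assumed to correspond to Hadoop's `mapreduce.job.reduce.slowstart.completedmaps` setting (default 0.05 in Hadoop 2.x). The dynamic estimate uses a hypothetical model that is not taken from the paper: Map tasks run in equal-length waves over the available slots, and Reduce tasks should start just early enough for the shuffle to finish when the last Map task does. All function names, parameters, and rates are illustrative.

```python
import math
from statistics import mean


def fixed_percentage_ready(completed_maps, total_maps, slowstart=0.05):
    """Baseline: allow Reduce tasks once a fixed fraction of Map tasks is done."""
    return completed_maps >= math.ceil(slowstart * total_maps)


def estimate_reduce_start(map_sizes_mb, map_slots, map_rate_mb_s,
                          shuffle_rate_mb_s, output_ratio=1.0,
                          observed_map_seconds=None):
    """Hypothetical dynamic estimate of the Reduce start time, in seconds.

    Assumed model (not the paper's): Map tasks run in waves of `map_slots`
    tasks, each wave lasting the average Map duration. Reduce tasks start
    so that shuffling all Map output ends when the Map phase ends.
    """
    if observed_map_seconds:
        # Revision step: prefer durations measured while the job runs.
        avg_map_seconds = mean(observed_map_seconds)
    else:
        # Initial estimate from the Map task sizes alone.
        avg_map_seconds = mean(map_sizes_mb) / map_rate_mb_s

    waves = math.ceil(len(map_sizes_mb) / map_slots)
    map_phase_seconds = waves * avg_map_seconds

    # Time to copy all Map output to the Reduce side.
    shuffle_seconds = sum(map_sizes_mb) * output_ratio / shuffle_rate_mb_s

    # Never schedule a start before the job begins.
    return max(0.0, map_phase_seconds - shuffle_seconds)


if __name__ == "__main__":
    sizes = [128] * 8  # eight 128 MB input splits
    initial = estimate_reduce_start(sizes, map_slots=4, map_rate_mb_s=8,
                                    shuffle_rate_mb_s=64, output_ratio=0.5)
    revised = estimate_reduce_start(sizes, map_slots=4, map_rate_mb_s=8,
                                    shuffle_rate_mb_s=64, output_ratio=0.5,
                                    observed_map_seconds=[20, 20])
    print(initial, revised)
```

In this toy configuration the initial estimate delays the Reduce start to 24 seconds, and the revised estimate moves it to 32 seconds after the first Map tasks turn out slower than predicted. A fixed 5% threshold would instead release Reduce tasks as soon as the first of the eight Map tasks finishes.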
Source: Journal of Information Engineering University (《信息工程大学学报》), 2016, No. 1, pp. 83-87, 96 (6 pages)
Funding: National 863 Program of China (2012AA010905); National Natural Science Foundation of China (61370081)
Keywords: Hadoop; MapReduce; dynamic scheduling algorithm
