
Improvement of job scheduling algorithm on Hadoop (cited by: 5)
Abstract: Load imbalance is a common problem in distributed clusters, and Hadoop does not account for performance differences between nodes. Hadoop does have a load-balancing mechanism, but it is not very effective, so load imbalance frequently occurs at run time. To address this, the authors analyze the Hadoop source code in depth to clarify how Hadoop operates. Within Yarn, Hadoop's resource management framework, they improve how Hadoop tasks are ordered and establish new task-ordering rules. They also propose performance evaluation indexes for each node, divided into dynamic and static indexes. On this basis they modify Yarn's Fair Scheduler algorithm, producing a scheduling algorithm that takes node performance into account. After recompiling the Hadoop source code, they run comparative experiments on their own Hadoop platform. The authors report that adding the node performance indexes effectively solves Hadoop's load-balancing problem and substantially improves Hadoop's running efficiency.
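This abstract does not give the paper's formulas. The sketch below is a rough illustration of one way static and dynamic node indexes could be combined into a single score that ranks nodes for new tasks. It rests on several assumptions. The chosen metrics are CPU cores and memory capacity as static indexes, and CPU and memory utilization as dynamic indexes. The equal sub-weights and the blending weight `alpha` are made up. The names `NodeMetrics`, `performance_score`, and `rank_nodes` are hypothetical. None of them are the authors' indexes or part of the YARN API.

```python
from dataclasses import dataclass


@dataclass
class NodeMetrics:
    """Hypothetical per-node metrics; field names are illustrative, not YARN APIs."""
    name: str
    # Static indexes: fixed hardware capacity of the node.
    cpu_cores: int
    memory_gb: float
    # Dynamic indexes: sampled at run time, as fractions in [0, 1].
    cpu_utilization: float
    memory_utilization: float


def static_score(node: NodeMetrics, max_cores: int, max_memory_gb: float) -> float:
    """Capacity relative to the largest node in the cluster (assumed equal weights)."""
    return 0.5 * node.cpu_cores / max_cores + 0.5 * node.memory_gb / max_memory_gb


def dynamic_score(node: NodeMetrics) -> float:
    """Share of CPU and memory currently idle (assumed equal weights)."""
    return 0.5 * (1.0 - node.cpu_utilization) + 0.5 * (1.0 - node.memory_utilization)


def performance_score(node: NodeMetrics, max_cores: int, max_memory_gb: float,
                      alpha: float = 0.4) -> float:
    """Blend static and dynamic indexes; alpha is a hypothetical tuning weight."""
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must be in [0, 1]")
    return (alpha * static_score(node, max_cores, max_memory_gb)
            + (1.0 - alpha) * dynamic_score(node))


def rank_nodes(nodes: list[NodeMetrics], alpha: float = 0.4) -> list[NodeMetrics]:
    """Return nodes ordered from most to least preferred for new tasks."""
    max_cores = max(n.cpu_cores for n in nodes)
    max_memory_gb = max(n.memory_gb for n in nodes)
    return sorted(
        nodes,
        key=lambda n: performance_score(n, max_cores, max_memory_gb, alpha),
        reverse=True,
    )


if __name__ == "__main__":
    cluster = [
        NodeMetrics("node-a", 8, 32.0, cpu_utilization=0.9, memory_utilization=0.8),
        NodeMetrics("node-b", 4, 16.0, cpu_utilization=0.1, memory_utilization=0.2),
        NodeMetrics("node-c", 8, 32.0, cpu_utilization=0.3, memory_utilization=0.4),
    ]
    for node in rank_nodes(cluster):
        print(node.name)  # prints node-c, node-b, node-a (one per line)
```

The sample cluster is invented for illustration. In it, the two large nodes have identical static scores, so current load decides between them. The lightly loaded small node `node-b` also outranks the busy large node `node-a`. This is the kind of heterogeneity the abstract says stock Hadoop ignores. Real YARN schedulers are implemented in Java inside the ResourceManager, so this is only a conceptual model.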
Authors: FENG Xingjie (冯兴杰); HE Yang (贺阳) (School of Computer Science and Technology, Civil Aviation University of China, Tianjin 300300, China)
Source: Computer Engineering and Applications (《计算机工程与应用》), indexed in CSCD and the Peking University core journal list, 2017, No. 12, pp. 85-91 (7 pages)
Funding: Joint Fund of the National Natural Science Foundation of China and the Civil Aviation Administration of China (No. U1233113); National Natural Science Foundation of China (No. 61301245, No. 61201414)
Keywords: big data; Hadoop; Yarn; load balancing; FairScheduler algorithm