Abstract
Load balancing is a common problem in distributed clusters, and Hadoop does not take performance differences between nodes into account. Although Hadoop has a load-balancing mechanism, its effect is not ideal, so load imbalance often occurs at runtime. To address this problem, this paper analyzes the Hadoop source code in depth to clarify how Hadoop works, improves the ordering of Hadoop tasks in YARN (Hadoop's resource management framework), and establishes new task-ordering rules. It also proposes performance evaluation indices for each node, divided into dynamic and static performance indices. On this basis, the Fair Scheduler algorithm of YARN is improved to form a scheduling algorithm that takes node performance into account. After the Hadoop source code was recompiled, comparative experiments on the Hadoop platform built for this work show that incorporating node performance indices effectively addresses Hadoop's load-balancing problem and greatly improves Hadoop's running efficiency.
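The abstract does not give the formulas for the static and dynamic node performance indices. As a purely illustrative sketch (not the authors' method), the snippet below shows one way the two kinds of index could be combined into a single node score and used to rank candidate nodes. The metric names (`cpu_cores`, `mem_gb`, `cpu_util`, `mem_util`), the equal sub-weights, and the mixing weight `alpha` are all hypothetical assumptions; no Hadoop or YARN API is used.

```python
from dataclasses import dataclass


@dataclass
class NodeStats:
    # Hypothetical metrics; the paper's actual indices are not given in the abstract.
    name: str
    cpu_cores: int    # static: number of CPU cores
    mem_gb: float     # static: installed memory in GB
    cpu_util: float   # dynamic: current CPU utilization, 0.0-1.0
    mem_util: float   # dynamic: current memory utilization, 0.0-1.0


def static_score(node: NodeStats, max_cores: int, max_mem: float) -> float:
    # Assumed: equal weight on core count and memory, each normalized to the cluster maximum.
    return 0.5 * node.cpu_cores / max_cores + 0.5 * node.mem_gb / max_mem


def dynamic_score(node: NodeStats) -> float:
    # Assumed: equal weight on idle CPU fraction and free memory fraction.
    return 0.5 * (1.0 - node.cpu_util) + 0.5 * (1.0 - node.mem_util)


def performance_scores(nodes: list[NodeStats], alpha: float = 0.5) -> dict[str, float]:
    # Combined index = alpha * static + (1 - alpha) * dynamic (alpha is a hypothetical weight).
    max_cores = max(n.cpu_cores for n in nodes)
    max_mem = max(n.mem_gb for n in nodes)
    return {
        n.name: alpha * static_score(n, max_cores, max_mem) + (1 - alpha) * dynamic_score(n)
        for n in nodes
    }


def rank_nodes(nodes: list[NodeStats], alpha: float = 0.5) -> list[str]:
    # Highest combined score first: the node that should receive the next task.
    scores = performance_scores(nodes, alpha)
    return sorted(scores, key=lambda name: scores[name], reverse=True)


if __name__ == "__main__":
    cluster = [
        NodeStats("a", 8, 32.0, 0.9, 0.8),  # powerful but heavily loaded
        NodeStats("b", 4, 16.0, 0.1, 0.2),  # weaker but nearly idle
        NodeStats("c", 8, 32.0, 0.2, 0.3),  # powerful and lightly loaded
    ]
    print(rank_nodes(cluster))  # prints ['c', 'b', 'a']
```

The design choice illustrated is that neither index alone suffices: node "a" has the best static score but is busy, so a purely static ranking would overload it, while a purely dynamic ranking would ignore that "c" has more capacity than "b".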
Authors
FENG Xingjie (冯兴杰), HE Yang (贺阳)
School of Computer Science and Technology, Civil Aviation University of China, Tianjin 300300, China
Source
Computer Engineering and Applications (《计算机工程与应用》), 2017, No. 12, pp. 85-91 (7 pages)
Indexed in: CSCD; Peking University Core Journals (北大核心)
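Funding
Joint Fund of the National Natural Science Foundation of China and the Civil Aviation Administration of China (No. U1233113)
National Natural Science Foundation of China (No. 61301245, No. 61201414)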