Abstract
In Hadoop, computing resources and data resources may reside at different physical locations, which gives rise to the locality problem. The delay scheduling algorithm was designed to address it: it chooses a task's computing node according to the physical location of the data the task will process, and schedules the task to that target node. However, several tasks of the same job may then end up concentrated on a single computing node, so the job does not reach the desired degree of parallelism. Building on the original delay scheduling algorithm, this paper proposes a delay-capacity scheduling algorithm that allows some tasks to run on a non-local node, that is, a node that does not hold their input data, in order to shorten job response time and increase job parallelism. Experimental comparison shows that the improved algorithm clearly outperforms the original delay scheduling algorithm in both execution efficiency and parallelism.
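The abstract does not state the exact scheduling rule, so the following is only an illustrative sketch of the idea it describes: a job declines a limited number of non-local slot offers (the delay), and a per-node limit keeps its tasks from piling up on one node (the capacity). The names `assign`, `per_node_cap`, and `max_skips` are hypothetical and are not taken from the paper or from Hadoop.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass
class Task:
    task_id: int
    data_node: str  # node that stores this task's input data


def assign(tasks, offers, per_node_cap, max_skips):
    """Place the tasks of one job as nodes offer free slots.

    offers       -- sequence of node names; each entry is one free slot
    per_node_cap -- assumed limit on this job's tasks per node
    max_skips    -- assumed number of offers the job may decline while
                    waiting for a slot on a node holding its data
    Returns {task_id: (node, ran_locally)}.
    """
    pending = list(tasks)
    load = Counter()   # tasks of this job already placed on each node
    placement = {}
    skips = 0
    for node in offers:
        if not pending:
            break
        if load[node] >= per_node_cap:
            continue  # capacity rule: this node already runs enough of the job
        local = next((t for t in pending if t.data_node == node), None)
        if local is not None:
            chosen, ran_locally = local, True
            skips = 0  # a local launch resets the delay counter
        elif skips >= max_skips:
            chosen, ran_locally = pending[0], False  # waited long enough
        else:
            skips += 1  # delay rule: decline this non-local slot
            continue
        pending.remove(chosen)
        load[node] += 1
        placement[chosen.task_id] = (node, ran_locally)
    return placement


# Four tasks whose input data all sits on node "A"; slots alternate A, B.
tasks = [Task(i, "A") for i in range(4)]
offers = ["A", "B"] * 4
capped = assign(tasks, offers, per_node_cap=2, max_skips=1)
uncapped = assign(tasks, offers, per_node_cap=4, max_skips=1)
print(capped)
print(uncapped)
```

In this toy run, the call with the cap of 2 places two tasks on A and two on B, while the call with the cap of 4 behaves like plain delay scheduling and keeps all four tasks on A, which is the concentration effect the abstract describes.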
Source
《计算机应用研究》
CSCD
PKU Core Journal
2013, No. 5, pp. 1397-1401 (5 pages)
Application Research of Computers
Keywords
locality
delay scheduling
delay-capacity scheduling