Abstract
MapReduce is one of the most popular parallel data processing systems and is widely used in production, research, and many other fields. Task scheduling, one of its core technologies, directly affects system performance. However, in a MapReduce environment shared by multiple users (departments), existing scheduling algorithms cannot guarantee good system throughput when processing batch jobs. To address this problem, this paper proposes a throughput-driven task scheduling algorithm (the TD scheduler) for shared MapReduce environments. First, based on the characteristics of batch-job scheduling in a shared MapReduce environment, a scheduling framework is presented; according to how job parameters change during processing, jobs are classified into four states and the transition rules between these states are given, which avoids wasting system resources and ensures fair resource allocation. Second, the main means of improving throughput when processing batch jobs are summarized, and the TD scheduling algorithm is then proposed, which effectively reduces network overhead and significantly improves system throughput. Finally, the performance of the TD scheduler is verified through extensive simulation experiments. The results show that the TD scheduler effectively improves system throughput when processing batch jobs in a shared MapReduce environment and meets the requirements of practical applications.
Source
Journal of Computer Research and Development (《计算机研究与发展》)
EI
CSCD
PKU Core (北大核心)
2013, Issue S1, pp. 332-341 (10 pages)
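Funding
National Basic Research Program of China (973 Program) (2012CB316201)
National Natural Science Foundation of China, General Program (61033007, 61003060)
Fundamental Research Funds for the Central Universities, Key Project (N100704001)
Doctoral Program Foundation of the Ministry of Education of China (20120042110028)
Ministry of Education-Intel Special Research Fund for Information Technology (MOE-INTEL-2012-06)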