Abstract
The MapReduce parallel distributed programming model on the Hadoop platform provides storage and computing services by combining inexpensive nodes into a cluster, which lowers cluster cost. Hadoop can be configured to launch Reduce tasks once a fixed percentage of Map tasks has completed, but launching Reduce tasks too early leaves Reduce resources waiting idle for long periods. This paper proposes DRS, a dynamic Reduce scheduling algorithm that computes the Reduce launch time from the number and size of a job's Map tasks and revises that launch time while the job runs, according to how the Map tasks are actually scheduled, so as to improve the utilization efficiency of Reduce resources. Experiments show that DRS shortens the shuffle phase by 7.3% compared with a fixed-percentage parameter setting and by 43.6% compared with the system default parameter.
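The abstract does not give the DRS formula, so the following is only a minimal sketch of the general idea of deriving a Reduce launch time from the number and size of Map tasks. It is not the authors' algorithm. The function name `estimate_reduce_start`, all of its parameters, and the simple "waves of Map tasks" model are assumptions made for illustration.

```python
import math


def estimate_reduce_start(num_maps, map_slots, map_task_secs,
                          map_output_mb, shuffle_mb_per_sec):
    """Toy estimate of when to launch Reduce tasks (hypothetical model).

    Assumes Map tasks run in equal-length waves of `map_slots` tasks and
    that reducers copy Map output at a constant aggregate rate.
    Returns (start_time_secs, fraction_of_maps_completed_at_start).
    """
    waves = math.ceil(num_maps / map_slots)
    map_phase_secs = waves * map_task_secs

    # Output of the last wave only exists once the Map phase ends, so only
    # the output of earlier waves can be copied while Maps are still running.
    maps_before_last_wave = (waves - 1) * map_slots
    overlap_copy_secs = maps_before_last_wave * map_output_mb / shuffle_mb_per_sec

    # Start just early enough to copy that output before the Map phase ends,
    # but never before the first wave has produced anything to copy.
    start = max(map_task_secs, map_phase_secs - overlap_copy_secs)
    start = min(start, map_phase_secs)

    # Number of Map tasks finished by `start` under the wave model.
    done = min(num_maps, math.floor(start / map_task_secs) * map_slots)
    return start, done / num_maps


if __name__ == "__main__":
    start, fraction = estimate_reduce_start(
        num_maps=100, map_slots=10, map_task_secs=30,
        map_output_mb=64, shuffle_mb_per_sec=100)
    print(f"launch reducers at ~{start:.1f}s ({fraction:.0%} of maps done)")
```

Under this model, the runtime correction described in the abstract could be approximated by calling the function again with the Map task duration and slot count actually observed so far. For comparison, stock Hadoop expresses the fixed-percentage policy through the `mapreduce.job.reduce.slowstart.completedmaps` property.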
Source
Journal of Information Engineering University
2016, No. 1, pp. 83-87, 96 (6 pages)
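Funding
National High Technology Research and Development Program of China (863 Program), grant 2012AA010905
National Natural Science Foundation of China, grant 61370081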