
Research on Scheduling Algorithms for Batch Jobs in a Shared MapReduce Environment (cited by: 2)

Batch-Job Scheduling in Shared MapReduce Environment
Abstract As one of the most popular parallel data processing systems, MapReduce is widely used in production, research, and many other fields. The task scheduling strategy, one of MapReduce's core technologies, directly affects system performance. However, when batch jobs are processed in a MapReduce environment shared by multiple users (departments), existing scheduling algorithms cannot guarantee good system throughput. To address this problem, this paper proposes a throughput-driven task scheduling algorithm for shared MapReduce environments (the TD scheduler). First, based on the characteristics of batch-job scheduling in a shared MapReduce environment, a scheduling framework is presented; according to how job parameters change during processing, jobs are classified into four states and the transition rules between these states are given, which avoids wasting system resources and ensures fair resource allocation. Second, the main means of improving throughput when processing batch jobs are summarized, and the TD scheduling algorithm is then proposed, which effectively reduces network overhead and significantly improves system throughput. Finally, the performance of the TD scheduler is verified through extensive simulation experiments. The results show that the TD scheduler effectively improves system throughput when processing batch jobs in a shared MapReduce environment and meets the requirements of practical applications.
Source Journal of Computer Research and Development (《计算机研究与发展》), indexed in EI, CSCD, and the Peking University Core Journals list; 2013, Issue S1, pp. 332-341 (10 pages)
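Funding National Basic Research Program of China (973 Program) (2012CB316201); National Natural Science Foundation of China, General Program (61033007, 61003060); Fundamental Research Funds for the Central Universities, Key Project (N100704001); Doctoral Program Foundation of the Ministry of Education (20120042110028); Ministry of Education-Intel Special Research Fund for Information Technology (MOE-INTEL-2012-06)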
Keywords shared environment; MapReduce; batch job; task scheduling; throughput
