Research on Batch-Job Scheduling Algorithms in a Shared MapReduce Environment (cited by: 2)

Batch-Job Scheduling in Shared MapReduce Environment


Abstract: MapReduce is one of the most popular parallel data processing systems and is widely used in production, research, and many other fields. Task scheduling is one of its core technologies and directly affects system performance. However, when batch jobs are processed in a MapReduce environment shared by multiple users (departments), existing scheduling algorithms cannot guarantee good system throughput. To address this problem, this paper proposes a throughput-driven task scheduling algorithm (the TD scheduler) for shared MapReduce environments. First, based on the characteristics of batch-job scheduling in a shared MapReduce environment, a scheduling framework is given; according to how job parameters change during processing, jobs are classified into four states and rules for the transitions between states are defined, which avoids wasting system resources and ensures fair resource allocation. Second, the main means of improving throughput when processing batch jobs are summarized, and on that basis the TD scheduling algorithm is proposed, which effectively reduces network overhead and significantly improves system throughput. Finally, the performance of the TD scheduler is verified through extensive simulation experiments. The results show that the TD scheduler effectively improves system throughput when processing batch jobs in a shared MapReduce environment and meets the requirements of practical applications.

Authors: Wang Xite (王习特), Shen Derong (申德荣), Nie Tiezheng (聂铁铮), Kou Yue (寇月), Yu Ge (于戈)

Affiliation: College of Information Science and Engineering, Northeastern University (东北大学信息科学与工程学院)

Source: Journal of Computer Research and Development (《计算机研究与发展》), indexed in EI, CSCD, and the Peking University Core Journals list; 2013, Issue S1, pp. 332-341 (10 pages)

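Funding: National Basic Research Program of China (973 Program) (2012CB316201); National Natural Science Foundation of China, General Program (61033007, 61003060); Fundamental Research Funds for the Central Universities, Key Project (N100704001); Ministry of Education Doctoral Program Foundation (20120042110028); Ministry of Education-Intel Special Research Fund for Information Technology (MOE-INTEL-2012-06)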

Keywords: shared environment; MapReduce; batch job; task scheduling; throughput

Classification number: TP301.6 [Automation and Computer Technology: Computer System Architecture]


