
Research on a Reduce-Stage Load-Balanced Partition Algorithm Based on MapReduce

Cited by: 10
Abstract: MapReduce is a parallel computing model for processing large-scale data. To address the load imbalance among nodes in the reduce stage of the traditional model, this paper proposes a load-balanced partition algorithm for the reduce stage. The algorithm divides the intermediate data produced by the map stage into more partitions, which reduces the workload of each partition. Each reduce task is assigned one partition at a time; after finishing a partition it obtains a new one, until all partitions have been assigned, so the load of the reduce tasks is adjusted dynamically. The paper also modifies the MapReduce communication protocol to support the algorithm and designs a new fault-tolerance mechanism. Finally, the algorithm is implemented by rewriting the core of the Hadoop platform and evaluated experimentally; the results show that it significantly shortens task processing time without affecting the MapReduce model.
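Source: Journal of Chinese Computer Systems (《小型微型计算机系统》), CSCD, Peking University Core Journal, 2015, Issue 2, pp. 240-243 (4 pages)
Funding: National Natural Science Foundation of China (61300195); Fundamental Research Funds for the Central Universities (N110323009); General Scientific Research Project of the Education Department of Liaoning Province (L2013099)
Keywords: MapReduce; partition algorithm; load balance; Hadoop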
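
The scheduling idea described in the abstract can be illustrated with a small simulation. This is only a sketch under stated assumptions, not the authors' Hadoop implementation: keys are integers, a key's partition is `key % num_partitions`, the cost of each key is known in advance, and the function names `static_makespan` and `dynamic_makespan` are hypothetical. It compares the classic scheme (one partition per reduce task) with finer partitions handed out one at a time to whichever reduce task becomes free first.

```python
import heapq


def static_makespan(key_costs, num_reducers):
    """Classic scheme (assumed): one partition per reducer, key % num_reducers."""
    loads = [0] * num_reducers
    for key, cost in key_costs.items():
        loads[key % num_reducers] += cost
    # The job finishes when the most heavily loaded reducer finishes.
    return max(loads)


def dynamic_makespan(key_costs, num_reducers, num_partitions):
    """Finer partitions, each given to the reducer that becomes free first."""
    partitions = [0] * num_partitions
    for key, cost in key_costs.items():
        partitions[key % num_partitions] += cost

    # Min-heap of (time at which the reducer is free, reducer id).
    free_at = [(0, r) for r in range(num_reducers)]
    heapq.heapify(free_at)
    for work in partitions:
        t, r = heapq.heappop(free_at)            # earliest idle reducer
        heapq.heappush(free_at, (t + work, r))   # it takes the next partition
    return max(t for t, _ in free_at)


# Hypothetical skewed workload: even keys are expensive, odd keys are cheap.
key_costs = {0: 9, 1: 1, 2: 8, 3: 1, 4: 7, 5: 1, 6: 6, 7: 1}
static_time = static_makespan(key_costs, num_reducers=2)
dynamic_time = dynamic_makespan(key_costs, num_reducers=2, num_partitions=8)
```

With this toy workload, the static scheme places every expensive key on the same reducer, while the finer-grained assignment splits the total work of 34 units evenly across the two reducers. Real intermediate data would also involve shuffle cost, fault tolerance, and protocol changes, which the sketch ignores.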


