An Improved Algorithm for Optimizing MapReduce Based on Locality and Overlapping (cited by: 5)

Abstract: MapReduce is currently the most popular programming model for big data processing, and Hadoop is a well-known MapReduce implementation platform. However, Hadoop jobs suffer from imbalanced workloads during the reduce phase and use the available computing and network resources inefficiently. In some cases, these problems lead to serious performance degradation in MapReduce jobs. To resolve these problems, in this paper, we propose two algorithms, the Locality-Based Balanced Schedule (LBBS) and Overlapping-Based Resource Utilization (OBRU), that optimize the Locality-Enhanced Load Balance (LELB) and the Map, Local reduce, Shuffle, and final Reduce (MLSR) phases. LBBS collects partition information from input data during the map phase and generates balanced schedule plans for the reduce phase. OBRU is responsible for using computing and network resources efficiently by overlapping the local reduce, shuffle, and final reduce phases. Experimental results show that the LBBS and OBRU algorithms yield significant improvements in load balancing. When LBBS and OBRU are applied, job performance improves by 15% over that of models using LELB and MLSR.
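Source: Tsinghua Science and Technology (indexed in SCIE, EI, CAS, CSCD), 2018, No. 6, pp. 744-753 (10 pages). Chinese journal name: Journal of Tsinghua University (Natural Science Edition, English Edition)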
Funding: Supported by the National Key R&D Program of China (Nos. 2017YFB0202104 and 2017YFB0202003)
Keywords: MapReduce; overlapping; load balance; data locality
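
The abstract says LBBS uses partition information gathered during the map phase to produce a balanced schedule for the reduce phase, but this record does not give the algorithm itself. As a rough illustration of the general idea only, the sketch below uses a standard greedy heuristic (largest partition first, assigned to the least-loaded reducer) and ignores data locality entirely. The function names, the partition names, and the sizes are hypothetical and are not taken from the paper.

```python
import heapq
from typing import Dict, List


def balanced_schedule(partition_sizes: Dict[str, int], num_reducers: int) -> List[List[str]]:
    """Assign partitions to reducers so that total sizes stay roughly even.

    Illustrative stand-in, not the paper's LBBS: a greedy heuristic that
    walks partitions from largest to smallest and gives each one to the
    reducer with the smallest load so far.
    """
    if num_reducers <= 0:
        raise ValueError("num_reducers must be positive")
    # Min-heap of (current load, reducer index): least-loaded reducer on top.
    heap = [(0, r) for r in range(num_reducers)]
    heapq.heapify(heap)
    plan: List[List[str]] = [[] for _ in range(num_reducers)]
    # Sort by descending size; break ties by name so the result is deterministic.
    ordered = sorted(partition_sizes.items(), key=lambda kv: (-kv[1], kv[0]))
    for name, size in ordered:
        load, r = heapq.heappop(heap)
        plan[r].append(name)
        heapq.heappush(heap, (load + size, r))
    return plan


def reducer_loads(plan: List[List[str]], partition_sizes: Dict[str, int]) -> List[int]:
    """Total partition size assigned to each reducer in a plan."""
    return [sum(partition_sizes[p] for p in parts) for parts in plan]


if __name__ == "__main__":
    # Hypothetical partition sizes, as might be collected during the map phase.
    sizes = {"p0": 90, "p1": 40, "p2": 30, "p3": 20, "p4": 20}
    plan = balanced_schedule(sizes, num_reducers=2)
    print(plan, reducer_loads(plan, sizes))
```

A locality-aware scheduler would additionally weigh where each partition's data already resides when choosing a reducer; that part is omitted here because the abstract does not describe how LBBS does it.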