Improving MapReduce Performance by Balancing Skewed Loads (cited by: 4)

Abstract: MapReduce has emerged as a popular computing model used in datacenters to process large datasets. In the map phase, hash partitioning is employed to distribute data sharing the same key across datacenter-scale cluster nodes. However, we observe that this approach can lead to uneven data distribution, which can result in skewed loads among reduce tasks and thus hamper the performance of MapReduce systems. Moreover, worker nodes in MapReduce systems may differ in computing capability due to (1) multiple generations of hardware in non-virtualized data centers, or (2) co-location of virtual machines in virtualized data centers. The heterogeneity among cluster nodes exacerbates the negative effects of uneven data distribution. To improve MapReduce performance in heterogeneous clusters, we propose a novel load balancing approach in the reduce phase. This approach consists of two components: (1) performance prediction, based on support vector machine (SVM) models, for reducers that run on heterogeneous nodes, and (2) heterogeneity-aware partitioning (HAP), which balances skewed data across reduce tasks. We implement this approach as a plug-in for an existing MapReduce system. Experimental results demonstrate that our proposed approach distributes work evenly among reduce tasks and improves MapReduce performance with little overhead.
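The abstract does not spell out how HAP works, so the following is only a minimal Python sketch under stated assumptions, not the authors' algorithm. It contrasts hash partitioning (similar in spirit to Hadoop's default `HashPartitioner`, which sends each key to reducer `hash(key) mod R`) with a hypothetical greedy, heterogeneity-aware assignment. Each reducer has a relative speed that stands in for the paper's SVM-based performance prediction. The speeds, key counts, and all function names (`hash_partition`, `heterogeneity_aware_partition`, `finish_times`) are illustrative assumptions.

```python
import zlib


def hash_partition(key_counts, num_reducers):
    """Hash partitioning: every record with a given key goes to reducer
    hash(key) mod R, so a reducer's load is the sum of its keys' counts."""
    loads = [0] * num_reducers
    for key, count in key_counts.items():
        # crc32 instead of Python's hash() so the mapping is stable across runs
        loads[zlib.crc32(key.encode()) % num_reducers] += count
    return loads


def heterogeneity_aware_partition(key_counts, speeds):
    """Hypothetical greedy sketch (not the paper's HAP): take keys from largest
    to smallest and give each to the reducer whose predicted finish time
    (load / speed) would be smallest after adding that key.
    Whole keys are assigned, so all records of a key still reach one reducer."""
    loads = [0] * len(speeds)
    assignment = {}
    for key, count in sorted(key_counts.items(), key=lambda kv: (-kv[1], kv[0])):
        best = min(range(len(speeds)), key=lambda r: (loads[r] + count) / speeds[r])
        loads[best] += count
        assignment[key] = best
    return assignment, loads


def finish_times(loads, speeds):
    """Predicted completion time of each reducer = records / relative speed."""
    return [load / speed for load, speed in zip(loads, speeds)]


# Illustrative skewed workload: key "a" alone holds half of all records.
key_counts = {"a": 500, "b": 120, "c": 100, "d": 90,
              "e": 80, "f": 60, "g": 30, "h": 20}
# Assumed relative speeds: one newer node twice as fast as two older ones.
# In the paper these would come from the SVM performance predictions.
speeds = [2.0, 1.0, 1.0]

hash_loads = hash_partition(key_counts, len(speeds))
hap_assignment, hap_loads = heterogeneity_aware_partition(key_counts, speeds)

print("hash finish times:", finish_times(hash_loads, speeds))
print("HAP finish times:", finish_times(hap_loads, speeds))  # → [250.0, 250.0, 250.0]
```

The job's reduce phase ends when the slowest reducer finishes, so the quantity to compare is the maximum finish time. No assignment can beat the total load divided by the total speed (1000 / 4 = 250 here). In this example the greedy sketch reaches that bound by placing the dominant key on the fast node. Hash partitioning ignores both key sizes and node speeds, so its maximum can only be equal or worse.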
Source: China Communications (中国通信, English edition; indexed in SCIE and CSCD), 2014, No. 8, pp. 85-108 (24 pages).
Funding: The authors would like to thank the reviewers for their detailed reviews and constructive comments, which have helped improve the quality of this paper. This work is supported by the National High-Tech Research and Development Plan of China under grants No. 2011AA01A204 and No. 2012AA01A306, and by the National Natural Science Foundation of China under grants No. 61202041 and No. 91330117.
Keywords: MapReduce; cloud computing; skewed loads; performance prediction; support vector machines; skew; load; balancing; virtualized data center; reduce; MapReduce system; data distribution