Research on MapReduce On-line Load Balancing Based on Sample Partition (cited by: 5)
Abstract: Data skew has long been one of the key issues affecting MapReduce performance. To mitigate data skew, the paper proposes MR-LSP (MapReduce on-line Load balancing mechanism based on Sample Partition), an on-line load balancing mechanism for MapReduce based on sample partitioning. Before a job runs, MR-LSP samples and analyzes the source data to predict its distribution, and dynamically selects a matching load-balancing data partitioning strategy. While the job runs, it monitors node load in real time and further optimizes the partitioning strategy dynamically. Experimental results show that MR-LSP improves the system's load balance by 3.2% and reduces job execution time by 4.3%, effectively alleviating data skew in MapReduce.
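The abstract does not spell out the partitioning algorithm, so the following is only a minimal sketch of the general idea of sampling keys before a job and using the estimated key frequencies to build a balanced partition plan. The greedy "heaviest key to the least-loaded reducer" rule, the hash fallback for unsampled keys, and all names (`sample_keys`, `build_partition_plan`, `partition`, `reducer_loads`) are assumptions made for illustration; they are not MR-LSP's actual method, and the run-time load monitoring step is not modeled.

```python
import heapq
import random
import zlib
from collections import Counter


def sample_keys(records, rate, seed=0):
    """Keep each record's key with probability `rate` (Bernoulli sampling)."""
    rng = random.Random(seed)
    return [key for key, _ in records if rng.random() < rate]


def build_partition_plan(sampled_keys, num_reducers):
    """Assumed strategy: assign the most frequent sampled keys first,
    each to the reducer with the smallest estimated load so far."""
    freq = Counter(sampled_keys)
    heap = [(0, r) for r in range(num_reducers)]  # (estimated load, reducer id)
    heapq.heapify(heap)
    plan = {}
    # Sort by descending frequency; break ties by key text for determinism.
    for key, count in sorted(freq.items(), key=lambda kv: (-kv[1], str(kv[0]))):
        load, reducer = heapq.heappop(heap)
        plan[key] = reducer
        heapq.heappush(heap, (load + count, reducer))
    return plan


def partition(key, plan, num_reducers):
    """Use the plan for sampled keys; fall back to a stable hash otherwise."""
    if key in plan:
        return plan[key]
    return zlib.crc32(str(key).encode("utf-8")) % num_reducers


def reducer_loads(records, plan, num_reducers):
    """Count how many records each reducer would receive under the plan."""
    loads = [0] * num_reducers
    for key, _ in records:
        loads[partition(key, plan, num_reducers)] += 1
    return loads
```

In this sketch, a heavily skewed key occupies one reducer on its own while lighter keys are packed onto the others. A single key that dominates the data still bounds the achievable balance unless the key itself is split, which this example does not attempt.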
Source: Journal of Chinese Computer Systems (《小型微型计算机系统》), indexed in CSCD and the Peking University Core Journals list, 2017, Issue 2, pp. 238-242 (5 pages)
Funding: Supported by the Key Scientific Research Project of Higher Education Institutions of Henan Province (16A520027)
Keywords: MapReduce; data skew; dynamic scheduling; sampling partition
