Optimizing load balancing of joins in MapReduce

Cited by: 4
Abstract: Data analysis and processing are important tasks in large-scale distributed data processing applications. Because it is easy to use and flexible, the MapReduce programming model has gradually become the core model of large-scale distributed data processing systems (e.g., Hadoop). Since the data being processed may not be uniformly distributed, data skew arises when the MapReduce programming model performs join operations, and this severely degrades join efficiency. To address data skew in MapReduce joins, the paper analyzes the causes of the join performance bottleneck, establishes a load balancing cost model, and proposes a strategy that uses range partitioning (rangepartitioner) to control data skew during the join and thereby achieve load balancing. Experimental results show that the proposed method markedly improves join efficiency.
Source: Computer Engineering & Science (《计算机工程与科学》), CSCD, Peking University Core Journal, 2014, No. 10, pp. 1860-1865 (6 pages)
Funding: National Natural Science Foundation of China (61070032)
Keywords: MapReduce; join; data skew; range partitioning (rangepartitioner); load balancing
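
The abstract names range partitioning as the way to control skew but does not give the algorithm. The following is only a minimal illustrative sketch of the general idea, not the paper's method: it assumes a sample of join keys is available and picks split keys so that each reducer's key range covers a roughly equal share of the sampled records. The function names `range_boundaries`, `assign_reducer`, and `reducer_loads` are hypothetical.

```python
from bisect import bisect_left
from collections import Counter


def range_boundaries(sample_keys, num_reducers):
    """Pick inclusive upper-bound keys so ranges hold similar sampled record counts."""
    counts = Counter(sample_keys)
    target = len(sample_keys) / num_reducers  # ideal records per reducer
    boundaries, running = [], 0
    for key in sorted(counts):
        running += counts[key]
        # Close the current range once the cumulative count reaches its share.
        if (len(boundaries) < num_reducers - 1
                and running >= target * (len(boundaries) + 1)):
            boundaries.append(key)
    return boundaries


def assign_reducer(key, boundaries):
    """Return the reducer index whose key range contains `key`."""
    # Keys equal to a boundary stay in the range that the boundary closes.
    return bisect_left(boundaries, key)


def reducer_loads(keys, boundaries, num_reducers):
    """Count how many records each reducer would receive."""
    loads = [0] * num_reducers
    for key in keys:
        loads[assign_reducer(key, boundaries)] += 1
    return loads
```

In a join, both input relations would have to be routed with the same boundaries so that matching keys meet at the same reducer. Note that this simple scheme never splits a single key across reducers, so one very frequent key still bounds the achievable balance; handling that case needs additional techniques that the abstract does not describe.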