Load Balancing Strategy Based on Pressure Feedback on MapReduce

Cited by: 4
Abstract: Data skew is one of the factors that most seriously degrade MapReduce performance. Existing solutions to the data skew problem burden users, who must either supply a partition function tailored to the application type or write an additional sampling pass for MapReduce. To address this, the paper proposes a load balancing strategy based on pressure statistics. The strategy makes full use of the shuffle stage in MapReduce: statistics are collected while the reducers prepare their data, which yields the global data distribution. Based on this distribution, the system schedules the heavily loaded nodes to balance the load across the whole cluster, without requiring any additional input from the user. In addition, to account for the different types of upper-layer applications, a pressure feedback mechanism is introduced to further improve the performance of the scheduling strategy. Experimental results show that the proposed load balancing scheduling strategy outperforms the default strategy.
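Source: Computer Science (《计算机科学》), indexed in CSCD and the Peking University Core Journals list, 2015, No. 4, pp. 141-146 (6 pages).
Funding: Supported by the National Natural Science Foundation of China (61373015, 61300052, 41301407); the Specialized Research Fund for the Doctoral Program of Higher Education of the Ministry of Education of China (20103218110017); the Priority Academic Program Development of Jiangsu Higher Education Institutions (PAPD); and the Fundamental Research Funds for the Central Universities (NP2013307).
Keywords: MapReduce; Data skew; Load balance; Pressure feedback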

References (13)

  • 1 Dean J, Ghemawat S. MapReduce: simplified data processing on large clusters[J]. Communications of the ACM, 2008, 51(1): 107-113.
  • 2 http://hadoop.apache.org.
  • 3 Dhawalia P, Kailasam S, Janakiram D. Chisel: A Resource Savvy Approach for Handling Skew in MapReduce Applications[C]∥2013 IEEE Sixth International Conference on Cloud Computing (CLOUD). IEEE, 2013: 652-660.
  • 4 DeWitt D J, Naughton J F, Schneider D A, et al. Practical skew handling in parallel joins[C]∥Very Large Data Bases (VLDB). 1992: 27-40.
  • 5 Poosala V, Ioannidis Y E. Estimation of query-result distribution and its application in parallel-join load balancing[C]∥VLDB. 1996: 448-459.
  • 6 Shatdal A, Naughton J F. Adaptive parallel aggregation algorithms[J]. ACM SIGMOD Record, ACM, 1995, 24(2): 104-114.
  • 7 Gates A F, Natkovich O, Chopra S, et al. Building a high-level dataflow system on top of Map-Reduce: the Pig experience[J]. Proceedings of the VLDB Endowment, 2009, 2(2): 1414-1425.
  • 8 Kwon Y C, Balazinska M, Howe B, et al. Skew-resistant parallel processing of feature-extracting scientific user-defined functions[C]∥Proceedings of the 1st ACM symposium on Cloud computing. ACM, 2010: 75-86.
  • 9 Ibrahim S, Jin H, Lu L, et al. Handling partitioning skew in MapReduce using LEEN[J]. Peer-to-Peer Networking and Applications, 2013, 6(4): 409-424.
  • 10 Fu Jie, Du Zhihui. A load balancing strategy for periodic MapReduce jobs[J]. Computer Science (《计算机科学》), 2013, 40(3): 38-40. Cited by: 15
