Reliable Estimation of Execution Time of MapReduce Program (times cited: 1)

Chinese title (translated): On Reliable Estimation of the Running Time of MapReduce Programs (article published in English)
Abstract: As data volumes grow, many enterprises are considering MapReduce for its simplicity. However, evaluating the performance improvement before deployment remains an open issue. Current research on MapReduce performance is based mainly on monitoring and simulation and lacks mathematical models. In this paper, we present a simple but powerful performance model that predicts the execution time of a MapReduce program running with limited resources. We study each component of the MapReduce framework and, based on our model, analyze how overall performance relates to the numbers of mappers and reducers. Two typical MapReduce programs are evaluated on a small 13-node cluster. Experimental results show that the mathematical performance model estimates the execution time of MapReduce programs reliably. According to the model, the numbers of mappers and reducers can be tuned to form a better execution pipeline, leading to better performance. The model also points out potential bottlenecks in the framework and directions for future improvement.
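This record does not reproduce the paper's model, so the following is only a minimal, hypothetical sketch of how an execution-time estimate can depend on the numbers of mappers and reducers when task slots are limited. It uses a common wave-based approximation, not the authors' formulas, and it assumes the map and reduce phases do not overlap (Hadoop can in fact start the shuffle before all maps finish). `ClusterParams`, `phase_time`, `estimate_job_time`, and every numeric value are illustrative assumptions.

```python
import math
from dataclasses import dataclass


@dataclass
class ClusterParams:
    """Hypothetical cluster parameters; none of these come from the paper."""
    map_slots: int           # map tasks that can run concurrently
    reduce_slots: int        # reduce tasks that can run concurrently
    map_rate_mb_s: float     # per-task map throughput (MB/s)
    reduce_rate_mb_s: float  # per-task shuffle + reduce throughput (MB/s)
    task_overhead_s: float   # fixed start-up cost per task (s)


def phase_time(data_mb: float, tasks: int, slots: int,
               rate_mb_s: float, overhead_s: float) -> float:
    """Wave-based estimate: data is split evenly across tasks, and tasks
    run in ceil(tasks / slots) waves of equal duration."""
    waves = math.ceil(tasks / slots)
    per_task_s = overhead_s + (data_mb / tasks) / rate_mb_s
    return waves * per_task_s


def estimate_job_time(input_mb: float, intermediate_mb: float,
                      mappers: int, reducers: int, p: ClusterParams) -> float:
    """Map phase followed by reduce phase with no overlap between them
    (a simplifying assumption of this sketch)."""
    t_map = phase_time(input_mb, mappers, p.map_slots,
                       p.map_rate_mb_s, p.task_overhead_s)
    t_reduce = phase_time(intermediate_mb, reducers, p.reduce_slots,
                          p.reduce_rate_mb_s, p.task_overhead_s)
    return t_map + t_reduce


if __name__ == "__main__":
    params = ClusterParams(map_slots=24, reduce_slots=12,
                           map_rate_mb_s=10.0, reduce_rate_mb_s=5.0,
                           task_overhead_s=2.0)
    # Sweep the number of mappers for a fixed 6400 MB input
    # and 1200 MB of intermediate data (both hypothetical).
    for m in (24, 25, 48, 100):
        t = estimate_job_time(6400, 1200, m, 12, params)
        print(f"mappers={m:4d}  estimated time={t:6.1f} s")
```

In this toy model, going from 24 to 25 mappers adds a second, mostly idle map wave and nearly doubles the estimated map time, which illustrates why the mapper and reducer counts interact with the available slots; the paper itself should be consulted for its actual formulas.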
Authors: Yang Xiao (杨肖), Sun Jianling (孙建伶)
Source: China Communications (中国通信, English edition), indexed in SCIE and CSCD, 2011, no. 6, pp. 11-18 (8 pages)
Funding: Supported by the CHB Project "Unstructured Data Management System" under Grant No. 2010ZX01042-002-003
Keywords: performance model; MapReduce; execution time