Reliable Estimation of Execution Time of MapReduce Program (cited by: 1)

Language: English (Chinese title: 关于MapReduce程序运行时间的可靠估算)

Abstract: As data volumes grow, many enterprises are considering MapReduce for its simplicity. However, how to evaluate the performance improvement before deployment remains an open issue. Current research on MapReduce performance is mainly based on monitoring and simulation and lacks mathematical models. In this paper, we present a simple but powerful performance model for predicting the execution time of a MapReduce program with limited resources. We study each component of the MapReduce framework and, based on our model, analyze the relation between overall performance and the number of mappers and reducers. Two typical MapReduce programs are evaluated on a small cluster of 13 nodes. Experimental results show that the model can reliably estimate the execution time of MapReduce programs. According to our model, the number of mappers and reducers can be tuned to form a better execution pipeline and thus improve performance. The model also points out potential bottlenecks of the framework and directions for future improvement.
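The abstract does not reproduce the model's equations, so the following is only a rough illustration of why the number of mappers and reducers matters when task slots are limited. It is a minimal sketch assuming a generic "wave" approximation (a phase with n tasks on s concurrent slots runs in ceil(n/s) waves, each lasting one task's duration) and no overlap between the map and reduce phases. It is not the authors' model: the `Cluster` fields, the `estimate_job_time` function, and all parameter values are hypothetical.

```python
import math
from dataclasses import dataclass


@dataclass
class Cluster:
    # Hypothetical cluster parameters; none of these come from the paper.
    map_slots: int            # concurrent map tasks across the cluster
    reduce_slots: int         # concurrent reduce tasks across the cluster
    map_rate_mb_s: float      # per-task map processing rate (MB/s)
    reduce_rate_mb_s: float   # per-task reduce processing rate (MB/s)
    task_overhead_s: float    # fixed start-up cost per task (s)


def estimate_job_time(input_mb: float, selectivity: float,
                      mappers: int, reducers: int, c: Cluster) -> float:
    """Hypothetical wave-based estimate, NOT the paper's model.

    Each phase runs in ceil(tasks / slots) waves; a wave lasts as long as
    one task. Map and reduce are assumed not to overlap (no pipelining).
    `selectivity` is the ratio of intermediate data to input data.
    """
    # Map phase: input split evenly across mappers.
    map_task_s = c.task_overhead_s + (input_mb / mappers) / c.map_rate_mb_s
    map_waves = math.ceil(mappers / c.map_slots)

    # Reduce phase: intermediate data split evenly across reducers.
    intermediate_mb = input_mb * selectivity
    reduce_task_s = c.task_overhead_s + (intermediate_mb / reducers) / c.reduce_rate_mb_s
    reduce_waves = math.ceil(reducers / c.reduce_slots)

    return map_waves * map_task_s + reduce_waves * reduce_task_s


if __name__ == "__main__":
    cluster = Cluster(map_slots=24, reduce_slots=12, map_rate_mb_s=20.0,
                      reduce_rate_mb_s=10.0, task_overhead_s=2.0)
    for m in (24, 25, 48):
        t = estimate_job_time(4800.0, 0.5, m, 12, cluster)
        print(f"{m} mappers: {t:.1f} s")
    # prints:
    # 24 mappers: 34.0 s
    # 25 mappers: 45.2 s
    # 48 mappers: 36.0 s
```

Under these made-up numbers, 25 mappers is slower than 24 because the single extra task spills into a second map wave, while 48 mappers fills both waves with smaller tasks. A model that also captures pipelining between phases, as the paper describes, would refine such estimates.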

Authors: YANG Xiao, SUN Jianling

Affiliation: Department of Computer Science and Technology

Source: China Communications (中国通信, English edition), indexed in SCIE and CSCD, 2011, No. 6, pp. 11-18 (8 pages)

Funding: Supported by the CHB Project "Unstructured Data Management System" under Grant No. 2010ZX01042-002-003

Keywords: performance model; MapReduce; execution time

Classification: TP274 [Automation and Computer Technology: Detection Technology and Automatic Equipment]

References (13)

[1] HADOOP. http://hadoop.apache.org/.
[2] RANGER C, RAGHURAMAN R, PENMETSA A, et al. Evaluating MapReduce for Multi-Core and Multiprocessor Systems. Proceedings of the 2007 IEEE 13th International Symposium on High Performance Computer Architecture, February 10-14, 2007.
[3] TAN JIAQI, PAN XINGHAO, KAVULYA S, et al. Mochi: Visual Log-Analysis Based Tools for Debugging Hadoop. Proceedings of the 2009 Conference on Hot Topics in Cloud Computing, June 14-19, 2009.
[4] KAVULYA S, TAN J, GANDHI J, et al. An Analysis of Traces from a Production MapReduce Cluster. Proceedings of the 2010 10th IEEE/ACM International Conference on Cluster, Cloud and Grid Computing, May 17-20, 2010.
[5] BOULON J, KONWINSKI A, QI R, et al. Chukwa, a Large-Scale Monitoring System. Proceedings of Conference in Cloud Computing and its Applications (CCA '08), 2008.
[6] WANG G, BUTT A, PANDEY P, et al. A Simulation Approach to Evaluating Design Decisions in MapReduce Setups. International Symposium on Modelling, Analysis and Simulation of Computer and Telecommunication Systems, September 21-23, 2009.
[7] ZAHARIA M, KONWINSKI A, JOSEPH A, et al. Improving MapReduce Performance in Heterogeneous Environments. Proceedings of the 8th USENIX Conference on Operating Systems Design and Implementation, December 8-10, 2008.
[8] CONDIE T, CONWAY N, ALVARO P, et al. MapReduce Online. Technical Report No. UCB/EECS-2009-136.
[9] JIANG D, OOI B, SHI L, et al. The Performance of MapReduce: An In-Depth Study. PVLDB, 2010.
[10] NYKIEL T, POTAMIAS M, MISHRA C, et al. MRShare: Sharing Across Multiple Queries in MapReduce. PVLDB, 2010.

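Co-cited literature (1)

[1] XU Zhenpeng, MEN Chaoguang, LI Xiang. A checkpoint-interval solution model for log-based checkpoint rollback recovery strategies [J]. Chinese High Technology Letters (高技术通讯), 2011, 21(6): 575-580. Cited by: 2

Citing literature (1)

[1] HE Li, ZHAO Zhiyuan. Markov chain-based failure model and analysis of cloud platforms [J]. Journal of Chongqing University of Posts and Telecommunications (Natural Science Edition), 2013, 25(5): 671-674. Cited by: 1

Second-level citing literature (1)

[1] ZHAO Zhiyuan, ZHANG Ruixiang. Cellular automaton-based reliability evaluation model for repairable network systems [J]. Journal of Chongqing University of Posts and Telecommunications (Natural Science Edition), 2014, 26(5): 694-699. Cited by: 3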
