
GC-based MapReduce Job Memory Tuning
Times cited: 2

Abstract: Different MapReduce jobs require different amounts of memory, and it is difficult to judge whether the memory allocated to a given job is reasonable. To address this problem, an assessment method is proposed, together with recommendations for configuring the memory of the JVMs in which a job's tasks run. First, the memory allocation and garbage collection (GC) workflow of the Java Virtual Machine is analyzed in depth, and the key GC metrics are identified. Second, three indicators of memory-allocation rationality are defined, along with an evaluation method based on them. Finally, based on the evaluation results, two kinds of optimized JVM configuration are recommended: (1) using the K-means clustering algorithm and statistical information to estimate the object-size threshold above which objects should be allocated directly in the old generation, improving the JVM's object allocation and GC performance; and (2) modeling GC pause time with a regression model and using a search algorithm to predict suitable sizes for the young and old generations. Experimental results show that the proposed approach can automatically detect an insufficient memory configuration for a job and recommend an optimized one. Compared with machine-learning approaches, it does not require running a large number of test cases, so it is well suited to production MapReduce clusters.
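The abstract names the two recommendation techniques without giving their details, so the following is only a minimal sketch under our own assumptions, not the authors' implementation. For the first, it runs a plain one-dimensional K-means (k = 2) over observed object sizes and takes the midpoint of the gap between the "small" and "large" clusters as a pretenuring threshold; such a value could, for example, be supplied in bytes to HotSpot's `-XX:PretenureSizeThreshold` option, which only some collectors honor. For the second, it fits a least-squares line relating minor-GC pause time to young-generation size, and a simple exhaustive grid search stands in for the paper's search algorithm, picking a young/old split within a fixed heap. All function names and parameters (`pretenure_threshold`, `choose_young_mb`, `alloc_mb`, `live_old_mb`, `max_pause_ms`, `step_mb`) are hypothetical, and the linear pause model and the "allocated MB / young MB" collection count are simplifying assumptions.

```python
"""Hypothetical sketch of the two recommendations named in the abstract.

Not the authors' code: names, parameters and models are illustrative assumptions.
"""
from statistics import mean


def kmeans_1d(values, k=2, iters=100):
    """Lloyd's k-means on scalar values, centroids seeded at evenly spaced sorted ranks."""
    if k < 2:
        raise ValueError("k must be at least 2")
    data = sorted(values)
    n = len(data)
    centroids = [data[i * (n - 1) // (k - 1)] for i in range(k)]
    clusters = []
    for _ in range(iters):
        # Assignment step: each value joins its nearest centroid (ties go to the lower index).
        clusters = [[] for _ in range(k)]
        for v in data:
            nearest = min(range(k), key=lambda c: abs(v - centroids[c]))
            clusters[nearest].append(v)
        # Update step: move each centroid to its cluster mean (an empty cluster keeps its centroid).
        updated = [mean(c) if c else centroids[i] for i, c in enumerate(clusters)]
        if updated == centroids:
            break
        centroids = updated
    return centroids, clusters


def pretenure_threshold(object_sizes):
    """Hypothetical: split object sizes (bytes) into small/large clusters, return the gap midpoint."""
    if len(set(object_sizes)) < 2:
        raise ValueError("need at least two distinct object sizes")
    _, (small, large) = kmeans_1d(object_sizes, k=2)
    # In 1-D with ascending seeds, cluster 0 holds the small sizes and cluster 1 the large ones.
    return (max(small) + min(large)) // 2


def fit_line(xs, ys):
    """Ordinary least squares for y = a + b * x; returns (a, b)."""
    mx, my = mean(xs), mean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    b = sxy / sxx
    return my - b * mx, b


def choose_young_mb(samples, alloc_mb, heap_mb, live_old_mb, max_pause_ms, step_mb=16):
    """Hypothetical: grid-search the young-generation size (MB) within a fixed heap.

    samples: (young_mb, minor_gc_pause_ms) observations from earlier runs.
    Assumes pause grows linearly with young size and that the job triggers
    about alloc_mb / young_mb minor collections.
    Returns (young_mb, old_mb, predicted_total_pause_ms), or None if nothing fits.
    """
    a, b = fit_line([s[0] for s in samples], [s[1] for s in samples])
    best = None
    young = step_mb
    # The old generation (heap_mb - young) must still hold the observed live data.
    while young + live_old_mb <= heap_mb:
        pause = a + b * young  # predicted pause of one minor GC
        if pause <= max_pause_ms:
            total = (alloc_mb / young) * pause  # collections needed x pause each
            if best is None or total < best[2]:
                best = (young, heap_mb - young, total)
        young += step_mb
    return best


if __name__ == "__main__":
    sizes = [48, 64, 96, 128, 256, 1_048_576, 1_572_864, 2_097_152]
    print("pretenure threshold (bytes):", pretenure_threshold(sizes))
    samples = [(64, 12.0), (128, 20.0), (256, 36.0)]
    print(choose_young_mb(samples, alloc_mb=10240, heap_mb=1024,
                          live_old_mb=600, max_pause_ms=45))
```

Run as a script, it reports a threshold of 524416 bytes (the midpoint between the largest small object, 256 B, and the smallest large one, 1 MiB) and a 320 MB young / 704 MB old split. Because the predicted total pause falls as the young generation grows, the per-collection pause cap and the old generation's live-data floor are what bound the choice.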
Source: Journal of Sichuan University (Engineering Science Edition), 2015, No. 6, pp. 104-112 (9 pages). Indexed in EI, CAS, CSCD, and the Peking University Core Journals list.
Funding: National Key Technology R&D Program of China (2012BAH18B05)
Keywords: MapReduce; Hadoop; JVM; garbage collection (GC); resource optimization; memory tuning
