A Hadoop Performance Prediction Model Based on Random Forest

Abstract: MapReduce is a programming model for processing large data sets, and Hadoop is the most popular open-source implementation of MapReduce. To achieve high performance, up to 190 Hadoop configuration parameters must be manually tuned. This is not only time-consuming but also error-prone. In this paper, we propose a new performance model based on random forest, a recently developed machine-learning algorithm. The model, called RFMS, predicts the performance of a Hadoop system from the system's configuration parameters. RFMS is built from 2000 distinct fine-grained performance observations taken under different Hadoop configurations. We test RFMS against the measured performance of representative workloads from the Hadoop Micro-benchmark suite. The results show that RFMS achieves a prediction accuracy of 95% on average and up to 99%. This new, highly accurate prediction model can be used to automatically optimize the performance of Hadoop systems.
Source: ZTE Communications (中兴通讯技术, English edition), 2013, No. 2, pp. 38-44 (7 pages)
Funding: Supported by the cooperation project "Research on Green Cloud IDC Resource Scheduling" with ZTE Corporation
Keywords: big data; cloud computing; MapReduce; Hadoop; random forest; micro-benchmark
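
The idea summarized in the abstract, predicting job performance from configuration parameters with a random forest, can be illustrated with a minimal sketch. This is not the authors' RFMS implementation. The three parameters (`sort_buffer_mb`, `num_reducers`, `compress_output`), the cost function `synthetic_runtime`, and the accuracy definition (one minus relative error) are all hypothetical stand-ins chosen for illustration, and the training data is synthetic rather than measured.

```python
import random
from statistics import mean

def sse(values):
    """Sum of squared deviations from the mean."""
    m = mean(values)
    return sum((v - m) ** 2 for v in values)

def build_tree(rows, targets, n_features, min_leaf, rng):
    """Grow one regression tree; every split looks at a random feature subset."""
    if len(rows) <= min_leaf or len(set(targets)) == 1:
        return mean(targets)  # leaf: average observed runtime
    best = None
    for f in rng.sample(range(len(rows[0])), n_features):
        for threshold in sorted(set(r[f] for r in rows)):
            left = [t for r, t in zip(rows, targets) if r[f] <= threshold]
            right = [t for r, t in zip(rows, targets) if r[f] > threshold]
            if not right:
                continue  # this threshold separates nothing
            score = sse(left) + sse(right)
            if best is None or score < best[0]:
                best = (score, f, threshold)
    if best is None:
        return mean(targets)  # rows are identical on the sampled features
    _, f, threshold = best
    left = [(r, t) for r, t in zip(rows, targets) if r[f] <= threshold]
    right = [(r, t) for r, t in zip(rows, targets) if r[f] > threshold]
    return (f, threshold,
            build_tree([r for r, _ in left], [t for _, t in left], n_features, min_leaf, rng),
            build_tree([r for r, _ in right], [t for _, t in right], n_features, min_leaf, rng))

def predict_tree(node, row):
    """Walk from the root to a leaf; internal nodes are tuples, leaves are floats."""
    while isinstance(node, tuple):
        f, threshold, left, right = node
        node = left if row[f] <= threshold else right
    return node

def train_forest(rows, targets, n_trees=25, n_features=2, min_leaf=2, seed=0):
    """Train each tree on a bootstrap sample (drawn with replacement)."""
    rng = random.Random(seed)
    forest = []
    for _ in range(n_trees):
        idx = [rng.randrange(len(rows)) for _ in rows]
        forest.append(build_tree([rows[i] for i in idx], [targets[i] for i in idx],
                                 n_features, min_leaf, rng))
    return forest

def predict(forest, row):
    """Forest prediction: the average of the per-tree predictions."""
    return mean(predict_tree(tree, row) for tree in forest)

def synthetic_runtime(sort_buffer_mb, num_reducers, compress_output):
    """Made-up cost function (seconds) standing in for a measured job time."""
    return 300.0 - 0.5 * sort_buffer_mb + 400.0 / num_reducers - 20.0 * compress_output

# Hypothetical configuration space: 4 x 5 x 2 = 40 combinations.
configs = [(s, r, c) for s in (50, 100, 150, 200)
           for r in (1, 2, 4, 8, 16) for c in (0, 1)]

# 200 noisy synthetic "observations" drawn from that space.
data_rng = random.Random(42)
train_rows = [data_rng.choice(configs) for _ in range(200)]
train_targets = [synthetic_runtime(*row) + data_rng.gauss(0, 5) for row in train_rows]

forest = train_forest(train_rows, train_targets)

# Assumed accuracy metric: 1 - |predicted - actual| / actual.
accuracies = [1 - abs(predict(forest, c) - synthetic_runtime(*c)) / synthetic_runtime(*c)
              for c in configs]
mean_accuracy = mean(accuracies)
print(f"mean accuracy over {len(configs)} configurations: {mean_accuracy:.3f}")
```

Once such a model is trained, a tuner could call `predict(forest, candidate)` on many candidate configurations and keep the one with the lowest predicted runtime, which is the kind of automatic optimization the abstract points to.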
