Optimization Analysis of Hadoop

Abstract: Hadoop is a distributed data processing platform that supports the MapReduce parallel computing framework. In practice there is often a need to accelerate Hadoop under particular workloads, such as Hive jobs. By writing the current time to the logs at specially selected points, we traced the workflow of a typical MapReduce job generated by Hive and collected timing statistics for every phase of the job. Using different data volumes, we compared the proportion of time spent in each phase and located the bottleneck points of Hadoop. We offer two main optimization recommendations: (1) for big jobs with a large number of intermediate results, focus on using a combiner and optimizing network and disk I/O; (2) for short jobs, optimize the map function and disk I/O.
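
As an illustration of the tracing method described in the abstract, the following minimal sketch shows how timestamps might be written to the task logs at the boundaries of the map phase. The class name TimedWordCountMapper and the log markers (MAP_SETUP, MAP_FIRST_RECORD, MAP_CLEANUP) are hypothetical; the setup/map/cleanup hooks and the commons-logging Log are part of the standard Hadoop MapReduce API. This is a sketch of the general technique, not the authors' actual instrumentation.

import java.io.IOException;
import org.apache.commons.logging.Log;
import org.apache.commons.logging.LogFactory;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

// Hypothetical word-count style mapper instrumented with timestamp
// logging at phase boundaries, in the spirit of the paper's tracing.
public class TimedWordCountMapper
        extends Mapper<LongWritable, Text, Text, IntWritable> {

    private static final Log LOG = LogFactory.getLog(TimedWordCountMapper.class);
    private static final IntWritable ONE = new IntWritable(1);
    private final Text word = new Text();
    private boolean firstRecord = true;

    @Override
    protected void setup(Context context) {
        // Marks the start of the map phase for this task attempt.
        LOG.info("MAP_SETUP " + System.currentTimeMillis());
    }

    @Override
    protected void map(LongWritable key, Text value, Context context)
            throws IOException, InterruptedException {
        if (firstRecord) {
            // Marks the moment the first input record is processed.
            LOG.info("MAP_FIRST_RECORD " + System.currentTimeMillis());
            firstRecord = false;
        }
        for (String token : value.toString().split("\\s+")) {
            if (!token.isEmpty()) {
                word.set(token);
                context.write(word, ONE);
            }
        }
    }

    @Override
    protected void cleanup(Context context) {
        // Marks the end of the map phase; the gap between this entry and
        // the reduce-side log entries covers shuffle and sort.
        LOG.info("MAP_CLEANUP " + System.currentTimeMillis());
    }
}

In line with recommendation (1), a combiner can be registered on the job (for example via the standard Job.setCombinerClass call) so that intermediate map output is aggregated locally before the shuffle, reducing network and disk I/O for large jobs.
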
Source: 《国际计算机前沿大会会议论文集》 (Proceedings of the International Conference of Pioneering Computer Scientists, Engineers and Educators, ICPCSEE), 2016, Issue 1, pp. 134-135 (2 pages).