Method for improving MapReduce performance by prefetching before scheduling
Abstract
This paper proposes a prefetching technique to address the performance loss caused by remote data access delay in MapReduce. The technique first predicts which map tasks will incur the delay, then preloads the input data of those tasks before they are scheduled. During execution, the input data can be read from the local node, so the delay is hidden. The technique has been implemented in Hadoop-0.20.1. Experimental results show that it reduces the number of map tasks that incur the delay and improves the performance of Hadoop MapReduce by 20%.
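The predict-then-preload idea can be illustrated with a small, self-contained sketch. It is not the implementation described in the paper: the abstract does not say how delayed tasks are predicted, so the sketch simply assumes that the order in which nodes will free a task slot is known in advance. The names `MapTask`, `predict_and_prefetch`, and `upcoming_nodes` are hypothetical and do not correspond to Hadoop APIs.

```python
from dataclasses import dataclass


@dataclass
class MapTask:
    task_id: str
    replica_nodes: set  # nodes that already hold this task's input block


def predict_and_prefetch(pending, upcoming_nodes):
    """Plan prefetches for tasks predicted to read their input remotely.

    Assumption (not from the paper): `upcoming_nodes` is the expected
    order in which nodes will request their next map task.

    Returns (assignments, prefetches), where assignments maps
    task_id -> node and prefetches lists (task_id, node) pairs whose
    input should be copied to the node before the task is scheduled.
    """
    remaining = list(pending)
    assignments = {}
    prefetches = []
    for node in upcoming_nodes:
        if not remaining:
            break
        # Prefer a task whose input is already stored on this node.
        local = next((t for t in remaining if node in t.replica_nodes), None)
        task = local if local is not None else remaining[0]
        if local is None:
            # Predicted remote read: preload the input ahead of scheduling,
            # so the task later finds its data on the local node.
            prefetches.append((task.task_id, node))
            task.replica_nodes.add(node)
        assignments[task.task_id] = node
        remaining.remove(task)
    return assignments, prefetches


tasks = [
    MapTask("m1", {"A"}),
    MapTask("m2", {"A"}),
    MapTask("m3", {"B"}),
]
assignments, prefetches = predict_and_prefetch(tasks, ["A", "B", "C"])
print(prefetches)
```

In this toy run, nodes A and B each receive a task whose input they already hold, while node C holds no input block, so the remaining task `m2` is flagged and its input is planned for preloading onto C before scheduling.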