MapReduce performance model based on multi-phase dividing (基于多阶段划分的MapReduce模型)

Cited by: 3
Abstract: Existing MapReduce models divide a job into phases at an unsuitable granularity, which hurts both model precision and model complexity. To address this, a multi-phase MapReduce model (MR-Model) with a partition granularity of 5 is proposed. First, the research status of MapReduce models is reviewed. Second, a MapReduce job is divided into five phases (Read, Map, Shuffle, Reduce, and Write), and the processing time of each phase is studied. Finally, the prediction performance of the model is verified by experiments. The experimental results show that MR-Model can describe the execution of actual MapReduce jobs. Compared with P-Model and H-Model, two models with different partition granularities, MR-Model improves running-time prediction accuracy by 10%-30% and improves running-time prediction accuracy in the Reduce phase by a factor of 2 to 3, giving better overall performance.
Source: Journal of Computer Applications (《计算机应用》), CSCD, Peking University Core Journal, 2015, No. 12, pp. 3374-3377, 3382 (5 pages)
Funding: Pre-research Project of the General Armament Department (513150701)
Keywords: cloud computing; MapReduce; performance model; multi-phase division; partition granularity
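
The abstract names the five phases but does not give the per-phase formulas, so the following is only an illustrative sketch and not the paper's MR-Model. It assumes, purely for illustration, that the phases run back to back so the job time is the sum of the five phase times, and it measures prediction quality as relative error. The function names and all numbers are hypothetical.

```python
# Illustrative sketch only: the additive composition and all values below
# are assumptions, not formulas or results from the paper.

PHASES = ("Read", "Map", "Shuffle", "Reduce", "Write")


def total_time(phase_times):
    """Sum the five phase times (assumes phases do not overlap)."""
    missing = [p for p in PHASES if p not in phase_times]
    if missing:
        raise ValueError(f"missing phases: {missing}")
    return sum(phase_times[p] for p in PHASES)


def relative_error(predicted, measured):
    """Relative prediction error |predicted - measured| / measured."""
    if measured <= 0:
        raise ValueError("measured time must be positive")
    return abs(predicted - measured) / measured


# Hypothetical per-phase times in seconds (made up for this example).
predicted = {"Read": 4.0, "Map": 20.0, "Shuffle": 10.0, "Reduce": 12.0, "Write": 4.0}
measured = {"Read": 5.0, "Map": 16.0, "Shuffle": 8.0, "Reduce": 8.0, "Write": 3.0}

job_error = relative_error(total_time(predicted), total_time(measured))
reduce_error = relative_error(predicted["Reduce"], measured["Reduce"])

print(f"job-level relative error:    {job_error:.2f}")
print(f"Reduce-phase relative error: {reduce_error:.2f}")
```

Keeping a separate error per phase is what allows a phase-level comparison such as the Reduce-phase figure quoted in the abstract; a real model would also have to account for any overlap between phases, which this sketch ignores.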
