
MapReduce Parallel Model Based on Tree Structure (基于树型结构的MapReduce并行模型)
Abstract MapReduce is a distributed computing model introduced by Google that is widely used for massive data processing. This paper presents a novel MapReduce parallel model based on a tree structure. The model is suited to massive scientific data analysis on unreliable desktop PC resources in Internet or Intranet environments. Computing nodes are organized in a P2P fashion; the lower layer uses the P2P-MPI framework, and the MapReduce application layer is implemented on top of it with message passing. In the MapReduce application layer, data chunks are distributed by broadcast in the Map stage, and an inverse binary tree is built in the Reduce stage to merge and reduce intermediate results efficiently. The proposed model was compared with existing mainstream MapReduce models. The results show that the tree structure-based MapReduce parallel model performs well in terms of fault tolerance, and that the system is simple and makes application development easy.
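Authors Tang Bing (唐兵), He Haiwu (贺海武)
Source Computer Science (《计算机科学》), CSCD, Peking University Core Journal, 2015, No. 11, pp. 65-67, 89 (4 pages)
Funding Supported by a French National Research Agency project (ANR-10-SEGI-001-01), the Hundred Talents Program of the Chinese Academy of Sciences (1101002001), the Natural Science Foundation of Hunan Province (2015JJ3071), and a General Project of the Hunan Provincial Department of Education (12C0121)
Keywords MapReduce, tree structure, binary tree, message passing interface (MPI)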
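
The abstract describes a Reduce stage that merges intermediate results along an inverse binary tree. The sketch below illustrates that general idea with a word count. It is not the authors' implementation: it runs in a single process, does not use P2P-MPI or any message passing, and does not model the Map-stage broadcast or fault tolerance. The names `map_chunk`, `tree_reduce`, and `word_count` are hypothetical and chosen only for illustration.

```python
from collections import Counter


def map_chunk(chunk):
    """Map step (illustrative): count the words in one data chunk."""
    return Counter(chunk.split())


def tree_reduce(partials, merge):
    """Merge partial results pairwise, level by level.

    Each pass of the loop corresponds to one level of an inverted
    binary tree: neighbours are merged in pairs, and an unpaired
    last element is passed up to the next level unchanged.
    Returns the final result and the number of levels used.
    """
    if not partials:
        raise ValueError("need at least one partial result")
    level = list(partials)
    rounds = 0
    while len(level) > 1:
        next_level = []
        for i in range(0, len(level) - 1, 2):
            next_level.append(merge(level[i], level[i + 1]))
        if len(level) % 2 == 1:
            next_level.append(level[-1])  # odd element moves up as-is
        level = next_level
        rounds += 1
    return level[0], rounds


def word_count(chunks):
    """Run the map step on every chunk, then tree-reduce the results."""
    partials = [map_chunk(c) for c in chunks]
    return tree_reduce(partials, lambda a, b: a + b)


if __name__ == "__main__":
    counts, rounds = word_count(["a b a", "b c", "a", "c c d", "d"])
    print(dict(counts), rounds)
```

With n partial results the merge finishes in about log2(n) levels, and the merges within one level are independent of each other, which is what would let separate nodes perform them in parallel in a distributed setting.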

