Abstract
MapReduce is a distributed computing model introduced by Google that is widely used for massive data processing. This paper presents a novel tree-structure-based MapReduce parallel model suited to massive scientific data analysis on unreliable desktop PC resources in Internet or Intranet environments. Computing nodes are organized in a P2P fashion: the lower layer uses the P2P-MPI framework, and the MapReduce application layer is implemented with message passing. In the application layer, data chunks are distributed by broadcast in the Map stage, and an inverse binary tree is built in the Reduce stage to merge and reduce intermediate results efficiently. The proposed model was compared with existing mainstream MapReduce models. The results show that the tree-structure-based MapReduce parallel model performs comparatively well in terms of fault tolerance, and that the system is simple and makes application development easy.
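The two stages described in the abstract can be illustrated with a small single-process sketch. It is not the authors' implementation and does not use P2P-MPI: nodes are simulated as list positions, the word-count job is only an example workload, and the names `map_phase` and `tree_reduce` are hypothetical. It also assumes that "broadcast" means every node sees the full list of chunks and picks the one matching its rank, and that the inverse binary tree merges partial results pairwise toward node 0.

```python
from collections import Counter


def map_phase(chunks):
    """Simulated Map stage: node i counts the words of chunk i.

    Assumption: the broadcast is modeled by every node having access to
    the whole `chunks` list and selecting its own entry by rank.
    """
    return [Counter(chunks[rank].split()) for rank in range(len(chunks))]


def tree_reduce(partials):
    """Simulated Reduce stage over an inverse binary tree.

    In each round, the node at index `receiver` merges the partial result
    of the node `step` positions to its right. The distance doubles every
    round, so the final result ends up at node 0 (the root).
    Returns the merged result and the number of rounds used.
    """
    if not partials:
        return Counter(), 0
    results = list(partials)
    n = len(results)
    step, rounds = 1, 0
    while step < n:
        for receiver in range(0, n, 2 * step):
            sender = receiver + step
            if sender < n:  # the last node of a level may have no partner
                results[receiver] = results[receiver] + results[sender]
        step *= 2
        rounds += 1
    return results[0], rounds


if __name__ == "__main__":
    chunks = ["a b a", "b c", "a", "c c", "b"]
    total, rounds = tree_reduce(map_phase(chunks))
    print(dict(total), "merged in", rounds, "rounds")
```

With n nodes the merge finishes in about log2(n) rounds instead of n - 1 sequential merges at a single collector, which is the usual motivation for a tree-shaped reduction. Fault handling, the property the paper evaluates, is not modeled here.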
Source
Computer Science (《计算机科学》)
CSCD
Peking University Core Journal (北大核心)
2015, No. 11, pp. 65-67, 89 (4 pages)
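Funding
French National Research Agency (ANR) research project (ANR-10-SEGI-001-01)
Hundred Talents Program of the Chinese Academy of Sciences (1101002001)
Natural Science Foundation of Hunan Province (2015JJ3071)
General Project of the Education Department of Hunan Province (12C0121)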