
A Hierarchical-Segment Reduction Algorithm for Nehalem Systems in Threaded MPI
Abstract: A Hierarchical-Segment Reduction Algorithm (HSRA) is proposed for long-message reduction on Nehalem platforms in a threaded MPI environment. Taking the architectural characteristics of Nehalem systems into account, HSRA carries out the intra-node reduction in two steps, an intra-processor reduction followed by an inter-processor reduction, which distributes the computational load evenly while requiring only a small number of remote memory accesses. HSRA is first designed and implemented within the reduction-algorithm framework of MPIActor and its cost is analyzed from the memory-access perspective; it is then compared with a single-level segmented algorithm and three existing shared-memory intra-node reduction algorithms; finally, it is validated on a real system with the Intel MPI Benchmark (IMB). The experimental results show that HSRA is an efficient algorithm for long-message intra-node reduction on Nehalem systems.
Source: Journal of Chinese Computer Systems (《小型微型计算机系统》), CSCD, Peking University Core Journal, 2012, Issue 4, pp. 733-738 (6 pages).
Funding: Supported by the Major Project of the Fujian Provincial Department of Science and Technology (2010H6019) and the Putian Municipal Science and Technology Program of Fujian Province (2010G09).
Keywords: hierarchical-segment reduction algorithm; MPI; HSRA; Nehalem; MPI_Reduce; MPI_Allreduce
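
To make the two-step scheme described in the abstract concrete, the sketch below illustrates a hierarchical, segmented intra-node summation in plain C with OpenMP. It is not the paper's MPIActor implementation: the two-socket layout (NUM_SOCKETS, THREADS_PER_SOCKET), the compact thread-to-socket pinning, the buffer names (in, mid, out), the message length MSG_LEN, and the use of OpenMP threads in place of threaded-MPI ranks are all illustrative assumptions.

/* Minimal sketch (assumptions noted above), not the paper's MPIActor code:
 * a two-step, segmented intra-node sum reduction on an assumed
 * 2-socket node with 4 threads pinned to each socket. */
#include <stdio.h>
#include <stdlib.h>
#include <omp.h>

#define NUM_SOCKETS        2          /* assumed 2-socket Nehalem-like node */
#define THREADS_PER_SOCKET 4          /* assumed threads pinned per socket  */
#define NTHREADS           (NUM_SOCKETS * THREADS_PER_SOCKET)
#define MSG_LEN            (1 << 20)  /* doubles per rank: a long message   */
#define SEG                (MSG_LEN / THREADS_PER_SOCKET)

int main(void)
{
    /* each of the NTHREADS "ranks" contributes one MSG_LEN buffer */
    double *in  = malloc((size_t)NTHREADS * MSG_LEN * sizeof(double));
    double *mid = calloc((size_t)NUM_SOCKETS * MSG_LEN, sizeof(double));
    double *out = calloc(MSG_LEN, sizeof(double));
    if (!in || !mid || !out) return 1;

    for (int t = 0; t < NTHREADS; t++)            /* rank t contributes t+1 */
        for (int i = 0; i < MSG_LEN; i++)
            in[(size_t)t * MSG_LEN + i] = t + 1.0;

    omp_set_num_threads(NTHREADS);
    #pragma omp parallel
    {
        int tid  = omp_get_thread_num();
        int sock = tid / THREADS_PER_SOCKET;      /* assumed compact pinning  */
        int lane = tid % THREADS_PER_SOCKET;      /* segment owned by thread  */

        /* Step 1: intra-processor reduction.  Each thread sums its own
         * segment of every buffer belonging to its socket into that
         * socket's partial buffer, touching only socket-local memory.
         * Distinct lanes own distinct segments, so there are no races. */
        for (int m = 0; m < THREADS_PER_SOCKET; m++) {
            const double *src =
                in + (size_t)(sock * THREADS_PER_SOCKET + m) * MSG_LEN;
            for (int i = lane * SEG; i < (lane + 1) * SEG; i++)
                mid[(size_t)sock * MSG_LEN + i] += src[i];
        }
        #pragma omp barrier                       /* all partials complete */

        /* Step 2: inter-processor reduction.  One owner per segment
         * (here, the threads of socket 0) combines the per-socket
         * partials, so each element crosses the inter-socket link once. */
        if (sock == 0)
            for (int i = lane * SEG; i < (lane + 1) * SEG; i++) {
                double v = 0.0;
                for (int s = 0; s < NUM_SOCKETS; s++)
                    v += mid[(size_t)s * MSG_LEN + i];
                out[i] = v;
            }
    }

    /* every element should equal 1 + 2 + ... + NTHREADS */
    printf("out[0] = %.1f (expected %.1f)\n",
           out[0], NTHREADS * (NTHREADS + 1) / 2.0);
    free(in); free(mid); free(out);
    return 0;
}

In this sketch each thread owns the same segment in both steps, so every element of a per-socket partial buffer is written by exactly one thread and crosses the inter-socket (QPI) link only once in step 2; this illustrates, under the stated assumptions, the load-balancing and reduced remote-memory-access property the abstract attributes to HSRA.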