
POM: A Process Optimization Mapping Tool for MPI Programs  (Cited by: 1)

Abstract: Modern supercomputers contain an ever-growing number of compute nodes, each with multiple processor cores. Because of differences in interconnect bandwidth, inter-node and intra-node communication form two layers with different performance, and the intra-node layer is the faster of the two. The default process mapping of MPI programs, however, ignores this hierarchy: it cannot exploit the better intra-node bandwidth and therefore severely limits the performance a supercomputer can deliver. To address this problem, this paper designs and implements POM, a tool that automatically optimizes the process mapping of MPI programs so as to exploit the hierarchical communication differences. POM provides an efficient, low-overhead way to obtain the communication information of an MPI program, and it improves communication efficiency, and hence application performance, by optimizing how communication is distributed over the two layers. Three problems are solved: abstracting the communication hierarchy of the hardware platform, obtaining the communication information of an MPI program at low overhead, and computing the mapping scheme. First, according to the differences in communication capability, the supercomputer is abstracted into two layers: distinct compute nodes connected by a high-speed interconnect, and the multiple processor cores within the same node. Second, a simple method is proposed for converting collective communication into point-to-point communication. Finally, the inter-process communication of an MPI program is represented as an undirected edge-weighted graph, which turns the process mapping problem into a graph partitioning problem; the open-source package Chaco is used to solve it. Experimental results on the Dawning 5000A and Dawning 4000A show that POM significantly improves the performance of MPI programs.
Source: Computer Engineering & Science (CSCD, Peking University core journal), 2009, No. A01, pp. 201-205 (5 pages)
Funding: National Natural Science Foundation of China (Grant No. 60633040)
Keywords: process mapping; Message Passing Interface (MPI); graph partitioning
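The abstract states that POM obtains an MPI program's communication information at low overhead and then treats process mapping as graph partitioning, but the record here gives no implementation details. The following C sketch is therefore only an illustration, assuming the standard MPI profiling (PMPI) interface is used to intercept point-to-point sends and record per-pair traffic; the PMPI_* entry points are real MPI calls, while the output file name comm_row.<rank>, the variable names, and the overall structure are assumptions for this sketch, not POM's actual code.

/* Sketch: count bytes sent to each peer via PMPI wrappers and dump one
 * row of the communication matrix per rank at MPI_Finalize. An offline
 * step can symmetrize the matrix into the undirected weighted graph
 * handed to a partitioner such as Chaco. */
#include <mpi.h>
#include <stdio.h>
#include <stdlib.h>

static long long *bytes_to_peer = NULL;  /* bytes this rank sent to each peer */
static int world_size = 0;

int MPI_Init(int *argc, char ***argv)
{
    int rc = PMPI_Init(argc, argv);
    PMPI_Comm_size(MPI_COMM_WORLD, &world_size);
    bytes_to_peer = calloc((size_t)world_size, sizeof(long long));
    return rc;
}

int MPI_Send(const void *buf, int count, MPI_Datatype datatype,
             int dest, int tag, MPI_Comm comm)
{
    int type_size = 0;
    PMPI_Type_size(datatype, &type_size);
    if (comm == MPI_COMM_WORLD && dest >= 0 && dest < world_size)
        bytes_to_peer[dest] += (long long)count * type_size;  /* accumulate edge weight */
    return PMPI_Send(buf, count, datatype, dest, tag, comm);
}

int MPI_Finalize(void)
{
    int rank = 0;
    char name[64];
    PMPI_Comm_rank(MPI_COMM_WORLD, &rank);
    snprintf(name, sizeof(name), "comm_row.%d", rank);  /* made-up file name */
    FILE *f = fopen(name, "w");
    if (f != NULL) {
        for (int peer = 0; peer < world_size; peer++)
            fprintf(f, "%d %d %lld\n", rank, peer, bytes_to_peer[peer]);
        fclose(f);
    }
    free(bytes_to_peer);
    return PMPI_Finalize();
}

Linking such wrappers ahead of the MPI library (or preloading them as a shared object) lets the application run unmodified. As the abstract notes, collective operations would still have to be decomposed into their point-to-point equivalents, and nonblocking sends would need matching wrappers, before the per-pair byte counts are complete enough to serve as edge weights for the partitioner.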

References (17)

  • 1. http://www.top500.org/.
  • 2. Zhiwei Xu. HPC in China: Dawning 5000 and Beyond[R]. Tech Report.
  • 3. MPI: A Message-Passing Interface Standard, Version 2.1[Z]. 2008.
  • 4. Hendrickson B, Leland R. The Chaco User's Guide, Version 2.0[Z]. Sandia National Laboratories.
  • 5. The Overview of Supercomputing[R]. Tech Report, 2007.
  • 6. Chavarria-Miranda D, Nieplocha J, Tipparaju V. Topology-Aware Tile Mapping for Clusters of SMPs[C]//Proc of the 3rd Conf on Computing Frontiers, 2006: 383-392.
  • 7. Yu Hao, Chung I-H, Moreira J. Topology Mapping for Blue Gene/L Supercomputer[C]//Proc of Supercomputing '06, 2006: 52-52.
  • 8. Barbu A, Zhu Song-Chun. Graph Partition by Swendsen-Wang Cuts[Z].
  • 9. Träff J L. Implementing the MPI Process Topology Mechanism[C]//Proc of Supercomputing '02, 2002: 1-14.
  • 10. Pettey C C, Leuze M R. Parallel Placement of Parallel Processes[C]//Proc of HCCA '88, 1988: 232-238.
