POM: A Process Optimization Mapping Tool for MPI Programs

Cited by: 1

Abstract: Modern supercomputers contain a growing number of compute nodes, each with multiple processor cores. Because interconnect bandwidth differs, inter-node and intra-node communication form two layers with different performance, and the intra-node layer is the faster of the two. The default process mapping of MPI programs ignores this hierarchy and cannot exploit the better intra-node bandwidth, which severely limits the performance a supercomputer can deliver. To address this, the paper designs and implements POM, an automatic process optimization mapping tool for MPI programs that exploits the difference between communication layers. POM provides an efficient, low-overhead method for collecting an MPI program's communication information, and it improves communication efficiency, and therefore application performance, by optimizing how communication is distributed across the layers. The paper solves three problems: abstracting the communication hierarchy of the hardware platform, collecting MPI communication information at low overhead, and computing the mapping. First, the supercomputer is abstracted, according to communication capability, into two layers: distinct compute nodes connected by a high-speed interconnect, and the multiple processor cores within one node. Second, a simple method is proposed for converting collective communication into point-to-point communication so that it can be added to the communication information. Finally, an undirected graph with weighted edges represents the communication between the processes of an MPI program, which turns process mapping into a graph partitioning problem; the open-source software Chaco is used to solve it. Experiments on the Dawning 5000A and Dawning 4000A show that POM significantly improves the performance of MPI programs.
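The three steps in the abstract can be illustrated with a small sketch. It is not POM's implementation: the function names are hypothetical, the broadcast is expanded with an assumed flat root-to-all model (the abstract does not say how each collective is expanded), and a simple greedy heuristic stands in for Chaco. It only shows how a weighted undirected communication graph can be built and how a mapping that respects the two-layer hierarchy reduces inter-node traffic.

```python
from collections import defaultdict


def add_p2p(graph, src, dst, nbytes):
    """Accumulate traffic on the undirected edge {src, dst}."""
    if src != dst:
        graph[frozenset((src, dst))] += nbytes


def add_broadcast(graph, root, ranks, nbytes):
    """Assumed expansion: treat a broadcast as root -> every other rank."""
    for r in ranks:
        add_p2p(graph, root, r, nbytes)


def cut_weight(graph, node_of):
    """Total traffic on edges whose two endpoints sit on different nodes."""
    return sum(w for edge, w in graph.items()
               if len({node_of[r] for r in edge}) == 2)


def greedy_map(graph, nranks, cores_per_node):
    """Greedy stand-in for a graph partitioner (not Chaco).

    Seed each node with the lowest unplaced rank, then keep adding the
    unplaced rank with the heaviest traffic to ranks already on that node.
    """
    unplaced = set(range(nranks))
    node_of, node = {}, 0
    while unplaced:
        members = [min(unplaced)]
        unplaced.remove(members[0])
        while len(members) < cores_per_node and unplaced:
            best = max(sorted(unplaced),
                       key=lambda r: sum(graph.get(frozenset((r, m)), 0)
                                         for m in members))
            members.append(best)
            unplaced.remove(best)
        for r in members:
            node_of[r] = node
        node += 1
    return node_of


# Hypothetical workload: 8 ranks on 2 nodes with 4 cores each.
graph = defaultdict(int)
for i in range(4):
    add_p2p(graph, i, i + 4, 1000)       # heavy pairwise exchange
add_broadcast(graph, 0, range(8), 10)    # light collective from rank 0

block = {r: r // 4 for r in range(8)}    # default block mapping
mapped = greedy_map(graph, 8, 4)

print("inter-node bytes, block mapping: ", cut_weight(graph, block))
print("inter-node bytes, greedy mapping:", cut_weight(graph, mapped))
```

In this toy workload the block mapping places every heavy pair on different nodes, while the greedy mapping co-locates them, so only the light broadcast traffic crosses the interconnect. A real partitioner would also have to balance partition sizes against core counts and handle more than two nodes robustly.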

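Authors: Lu Xingjing (卢兴敬), Shang Lei (商磊), Chen Li (陈莉)

Affiliations: Key Laboratory of Computer System and Architecture, Institute of Computing Technology, Chinese Academy of Sciences; University of New South Wales, Australia

Source: Computer Engineering & Science (《计算机工程与科学》), CSCD, Peking University Core Journal, 2009, Issue A01, pp. 201-205 (5 pages)

Funding: National Natural Science Foundation of China (60633040)

Keywords: process mapping; message passing interface (MPI); graph partitioning

Classification: TP319 [Automation and Computer Technology: Computer Software and Theory]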

References (17)

1. http://www.top500.org/.
2. Zhiwei Xu. HPC in China: Dawning 5000 and Beyond [R]. Tech Report.
3. MPI: A Message-Passing Interface Standard. Version 2.1 [Z]. 2008.
4. Hendrickson B, Leland R. The Chaco User's Guide. Version 2.0 [Z]. Sandia National Laboratories.
5. The Overview of Supercomputing [R]. Tech Report, 2007.
6. Miranda D C, Nieplocha J, Tipparaju V. Topology-Aware Tile Mapping for Clusters of SMPs [C]//Proc of the 3rd Conf on Computing Frontiers, 2006: 383-392.
7. Yu Hao, Chung I-H, Moreira J. Topology Mapping for Blue Gene/L Supercomputer [C]//Proc of Supercomputing'06, 2006: 52-52.
8. Barbu A, Zhu Song-Chun. Graph Partition by Swendsen-Wang Cuts [Z].
9. Träff J L. Implementing the MPI Process Topology Mechanism [C]//Proc of Supercomputing'02, 2002: 1-14.
10. Pettey C C, Leuze M R. Parallel Placement of Parallel Processes [C]//Proc of HCCA'88, 1988: 232-238.
