Abstract
As high-performance computers continue to evolve, system scale keeps growing, and the number of compute nodes and processor cores in a system has reached a new level. In very large-scale systems, the startup time of parallel applications becomes an important factor that limits operating efficiency and reduces ease of use. During the startup phase of a parallel application, the process management interface is used to set up communication channels that the processes use for their subsequent communication. In exascale systems, traditional process management interfaces cannot obtain communication information quickly at startup, which leads to long startup times and reduced system performance. This paper first introduces the role of the process management interface in parallel program startup, with a focus on PMIx, a process management interface designed for exascale systems. It then discusses, by comparison, how PMIx improves the startup of large-scale parallel programs, analyzes the optimizations PMIx makes to improve system performance, and outlines future development directions.
Authors
张昆
张伟
卢凯
董勇
戴屹钦
ZHANG Kun; ZHANG Wei; LU Kai; DONG Yong; DAI Yi-qin (School of Computer, National University of Defense Technology, Changsha 410073, China)
Source
《计算机工程与科学》
CSCD
PKU Core Journals (北大核心)
2020, Issue 10, pp. 1774-1783 (10 pages)
Computer Engineering & Science
Funding
National Key Research and Development Program of China (2018YFB0204301)
National Numerical Windtunnel Project (NNW2018-ZT6B13)
Keywords
high-performance computing
process management interface
process communication