Abstract
As HPC systems and parallel applications keep growing in scale, the startup time of large-scale parallel jobs can no longer be ignored. Various efforts have been made to improve the performance of program launching and runtime environment initialization. This paper presents the experience and performance results of launching MPI jobs on the Tianhe-1A supercomputer system. A detailed study of the time cost of each stage of job startup, including control message transfer, file access, and MPI environment initialization, shows that for large-scale MPI jobs the environment initialization time dominates the job startup time. Based on this finding, preliminary optimizations reduce the amount of data exchanged during MPI environment initialization and avoid unnecessary data transfers, which notably improves job startup performance. A hierarchical, scalable process management structure for MPI environment initialization is then proposed to further improve the scalability of job startup. For completeness, the job startup time is also compared with that of the process management mechanisms in other mainstream MPI implementations.
Source
《计算机研究与发展》
EI
CSCD
PKU Core Journal
2013, No. 8, pp. 1755-1761 (7 pages)
Journal of Computer Research and Development
Funding
National Natural Science Foundation of China (61120106005)
National High Technology Research and Development Program of China (863 Program) (2012AA01A301)
Keywords
high performance computing
parallel job startup
process management
MPI
scalability