Abstract
MPI collective communication operations are widely used in parallel scientific computing and have a significant impact on program scalability and performance. The Tianhe interconnect network supports trigger-based message communication operations, which improve inter-node communication performance by offloading data transfer and computation to the network interface. Using these triggered operations, the reduction communication between nodes is offloaded, and Allreduce and Reduce offload algorithms are designed for different tree topologies. Tests on a real system platform show that, compared with the reduction algorithms in MPICH that are built on point-to-point communication, the trigger-based offload algorithms reduce running time by up to 59.6% across different node scales.
Authors
WANG Hao (王浩)
ZHANG Wei (张伟)
XIE Min (谢旻)
DONG Yong (董勇)
School of Computer, National University of Defense Technology, Changsha 410073, China
Source
Computer Engineering & Science (《计算机工程与科学》)
CSCD
Peking University Core Journal (北大核心)
2020, Issue 11, pp. 1981-1987 (7 pages)
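Funding
National Key Research and Development Program of China (2018YFB0204301)
National University of Defense Technology Research Project (ZK18-03-10)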
Keywords
collective communication
reduction
triggered operations
offloaded communication