Abstract
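Collective communication is a key factor in the efficiency of large-scale data-parallel systems. It arises mainly in two situations: array redistribution between different phases of a program, and array remapping after loop partitioning. Two factors that significantly affect the efficiency of a collective communication, yet are often overlooked, are message contention and unequal message lengths, because they cause processes to spend a large amount of time idle and waiting. Previous work, however, either cannot completely avoid message contention or addresses only certain special cases. To address this, the paper proposes CSS, a more generally applicable communication scheduling strategy for arrays distributed as Block_Cyclic(k). A proof shows that under this strategy the messages within one communication step do not conflict with each other and their lengths are as nearly equal as possible, thereby minimizing both the schedule generation time and the actual communication time. Finally, test results show that, compared with traditional communication optimization algorithms and the MPI_Alltoallv implementation, CSS clearly improves communication efficiency.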
Collective communication significantly influences the performance of data-parallel applications. It is often required in two situations: array redistribution from phase to phase, and data remapping after loop partitioning. Nevertheless, an important factor that influences the efficiency of collective communication is often neglected: when there is node contention and a difference among message lengths during a particular communication step, a larger communication idle time may occur. Previous work either cannot completely avoid communication conflicts or focuses on special cases. This paper develops a universal and efficient communication scheduling strategy (CSS) for the situation where array distributions are Block_Cyclic(k). Based on the proof of a recursive theorem on communication table elements, the strategy generates a communication scheduling table in which each column is a permutation of the receiving node numbers for one communication step, and messages of similar size are placed in the same communication step as far as possible. The strategy therefore not only avoids inter-processor contention but also minimizes the real communication cost in each communication step. Finally, experimental results show that CSS performs better than the general method and the implementation of MPI_Alltoallv.
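The two properties the abstract emphasizes, a contention-free step (each column of the table is a permutation of receivers) and similar message lengths within a step, can be illustrated with a minimal Python sketch. This is not the CSS algorithm, whose construction is not given in the abstract: the schedule below is a plain cyclic-shift baseline, and the function names (`owner`, `message_sizes`, `shift_schedule`, `is_contention_free`, `step_imbalance`) as well as the assumption that source and target distributions use the same number of nodes are hypothetical choices made for illustration only.

```python
def owner(i, k, p):
    """Node that owns global index i under Block_Cyclic(k) on p nodes."""
    return (i // k) % p


def message_sizes(n, x, y, p):
    """sizes[s][d] = number of elements node s sends to node d when an
    n-element array goes from Block_Cyclic(x) to Block_Cyclic(y) on p nodes."""
    sizes = [[0] * p for _ in range(p)]
    for i in range(n):
        sizes[owner(i, x, p)][owner(i, y, p)] += 1
    return sizes


def shift_schedule(p):
    """Baseline table (an assumption, not CSS): table[s][t] is the receiver
    of sender s in step t, chosen by cyclic shift."""
    return [[(s + t) % p for t in range(p)] for s in range(p)]


def is_contention_free(table):
    """True if every column (one communication step) is a permutation of
    the receiving node numbers, so no node receives two messages at once."""
    p = len(table)
    steps = len(table[0])
    return all(
        sorted(table[s][t] for s in range(p)) == list(range(p))
        for t in range(steps)
    )


def step_imbalance(sizes, table):
    """Per step, the gap between the longest and shortest message; a rough
    proxy for the idle time caused by unequal message lengths."""
    p = len(table)
    gaps = []
    for t in range(len(table[0])):
        lengths = [sizes[s][table[s][t]] for s in range(p)]
        gaps.append(max(lengths) - min(lengths))
    return gaps


if __name__ == "__main__":
    sizes = message_sizes(n=48, x=2, y=3, p=4)
    table = shift_schedule(4)
    print("contention-free:", is_contention_free(table))
    print("per-step length gap:", step_imbalance(sizes, table))
```

The cyclic shift guarantees the permutation property but pays no attention to message lengths; according to the abstract, CSS additionally groups messages of similar size into the same step, which would show up here as smaller per-step gaps.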
Source
《计算机学报》
EI
CSCD
Peking University Core Journal (北大核心)
2008, Issue 2, pp. 318-328 (11 pages)
Chinese Journal of Computers
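Funding
National High Technology Research and Development Program of China (863 Program) (2006AA01Z105)
National Natural Science Foundation of China (60373008)
Key Project of the Ministry of Education of China (106019)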
Keywords
parallel compiling
data parallelism
collective communication
array redistribution
distributed memory