
An Optimized Strategy for Collective Communication in Data Parallelism

Cited by: 3
Abstract: Collective communication is a key factor in the performance of large-scale data-parallel systems. It arises mainly in two situations: array redistribution between program phases, and array remapping after loop partitioning. An often-neglected factor that strongly affects the efficiency of collective communication is node contention and unequal message lengths within a single communication step, both of which cause long idle waits between processes. Previous work either cannot avoid contention completely or addresses only special cases. This paper proposes CSS, a more generally applicable communication scheduling strategy for arrays with Block_Cyclic(k) distributions. Based on a proof of a recursive theorem on the elements of the communication table, CSS generates a communication schedule table in which, for each communication step, the corresponding column is a permutation of the receiving node numbers, and messages of similar size are placed in the same step as far as possible. The strategy therefore avoids inter-processor contention within each step and minimizes both the time needed to generate the schedule and the actual communication time. Experimental results show that CSS outperforms a conventional communication optimization method and the MPI_Alltoallv implementation.
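The abstract does not give the CSS construction itself. As a rough illustration of the two properties it targets, the Python sketch below (all function names are hypothetical) builds the communication table for redistributing a 1-D array of n elements from Block_Cyclic(k_src) to Block_Cyclic(k_dst) on the same p processes, generates a simple shift schedule in which each step's receivers form a permutation, and scores a schedule with a simplified cost model where each step lasts as long as its longest message. The shift schedule is a generic technique, not CSS: it avoids contention but makes no attempt to group messages of similar length.

```python
from typing import List, Tuple

# One communication step: (sender, receiver) pairs that transfer concurrently.
Step = List[Tuple[int, int]]


def comm_table(n: int, k_src: int, k_dst: int, p: int) -> List[List[int]]:
    """table[s][d] = number of elements process s sends to process d when a
    1-D array of n elements moves from Block_Cyclic(k_src) to
    Block_Cyclic(k_dst) on the same p processes (an assumption of this sketch)."""
    table = [[0] * p for _ in range(p)]
    for i in range(n):
        src = (i // k_src) % p  # owner under the source distribution
        dst = (i // k_dst) % p  # owner under the target distribution
        table[src][dst] += 1
    return table


def shift_schedule(p: int) -> List[Step]:
    """Step s pairs sender r with receiver (r + s) % p; step 0 is the local copy.
    Generic shift schedule, not the CSS construction."""
    return [[(r, (r + s) % p) for r in range(p)] for s in range(p)]


def is_contention_free(schedule: List[Step]) -> bool:
    """True if, in every step, no process sends or receives more than one message."""
    for step in schedule:
        senders = [s for s, _ in step]
        receivers = [d for _, d in step]
        if len(set(senders)) != len(senders) or len(set(receivers)) != len(receivers):
            return False
    return True


def schedule_cost(table: List[List[int]], schedule: List[Step]) -> int:
    """Simplified model: a step costs its longest remote message (local copies
    are skipped), so shorter messages in the same step leave processes idle."""
    total = 0
    for step in schedule:
        remote = [table[s][d] for s, d in step if s != d]
        total += max(remote, default=0)
    return total


if __name__ == "__main__":
    t = comm_table(n=24, k_src=2, k_dst=3, p=4)
    sched = shift_schedule(4)
    print(t)
    print(is_contention_free(sched), schedule_cost(t, sched))
```

For n = 24, k_src = 2, k_dst = 3 and p = 4, every remote step of the shift schedule mixes 1-element and 2-element messages, so under this model the processes sending the shorter messages wait for the longer ones. Grouping messages of similar length into the same step, as CSS is described as doing, targets exactly this idle time.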
Source: Chinese Journal of Computers, 2008, No. 2, pp. 318-328 (11 pages). Indexed in EI, CSCD, and the Peking University Core Journals list.
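Funding: Supported by the National High Technology Research and Development Program of China (863 Program) (2006AA01Z105), the National Natural Science Foundation of China (60373008), and the Key Project of the Ministry of Education of China (106019).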
Keywords: parallel compiling; data parallelism; collective communication; array redistribution; distributed memory