An Optimized Strategy for Collective Communication in Data Parallelism

Cited by: 3


Abstract: Collective communication is a key factor in the efficiency of large-scale data-parallel systems. It arises mainly in two situations: array redistribution between program phases, and array remapping after loop partitioning. Two factors that significantly affect the efficiency of a single collective communication are often overlooked: message contention and unequal message lengths, both of which leave processes idle for long periods. Previous work either cannot completely avoid message contention or addresses only certain special cases. This paper proposes CSS, a more generally applicable communication scheduling strategy for arrays distributed as Block_Cyclic(k). Based on a proof of a recursive theorem on the elements of the communication table, the strategy generates a communication scheduling table in which each column, corresponding to one communication step, is a permutation of the receiving node numbers, and in which messages of similar size are placed in the same communication step as far as possible. The messages within a communication step therefore do not contend with one another and are as close in length as possible, which minimizes both the schedule generation time and the actual communication time. Experimental results show that CSS achieves noticeably higher communication efficiency than conventional communication optimization algorithms and the MPI_Alltoallv implementation.
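The two properties named in the abstract, a permutation of receivers in every step and similar message lengths within a step, can be illustrated with a small Python sketch. This is not the CSS algorithm, whose construction is not given in this record. It assumes a one-dimensional array of `n` elements redistributed from Block_Cyclic(`k_src`) to Block_Cyclic(`k_dst`) over `nprocs` processes, and it substitutes a plain cyclic-shift schedule, which is contention-free but makes no attempt to equalize message lengths. All function and parameter names are hypothetical.

```python
def owner(index, k, nprocs):
    """Process that owns a global array index under Block_Cyclic(k)."""
    return (index // k) % nprocs


def message_sizes(n, k_src, k_dst, nprocs):
    """sizes[s][d] = number of elements process s must send to process d."""
    sizes = [[0] * nprocs for _ in range(nprocs)]
    for i in range(n):
        sizes[owner(i, k_src, nprocs)][owner(i, k_dst, nprocs)] += 1
    return sizes


def cyclic_shift_schedule(nprocs):
    """schedule[step][sender] = receiver.

    Assumption: a simple cyclic shift stands in for the paper's table.
    Step 0 is the local copy (each process "sends" to itself).
    """
    return [[(s + step) % nprocs for s in range(nprocs)]
            for step in range(nprocs)]


def is_contention_free(schedule):
    """True if every step's receivers form a permutation of 0..nprocs-1."""
    nprocs = len(schedule[0])
    return all(sorted(step) == list(range(nprocs)) for step in schedule)


def step_costs(sizes, schedule):
    """Longest message in each step, a rough proxy for the step's duration."""
    return [max(sizes[s][r] for s, r in enumerate(step)) for step in schedule]


if __name__ == "__main__":
    sizes = message_sizes(n=24, k_src=2, k_dst=3, nprocs=4)
    schedule = cyclic_shift_schedule(4)
    print("message sizes:", sizes)
    print("contention-free:", is_contention_free(schedule))
    print("longest message per step:", step_costs(sizes, schedule))
```

Comparing the longest message per step with the other messages in that step shows how much idle time a schedule leaves; a strategy such as the one the abstract describes would aim to group messages so that this gap is small.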

Authors: Wang Jue, Hu Changjun, Zhang Jilin, Li Jianjiang

Affiliation: School of Information Engineering, University of Science and Technology Beijing

Source: Chinese Journal of Computers (《计算机学报》), indexed in EI, CSCD, and the Peking University Core Journals list; 2008, No. 2, pp. 318-328 (11 pages)

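Funding: Supported by the National High Technology Research and Development Program of China ("863" Program, 2006AA01Z105), the National Natural Science Foundation of China (60373008), and the Key Foundation of the Ministry of Education (106019)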

Keywords: parallel compiling; data parallelism; collective communication; array redistribution; distributed memory

Classification: TP311 [Automation and Computer Technology: Computer Software and Theory]

