一类非规则并行应用问题的通信集生成算法被引量：2

Communication Set Generation for a Special Case of Irregular Parallel Applications

下载PDF

导出

摘要非规则计算是大规模并行应用中普遍存在和影响效率的关键问题.在基于分布式内存的数据并行范例中,如何针对非规则数组引用,有效地生成本地内存访问序列和通信集,是并行编译生成SPMD结点程序所必须解决的重要问题.文中针对两重嵌套循环中,下一层循环边界是上一层循环变量的线性或非线性函数,数组下标是两层循环变量的非线性函数这样一类包含非规则数组引用的并行应用问题,提出了一种在编译时生成通信集的代数算法.并且针对cyclic(k)数据分布和线性对齐模板,借助整数格概念,给出了编译时全局地址和本地地址之间的转换方法.文中还给出了相应的经过通信优化的SPMD结点程序.最后通过实例验证了算法的正确性.该算法的意义在于避免了传统Inspector/Executor非规则计算模型中的Inspector阶段,从而节省了运行时Inspector阶段通过穷举下标生成通信集的巨大开销. Irregular computing significantly influences the performance of large scale parallel applications. How to generate local memory access sequence and communication set efficiently for irregular array reference is an important issue in compiling a data-parallel language into a single program multiple data （SPMD） code for distributed-memory machines. By far, many researches have focused on the problem of communication set generation under regular array references in parallel loops. However, little researches give attentions to generating communication set for irregular array accesses in loop nests. In general case of irregular accesses, Inspector/Executor model is adopted to scan over array elements at Inspector phase of run time such that communication set can be constructed. This paper proposes an approach to derive an algebraic solution of communication set enumeration at compile time for the situation of irregular array references in nested loops. And this paper introduces integer lattice into alignment and cyclic（k）-distribution for global-to-local and local-to-global index translations. Then it also presents the algorithm for the corresponding SPMD code generation. When the parameters of alignments and cyclic（k） and array references are known, the SPMD code can then be completely derived at compile time such that no inspector-like run-time code will be needed to construct enumeration of the communication set.

作者胡长军李静王珏姚广利李永红丁良李建江

机构地区北京科技大学信息工程学院

出处《计算机学报》 EI CSCD 北大核心 2008年第1期120-126,共7页 Chinese Journal of Computers

基金国家"八六三"高技术研究发展计划项目基金(2006AA01Z105) 国家自然科学基金(60373008) 教育部重点基金(106019)资助

关键词非规则计算通信集生成并行编译通信优化 SPMD irregular computing communication set generation parallelizing compilers communication optimization SPMD scheme

分类号 TP311 [自动化与计算机技术—计算机软件与理论]

引文网络
相关文献

参考文献17

1Koelbel C. Compile-time generation of regular communica tions patterns//Proceedings of the Conference on High Performance Networking and Computing, Albuguerque. New Mexico, United States, 1991:101-110
2Gupta S K S, Kaushik S D, Huang C H, Sadayappan P. Compiling array expressions for efficient execution on distributed memory machines. Journal of Parallel and Distributed Computing, 1996, 32(2): 155-172
3Chatterjee S, Gilbert J R, Long F J E, Schreiber R, Teng S H. Generating local addresses and communication sets for data-parallel programs. Journal of Parallel and Distributed Computing, 1995, 26(1): 72-84
4Adve V, Mellor-Crummey J. Using integer sets for data-parallel program analysis and optimization//Proceedings of the Conference on Programming Language Design and Implementation. Montreal, Quebec, CA, 1998:186-198
5Kennedy K, Nedeljkovic N, Sethi A. Efficient address generation for block-cyclic distributions//Proceedings of the In ternational Conference on Supercomputing. Barcelona, Spain, 1995:180-184
6Kennedy K, Nedeljkovic N, Sethi A. A linear-time algorithm for computing the memory access sequence in data-parallel programs//Proceedings of the ACM SIGPLAN Symposium on Principles and Practices of Parallel Programming. Santa Barbara, Santa Barbara, California, United States, 1995: 102-111
7Tseng E H Y, Gaudiot J L. Communication generation for aligned and Cyclic (k) distributions using integer lattice. IEEE Transactions on Parallel Distributed Systems, 1999, 10(2) : 136-146
8Lee P Z, Chen W Y. Generating communication sets of array assignment statements for block-cyclic distribution on distributed memory parallel computers. Parallel Computing, 2002, 28(9) : 1329-1368
9Huang T C, Shiu L C. Efficient communication sets generation for block-cyclic distribution on distributed-memory machines. Journal of Systems Architecture, 2003, 48:255-265
10Hwang G H. An efficient algorithm for communication set generation of data parallel programs with block-cyclic distribution. Parallel Computing, 2004, 30(4) : 473-501

同被引文献39

1Ancourt C, Irigoin F. Scanning polyhedra with do loops[C]// Proceedings of the third ACM SIGPLAN Symposium on Princi- ples and Practice of Parallel Programming (PPOPP' 91 ), 1991. NY, USA; ACM New York, 1991 : 39-50.
2Amarasinghe S P, Lain M S. Communication optimization and code generation for distributed memory machines[C]//Procee- dings of The ACM SIGPLAN 1993 Conference on Programming Language Design and Implementation (PLDI ' 93 ), 1993. NY, USA: ACM New York, 1993 ; 126-138.
3Ferner C S. The Paraguin Compiler-Message-Passing Code Gen- eration Using SUIF [C]//Proceedings IEEE SoutheastCon, 2002. Washington IS)C: IEEE Computer Society, 2002 : 1-6.
4Ferner C S. Revisiting Communication Code Generation Algo- rithrns for Message-passing Systems[J]. International Journal of Parallel, Emergent and Distributed Systems, 2006,21 (5) : 323- 344.
5Bondhugula U, Hartono A, Ramanujam J, et al. A practical auto- matic polyhedral parallelizer and locality optimizer[C] Pro- ceedings of The ACM SIGPLAN 2008 Conference on Program- ruing Language Design and Implementation (PLDI' 08), 2008. NY, USA: ACM New York, 2008 :101-113.
6Bondhugula U. Automatic distributed-memory parallelization and code generation using the polyhedral framework[R]. IISc- CSA-TR-2011-3. Bangalore: Indian Institute of Science, 2011.
7Ponnusamy R, Saltz J, Choudhary A. Runtime-eompilation tech- niques for data partitioning and communication schedule reuse [C] // Proceedings of the 1993 ACM/IEEE Conference on Su- percomputing, 1993. NY, USA. ACM New York, 1993: 361-370.
8Mukherjee S S, Sharma S D, Hill M D, et al. Efficient support for irregular applications on distributed-memory machines[C]// Proceedings of the Fifth ACM SIGPLAN Symposium on Princi- ples and Practice of Parallel Programming 1995. NY, USA: ACM New York, 1995 : 68-79.
9Guo Min-yi, Li Li, Chang Weng-long Efficient loop partitioning for parallel codes of irregular scientific computations[J]. IEICE Transactions on Information and Systems, 2003, 86 (9) : 1825- 1834.
10Ravishankar M, Eisenlohr J, Pouehet LN, et al. Code generation for parallel execution of a class of irregular loops on distributed memory systems[C] The Intemational Conference for High Performance Computing, Networking, Storage, and Analysis (SC12), 2012.

引证文献2

1丁锐,赵荣彩,徐金龙,傅立国.自动并行化中不规则循环的代码生成[J].计算机科学,2013,40(12):9-14.
2傅立国,姚远,丁锐.自动并行化中不规则循环的通信代码生成[J].计算机应用,2014,34(4):1014-1018.

1王丽宏,王超,方滨兴,胡铭曾.关于不规则计算的研究[J].计算机工程与应用,1999,35(7):16-19.
2白金平.元搜索引擎并行式数据查询代理模块的设计与实现[J].计算机光盘软件与应用,2014,17(4):258-258.
3黄天柏.数据库编程中循环结构的使用[J].常州教育学院学报（综合版）,2000,18(1):65-66.
4李岩,何永祥,胡任仲.模糊控制器的一种快速算法[J].哈尔滨理工大学学报,1998,3(1):60-62.
5谢卫,刘键,李胜利.一个有效的数组下标相关性测试方法[J].计算机工程与应用,1993,29(4):17-20. 被引量：1
6赵天玉.双重映射式快速排序算法[J].计算机与现代化,1997(1):42-45.
7王秋芬,邵艳玲.一种新的基于哈希函数的排序算法[J].计算机与现代化,2010(10):47-49. 被引量：2
8赵正平.C语言中的指针与程序效率的探讨[J].淮北煤师院学报（自然科学版）,2003,24(4):48-52. 被引量：2
9董建华.SQL Server数据库的性能优化技术研究[J].科技广场,2007(5):93-94. 被引量：3
10毕明新,都淑萍.关于Sybase在应用系统中的运行效率研究[J].软件导刊,2007,6(3):15-16.

计算机学报

2008年第1期

浏览历史

内容加载中请稍等...

一类非规则并行应用问题的通信集生成算法被引量：2

参考文献17

同被引文献39

引证文献2

相关作者

相关机构

相关主题

浏览历史

一类非规则并行应用问题的通信集生成算法 被引量：2

参考文献17

同被引文献39

引证文献2

相关作者

相关机构

相关主题

浏览历史

一类非规则并行应用问题的通信集生成算法被引量：2