期刊文献+

一类非规则并行应用问题的通信集生成算法 被引量:2

Communication Set Generation for a Special Case of Irregular Parallel Applications
下载PDF
导出
摘要 非规则计算是大规模并行应用中普遍存在和影响效率的关键问题.在基于分布式内存的数据并行范例中,如何针对非规则数组引用,有效地生成本地内存访问序列和通信集,是并行编译生成SPMD结点程序所必须解决的重要问题.文中针对两重嵌套循环中,下一层循环边界是上一层循环变量的线性或非线性函数,数组下标是两层循环变量的非线性函数这样一类包含非规则数组引用的并行应用问题,提出了一种在编译时生成通信集的代数算法.并且针对cyclic(k)数据分布和线性对齐模板,借助整数格概念,给出了编译时全局地址和本地地址之间的转换方法.文中还给出了相应的经过通信优化的SPMD结点程序.最后通过实例验证了算法的正确性.该算法的意义在于避免了传统Inspector/Executor非规则计算模型中的Inspector阶段,从而节省了运行时Inspector阶段通过穷举下标生成通信集的巨大开销. Irregular computing significantly influences the performance of large scale parallel applications. How to generate local memory access sequence and communication set efficiently for irregular array reference is an important issue in compiling a data-parallel language into a single program multiple data (SPMD) code for distributed-memory machines. By far, many researches have focused on the problem of communication set generation under regular array references in parallel loops. However, little researches give attentions to generating communication set for irregular array accesses in loop nests. In general case of irregular accesses, Inspector/Executor model is adopted to scan over array elements at Inspector phase of run time such that communication set can be constructed. This paper proposes an approach to derive an algebraic solution of communication set enumeration at compile time for the situation of irregular array references in nested loops. And this paper introduces integer lattice into alignment and cyclic(k)-distribution for global-to-local and local-to-global index translations. Then it also presents the algorithm for the corresponding SPMD code generation. When the parameters of alignments and cyclic(k) and array references are known, the SPMD code can then be completely derived at compile time such that no inspector-like run-time code will be needed to construct enumeration of the communication set.
出处 《计算机学报》 EI CSCD 北大核心 2008年第1期120-126,共7页 Chinese Journal of Computers
基金 国家"八六三"高技术研究发展计划项目基金(2006AA01Z105) 国家自然科学基金(60373008) 教育部重点基金(106019)资助
关键词 非规则计算 通信集生成 并行编译 通信优化 SPMD irregular computing communication set generation parallelizing compilers communication optimization SPMD scheme
  • 相关文献

参考文献17

  • 1Koelbel C. Compile-time generation of regular communica tions patterns//Proceedings of the Conference on High Performance Networking and Computing, Albuguerque. New Mexico, United States, 1991:101-110
  • 2Gupta S K S, Kaushik S D, Huang C H, Sadayappan P. Compiling array expressions for efficient execution on distributed memory machines. Journal of Parallel and Distributed Computing, 1996, 32(2): 155-172
  • 3Chatterjee S, Gilbert J R, Long F J E, Schreiber R, Teng S H. Generating local addresses and communication sets for data-parallel programs. Journal of Parallel and Distributed Computing, 1995, 26(1): 72-84
  • 4Adve V, Mellor-Crummey J. Using integer sets for data-parallel program analysis and optimization//Proceedings of the Conference on Programming Language Design and Implementation. Montreal, Quebec, CA, 1998:186-198
  • 5Kennedy K, Nedeljkovic N, Sethi A. Efficient address generation for block-cyclic distributions//Proceedings of the In ternational Conference on Supercomputing. Barcelona, Spain, 1995:180-184
  • 6Kennedy K, Nedeljkovic N, Sethi A. A linear-time algorithm for computing the memory access sequence in data-parallel programs//Proceedings of the ACM SIGPLAN Symposium on Principles and Practices of Parallel Programming. Santa Barbara, Santa Barbara, California, United States, 1995: 102-111
  • 7Tseng E H Y, Gaudiot J L. Communication generation for aligned and Cyclic (k) distributions using integer lattice. IEEE Transactions on Parallel Distributed Systems, 1999, 10(2) : 136-146
  • 8Lee P Z, Chen W Y. Generating communication sets of array assignment statements for block-cyclic distribution on distributed memory parallel computers. Parallel Computing, 2002, 28(9) : 1329-1368
  • 9Huang T C, Shiu L C. Efficient communication sets generation for block-cyclic distribution on distributed-memory machines. Journal of Systems Architecture, 2003, 48:255-265
  • 10Hwang G H. An efficient algorithm for communication set generation of data parallel programs with block-cyclic distribution. Parallel Computing, 2004, 30(4) : 473-501

同被引文献39

  • 1Ancourt C, Irigoin F. Scanning polyhedra with do loops[C]// Proceedings of the third ACM SIGPLAN Symposium on Princi- ples and Practice of Parallel Programming (PPOPP' 91 ), 1991. NY, USA; ACM New York, 1991 : 39-50.
  • 2Amarasinghe S P, Lain M S. Communication optimization and code generation for distributed memory machines[C]//Procee- dings of The ACM SIGPLAN 1993 Conference on Programming Language Design and Implementation (PLDI ' 93 ), 1993. NY, USA: ACM New York, 1993 ; 126-138.
  • 3Ferner C S. The Paraguin Compiler-Message-Passing Code Gen- eration Using SUIF [C]//Proceedings IEEE SoutheastCon, 2002. Washington IS)C: IEEE Computer Society, 2002 : 1-6.
  • 4Ferner C S. Revisiting Communication Code Generation Algo- rithrns for Message-passing Systems[J]. International Journal of Parallel, Emergent and Distributed Systems, 2006,21 (5) : 323- 344.
  • 5Bondhugula U, Hartono A, Ramanujam J, et al. A practical auto- matic polyhedral parallelizer and locality optimizer[C] Pro- ceedings of The ACM SIGPLAN 2008 Conference on Program- ruing Language Design and Implementation (PLDI' 08), 2008. NY, USA: ACM New York, 2008 :101-113.
  • 6Bondhugula U. Automatic distributed-memory parallelization and code generation using the polyhedral framework[R]. IISc- CSA-TR-2011-3. Bangalore: Indian Institute of Science, 2011.
  • 7Ponnusamy R, Saltz J, Choudhary A. Runtime-eompilation tech- niques for data partitioning and communication schedule reuse [C] // Proceedings of the 1993 ACM/IEEE Conference on Su- percomputing, 1993. NY, USA. ACM New York, 1993: 361-370.
  • 8Mukherjee S S, Sharma S D, Hill M D, et al. Efficient support for irregular applications on distributed-memory machines[C]// Proceedings of the Fifth ACM SIGPLAN Symposium on Princi- ples and Practice of Parallel Programming 1995. NY, USA: ACM New York, 1995 : 68-79.
  • 9Guo Min-yi, Li Li, Chang Weng-long Efficient loop partitioning for parallel codes of irregular scientific computations[J]. IEICE Transactions on Information and Systems, 2003, 86 (9) : 1825- 1834.
  • 10Ravishankar M, Eisenlohr J, Pouehet LN, et al. Code generation for parallel execution of a class of irregular loops on distributed memory systems[C] The Intemational Conference for High Performance Computing, Networking, Storage, and Analysis (SC12), 2012.

引证文献2

相关作者

内容加载中请稍等...

相关机构

内容加载中请稍等...

相关主题

内容加载中请稍等...

浏览历史

内容加载中请稍等...
;
使用帮助 返回顶部