Abstract
Irregular computation is a pervasive and performance-critical problem in large-scale parallel applications. In the distributed-memory data-parallel paradigm, efficiently generating local memory access sequences and communication sets for irregular array references is a key problem that a parallelizing compiler must solve when generating SPMD node programs. This paper considers a class of parallel applications with irregular array references in doubly nested loops, where the inner loop bounds are linear or nonlinear functions of the outer loop variable and the array subscripts are nonlinear functions of both loop variables, and proposes an algebraic algorithm that generates the communication sets at compile time. For cyclic(k) data distributions and linear alignment templates, a compile-time translation between global and local addresses is derived with the help of the integer-lattice concept. The corresponding communication-optimized SPMD node program is also presented, and the correctness of the algorithm is verified by an example. The significance of the algorithm is that it avoids the inspector phase of the traditional Inspector/Executor model for irregular computation, thereby saving the large run-time overhead of enumerating subscripts in that phase to build the communication sets.
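For contrast, the run-time inspector phase that the proposed compile-time algorithm eliminates can be sketched as follows. This is a minimal illustration in Python, not the paper's method: it assumes a one-dimensional array under a cyclic(k) distribution over P processors, and the names `owner`, `inspector`, `subscript`, and `inner_bound` are hypothetical.

```python
def owner(g, k, P):
    """Owning processor of global index g under cyclic(k) over P processors."""
    return (g // k) % P

def inspector(N, k, P, my_pid, subscript, inner_bound):
    """Exhaustively scan the doubly nested loop, evaluate every array
    subscript, and collect the nonlocal global indices per remote
    processor -- the run-time cost the compile-time algorithm avoids."""
    comm = {p: set() for p in range(P)}
    for i in range(N):                      # outer loop
        for j in range(inner_bound(i)):     # inner bound depends on i
            g = subscript(i, j)             # nonlinear subscript f(i, j)
            p = owner(g, k, P)
            if p != my_pid:
                comm[p].add(g)              # element must be communicated
    return comm

# Example: subscript f(i, j) = i*i + j, inner bound i + 1 (both hypothetical)
comm = inspector(4, k=2, P=2, my_pid=0,
                 subscript=lambda i, j: i * i + j,
                 inner_bound=lambda i: i + 1)
```

The scan touches every iteration of the loop nest just to discover which elements are remote, which is exactly the exhaustive-enumeration overhead the abstract refers to.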
Irregular computation significantly influences the performance of large-scale parallel applications. Generating local memory access sequences and communication sets efficiently for irregular array references is an important issue in compiling a data-parallel language into single program multiple data (SPMD) code for distributed-memory machines. So far, much research has focused on communication set generation for regular array references in parallel loops, but little attention has been paid to generating communication sets for irregular array accesses in loop nests. In the general case of irregular accesses, the Inspector/Executor model is adopted: array elements are scanned at run time in the inspector phase so that the communication set can be constructed. This paper proposes an approach that derives an algebraic solution for communication set enumeration at compile time for irregular array references in nested loops. It introduces the integer lattice into alignment and cyclic(k) distribution for global-to-local and local-to-global index translation, and presents the algorithm for the corresponding SPMD code generation. When the parameters of the alignment, the cyclic(k) distribution, and the array references are known, the SPMD code can be derived completely at compile time, so that no inspector-like run-time code is needed to enumerate the communication set.
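The global-to-local and local-to-global index translations mentioned above can be illustrated for the standard cyclic(k) distribution (without the paper's alignment templates or integer-lattice machinery). This is a minimal sketch assuming a one-dimensional array distributed cyclically in blocks of k over P processors; the function names are hypothetical.

```python
def global_to_local(g, k, P):
    """Map a global index g to (processor, local index) under cyclic(k):
    block g//k is owned by processor (g//k) % P; locally, the element
    sits after (g // (k*P)) complete k-sized blocks plus its in-block offset."""
    p = (g // k) % P
    l = (g // (k * P)) * k + g % k
    return p, l

def local_to_global(p, l, k, P):
    """Inverse mapping: l//k complete distribution cycles of size k*P,
    plus processor p's block within the cycle, plus the in-block offset."""
    return (l // k) * (k * P) + p * k + l % k
```

For example, with k = 2 and P = 3, global index 7 lands on processor 0 at local index 3. The paper's contribution is performing such translations at compile time for the composed case of linear alignment followed by cyclic(k) distribution, where the lattice of global indices owned by one processor must be enumerated symbolically.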
Source
《计算机学报》 (Chinese Journal of Computers)
Indexed in: EI, CSCD, Peking University Core (北大核心)
2008, No. 1, pp. 120-126 (7 pages)
Funding
National High-Tech Research and Development Program of China (863 Program) (Grant No. 2006AA01Z105)
National Natural Science Foundation of China (Grant No. 60373008)
Key Project of the Ministry of Education of China (Grant No. 106019)
Keywords
irregular computing
communication set generation
parallelizing compilers
communication optimization
SPMD