Abstract
Early parallelizing compilers used the owner-computes rule to partition computation. Partial replicate computation partitioning was later introduced to reduce near-neighbor communication at the cost of some repeated computation; it is an important optimization for inter-node locality that improves the performance and scalability of parallel programs. Earlier work on partial replicate computation partitioning was confined to a single loop nest and used no explicit cost model. This paper gives a formal description of the more general problem, called global partial replicate computation partitioning. Because redundant message elimination strongly affects the benefit of such optimizations, a simplified linear cost model that accounts for it is introduced, together with a framework based on integer linear programming. Experimental results show that the resulting solutions are significantly better than those of approaches confined to a single loop nest. Compared with the previously proposed heuristic method, the new approach handles more general cases and adapts more easily to different data distributions.
Source
《计算机研究与发展》
EI
CSCD
PKU Core (Peking University Core Journals)
2006, No. 12, pp. 2158-2165 (8 pages)
Journal of Computer Research and Development
Funding
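National High Technology Research and Development Program of China (863 Program) (2004AA1Z2200)
Knowledge Innovation Research Project of the Institute of Computing Technology, Chinese Academy of Sciences (20056260)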
Keywords
parallelizing compiler
distributed memory systems
partial replicate computation partitioning
data parallel