Global Partial Replicate Computation Partitioning (cited by: 2)
Abstract Early parallelizing compilers use the owner-computes rule to partition computation. Partial replicate computation partitioning was later introduced to reduce near-neighbor communication at the cost of some repeated computation; it is an important optimization for inter-node locality that improves the performance and scalability of parallel programs. Earlier work on partial replicate computation partitioning is limited to a single loop nest and uses no explicit cost model. This paper gives a formal description of the more general problem, called global partial replicate computation partitioning. Because redundant message elimination strongly influences the effect of such optimizations, a simplified linear cost model that accounts for it is introduced, together with a framework based on integer linear programming. Experimental results show that the solution is significantly better than local approaches restricted to a single loop nest. Compared with the previously proposed heuristic method, the new approach handles more general cases and adapts more easily to different data distributions.
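The trade-off named in the abstract can be illustrated with a toy example. The sketch below is not the paper's cost model or its integer linear programming framework; it only counts, for one processor, how many elements of an intermediate array `b` are computed and received when a pointwise loop producing `b` is followed by a three-point stencil over `b`. The block distribution, the halo width of 1, the function names, and the assumption that the inputs needed to recompute the halo of `b` are already available locally are all choices made for this illustration.

```python
def block_range(rank, nprocs, n):
    """Half-open index range [lo, hi) owned by `rank` under a block distribution."""
    size = (n + nprocs - 1) // nprocs  # ceiling division
    lo = min(rank * size, n)
    hi = min(lo + size, n)
    return lo, hi


def halo(lo, hi, n, width=1):
    """Indices within `width` of the owned block that belong to neighbors."""
    left = range(max(lo - width, 0), lo)
    right = range(hi, min(hi + width, n))
    return list(left) + list(right)


def compare(rank, nprocs, n):
    """Count work and communication for `b` under the two strategies.

    Assumption for this sketch: the inputs needed to recompute halo
    elements of `b` are already present on this processor.
    """
    lo, hi = block_range(rank, nprocs, n)
    h = halo(lo, hi, n)
    # Owner-computes: compute only owned elements, receive the halo of b.
    owner_computes = {"computed_b": hi - lo, "received_b": len(h)}
    # Partial replication: also compute the halo of b locally, receive nothing.
    partial_replication = {"computed_b": hi - lo + len(h), "received_b": 0}
    return owner_computes, partial_replication


if __name__ == "__main__":
    for rank in range(4):
        print(rank, compare(rank, nprocs=4, n=16))
```

For an interior processor the replicated version trades two extra computed elements for two fewer received elements. Whether that is a win depends on the relative cost of computation and messages, which is the kind of decision a cost model is needed for.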
Source Journal of Computer Research and Development (《计算机研究与发展》), indexed in EI, CSCD, and the Peking University Core Journals list; 2006, No. 12, pp. 2158-2165 (8 pages)
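Funding National "863" High Technology Research and Development Program of China (2004AA1Z2200); Knowledge Innovation Research Project of the Institute of Computing Technology, Chinese Academy of Sciences (20056260)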
Keywords parallelizing compiler; distributed memory systems; partial replicate computation partitioning; data parallel