Global Partial Replicate Computation Partitioning (全局部分重复计算划分)

Cited by: 2


Abstract: Early parallelizing compilers use the owner-computes rule to partition computation. Partial replication was later introduced to reduce near-neighbor communication at the cost of some repeated computation. It is an important optimization for the inter-node locality of parallel programs, and it improves their performance and scalability. Earlier work on partial replicate computation partitioning was limited to a single loop nest and used no explicit cost model. This paper gives a formal description of a more general problem, called global partial replicate computation partitioning. Because redundant message elimination strongly influences the effect of such optimizations, the paper introduces a simplified linear cost model that takes it into account, together with a framework based on integer linear programming. Experimental results show that the solution is significantly better than local approaches restricted to a single loop nest. Compared with the previously proposed heuristic method, the new approach handles more general cases and adapts more easily to different data distributions.

Authors: Wang Yiran (王轶然), Chen Li (陈莉), Feng Xiaobing (冯晓兵), Zhang Zhaoqing (张兆庆)

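Affiliation: Key Laboratory of Computer System and Architecture, Institute of Computing Technology, Chinese Academy of Sciences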

Source: Journal of Computer Research and Development (《计算机研究与发展》), 2006, No. 12, pp. 2158-2165 (8 pages). Indexed in: EI, CSCD, PKU Core (北大核心).

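Funding: National High Technology Research and Development Program of China ("863" Program) (2004AA1Z2200); Knowledge Innovation Research Project of the Institute of Computing Technology, Chinese Academy of Sciences (20056260)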

Keywords: parallelizing compiler; distributed memory systems; partial replicate computation partitioning; data parallel

Classification: TP314 [Automation and Computer Technology: Computer Software and Theory]
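
The trade-off named in the abstract, less near-neighbor communication in exchange for some repeated computation, can be illustrated with a small counting sketch. It is not the paper's cost model or its integer linear programming framework. It assumes a hypothetical two-loop chain (`B[i] = f(A[i])`, then a stencil over `B` of radius `halo`), a 1-D block distribution, and an input array `A` that is already available on every processor; the names `block_bounds` and `costs` are made up for this illustration.

```python
def block_bounds(n, p, rank):
    """Half-open index range [lo, hi) owned by `rank` under a block distribution."""
    size = (n + p - 1) // p          # ceiling division: elements per block
    lo = min(rank * size, n)
    hi = min(lo + size, n)
    return lo, hi


def costs(n, p, halo, replicate):
    """Count (elements of B computed, elements of B communicated) over all processors.

    Hypothetical program (an assumption, not taken from the paper):
        loop 1: B[i] = f(A[i])
        loop 2: C[i] = combination of B[i-halo .. i+halo]
    A is assumed to be readable locally everywhere.
    """
    computed = 0
    communicated = 0
    for rank in range(p):
        lo, hi = block_bounds(n, p, rank)
        if lo == hi:
            continue                 # this processor owns nothing and does no work
        # Range of B that loop 2 reads on this processor, clipped to the array.
        need_lo, need_hi = max(lo - halo, 0), min(hi + halo, n)
        non_local = (need_hi - need_lo) - (hi - lo)
        if replicate:
            # Partial replication: also compute the boundary elements of B locally.
            computed += need_hi - need_lo
        else:
            # Owner-computes: compute only owned elements, fetch the rest.
            computed += hi - lo
            communicated += non_local
    return computed, communicated


if __name__ == "__main__":
    print("owner-computes:     ", costs(16, 4, 1, replicate=False))
    print("partial replication:", costs(16, 4, 1, replicate=True))
```

With 16 elements on 4 processors and a radius of 1, owner-computes evaluates 16 elements of `B` and communicates 6, while partial replication evaluates 22 and communicates none. Which choice is cheaper depends on the relative cost of computation and messages, which is the kind of decision a cost model has to weigh across loop nests.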
