期刊文献+

一种基于代表元的划分算法 被引量:4

An Affine Partition Algorithm Based on Representative Element
下载PDF
导出
摘要 划分是把程序中不同的计算和数据分配到并行处理系统的不同处理机来充分利用并行系统的计算资源、提高程序处理速度的一种优化技术.划分的效果对程序在并行系统上的执行效率将产生至关重要的影响,因此划分问题一直是并行领域研究的一个热点.但是应用程序的一些特性,如非紧密嵌套循环、一条语句对非只读数组的多次引用间存在重叠、不同语句对同一数组不同步长的引用,给有效解决划分问题设置了极大的障碍.已有的划分算法无法对具有这些特征的程序进行自动划分.虽然在对具有这些特征的程序进行手工优化过程中,存在一些直观上的划分策略,但这些策略无法应用到编译器中来指导编译器完成对程序的自动划分.文中根据这类程序的特点,提出了一种基于代表元的划分算法.该算法通过使用程序中对划分计算产生实际影响的数组引用作为代表元素构造各种划分的限制条件,完成程序的划分.同时通过寻找最大一致性数据划分方向有效减少了程序划分过程中的数据重组织通信.该算法已经在AFT2004中实现,并对应用程序获得了很好的效果. Partition is an optimization technique that distributes computations and data onto the different processors of parallel systems to get the maximizing parallelism and minimizing communication. The effect of partition algorithm can directly affect the performance of parallel systems. But there are many obstacles to effective partition in practical programs, such as imperfect loop nests and different array access scope. Previous partition algorithms can only finish the partition of sequences of perfect loop nests or cannot solve data partition consistent problem for different array access scope. This paper presents an affine partition algorithm based on representative element. When constructing the constraint relation for partition, it only remains the array references, which have contributes to partition constraint relation indeed and remove the trivial partition conflicts through discarding redundant array references to same array. This paper also presents a consistent partition algorithm to solve the data partition consistent problem, The algorithms can not only solve more practical partition problems, but also effectively reduce data reorganization communication in data partition. This technique has been implemented in AFT2004 parallel compiling system and can get better results for some practical programs.
出处 《计算机学报》 EI CSCD 北大核心 2008年第3期400-410,共11页 Chinese Journal of Computers
基金 国家博士点基金(20050246020)资助
关键词 计算划分 数据划分 代表元 非紧密嵌套循环 数据划分一致性 并行编译 affine partition data partition representative element imperfect loop parallel cornpiling
  • 相关文献

参考文献20

  • 1Anderson J M, Lam M S. Global optimizations for parallelism and locality on scalable parallel machines//Proceedings of the ACM SIGPLANr 93 Conference on Programming Language Design and Implementation. Albuquerque, New Mexico, USA, 1993:112-125
  • 2Banerjee U. Unimodular transformations of double loops// Proceedings of the 3rd Workshop on Languages and Compilers for Parallel Computing. Irvine, California, USA, 1990: 192-219
  • 3Wolf M E, I.am M S. A loop transformation theory and an algorithm to maximize parallelism. IEEE Transactions on Parallel and Distributed Systems, 1991, 2(4): 452-470
  • 4Wolf M E. Improving locality and parallelism in nested loops [Ph. D. dissertation]. Stanford University, CSL_ TR-92- 538, 1992
  • 5Anderson J M, Amarasinghe S P, Lam M S. Data and computation transformations for multiprocessors//Proceedings of the 5th ACM/SIGPLAN Symposium on Principles and Practice of Parallel Programming. Santa Barbara, California, USA, 1995:166-178
  • 6Anderson J. Automatic computation and data decomposition for multiprocessors[Ph. D. dissertation]. Stanford University, Standford, CA, 1997
  • 7Bau D, Kidukula Iet al. Solving alignment using elementary linear algebra//Proceedings of the 7th Workshop on Languages and Compilers for Parallel Computing. Ithaca, NY, USA, 1994:46-60
  • 8Ramanujam J, Sadayappan P. Compile-time techniques for data distribution in distributed memory machines. IEEE Transactions on Parallel and Distributed Systems, 1991, 2 (4) : 472-482
  • 9Huang C H, Sadayappan P. Communication-free hyperplane partitioning of nested loops. Journal of Parallel and Distribu ted Computing, 1993, 19(2): 90-102
  • 10Shih K-P, Sheu J P, Huang C H. Statement level communication free partitioning techniques for parallelizing compilers//Proceedings of the 9th Workshop on Languages and Compilers for Parallel Computing. Minneapolis, Minnesota, USA, 1997:389-403

同被引文献38

  • 1夏军,杨学军.基于数据空间融合的全局计算与数据划分方法[J].软件学报,2004,15(9):1311-1327. 被引量:7
  • 2王轶然,陈莉,冯晓兵,张兆庆.全局部分重复计算划分[J].计算机研究与发展,2006,43(12):2158-2165. 被引量:2
  • 3S F McGinn, R E Shaw. Parallel Gaussian Elimination Using OpenMP and MPI [C]// Proceedings of the 16th Annual International Symposium on High Performance Computing Systems and Applications (HPCS'02). Moncton, NB, Canada, 2002: 169-176.
  • 4Jeff Howe, Shawn Bratcher, Parallel Guassian Elimination [DB/OL]. (2006-9)[2008-1] http://www.cse.ucsd.edu/classes/ fa98/cse164b/ Proj ects/PastProj ects/LU/.
  • 5Wei Bing Lu, Tie Jun Cui, Zhi Guo Qian, et al. Accurate Analysis of Large-Scale Periodic Structures Using an Efficient Sub-Entire- Domain Basis Function Method [J]. IEEE Transactions on Antennas and Propagation (S0018-926x), 2005, 53(1): 3078-3085.
  • 6Luis M Camacho, Saibun Tjuatja. FDTD Simulation of Microwave Scattering from a Lung Tumor [C]// Antennas and Propagation Society International Symposium, 2005 IEEE, 3-8 July 2005. USA: IEEE, 2005, 3A: 815-818.
  • 7Anderson J M. Automatic computation and data decomposition for multiproeessors [D]. US: Stanford University, 1997.
  • 8Kennedy K,Kremer U. Automatic Data Layout for Distributed- Memory Machines [J]. ACM Transactions on Programming Languages and Systems, 1998,20 (4) : 869-916.
  • 9韩林.面向分布存储结构的并行分饵一致性优化技术研究[D].郑州:解放军信息工程大学,2008.
  • 10Lee Pei-zong. Automatic data and computation decomposition on distributed memory parallel eomputers[J]. ACM Transactions on Programming Languages and Systems, 2002,24 (1) : 1-50.

引证文献4

二级引证文献9

相关作者

内容加载中请稍等...

相关机构

内容加载中请稍等...

相关主题

内容加载中请稍等...

浏览历史

内容加载中请稍等...
;
使用帮助 返回顶部