Journal Article

Pipelining granularity optimization algorithm based on loop tiling (cited by: 1)
Abstract  When the loop level chosen for computation partitioning has a large number of iterations, or a single iteration of the loop body carries a heavy workload, while only a small number of parallel threads is available, the work a thread performs between two synchronizations becomes so large that the achieved degree of parallelism is low; the traditional loop-tiling-based method for choosing pipelining granularity cannot handle this situation. To solve the problem, a method that decreases pipelining granularity through loop tiling was proposed. The optimal pipelining granularity was derived from a cost model of pipelined parallel loops, and a pipelining-granularity optimization algorithm was designed and implemented. Tests on the wavefront loops of Finite Difference Relaxation (FDR) and on representative loops of the Finite Difference Time Domain (FDTD) method show that the proposed algorithm selects better tile sizes than the traditional granularity-selection method.
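The wavefront dependence pattern the abstract refers to can be made concrete with a small sketch. In an FDR-style sweep, each point depends on its upper and left neighbors, so once the iteration space is tiled, all tiles on the same anti-diagonal of the tile grid are mutually independent and can be pipelined across threads; the tile size is the pipelining granularity that the paper's cost model selects. The serial Python below is an illustration only (the function names and the `0.5` averaging stencil are my own, not taken from the paper); it shows that executing tiles in anti-diagonal order is a legal reordering of the naive sweep:

```python
import numpy as np

def sweep_naive(a):
    """Untiled FDR-style wavefront sweep: a[i,j] depends on a[i-1,j] and a[i,j-1]."""
    n, m = a.shape
    for i in range(1, n):
        for j in range(1, m):
            a[i, j] = 0.5 * (a[i - 1, j] + a[i, j - 1])
    return a

def sweep_tiled(a, tile):
    """Same sweep, but executed tile by tile along anti-diagonals of the tile grid.
    Tiles on one diagonal have no dependences on each other, so a runtime could
    assign them to different threads; `tile` is the pipelining granularity."""
    n, m = a.shape
    nti = -(-(n - 1) // tile)            # ceil((n-1)/tile): tile rows
    ntj = -(-(m - 1) // tile)            # ceil((m-1)/tile): tile columns
    for d in range(nti + ntj - 1):       # anti-diagonals, executed in order
        for ti in range(max(0, d - ntj + 1), min(d, nti - 1) + 1):
            tj = d - ti                  # independent of the other tiles on d
            for i in range(1 + ti * tile, min(1 + (ti + 1) * tile, n)):
                for j in range(1 + tj * tile, min(1 + (tj + 1) * tile, m)):
                    a[i, j] = 0.5 * (a[i - 1, j] + a[i, j - 1])
    return a
```

A smaller tile lowers the granularity, exposing more concurrent tiles per diagonal at the cost of more frequent synchronization; balancing these two effects is exactly the trade-off the paper's cost model resolves.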
Source  Journal of Computer Applications (《计算机应用》), CSCD, Peking University Core Journal, 2013, No. 8: 2171-2176 (6 pages)
基金 "核高基"国家科技重大专项(2009ZX01036-001-001-2)
Keywords  automatic parallelization; pipelining parallelization; pipelining granularity; loop tiling; cost model
Related Literature

References (12)

  • 1 BENOIT A, MELHEM R, RENAUD-GOUD P, et al. Power-aware Manhattan routing on chip multiprocessors [C]// Proceedings of the 2012 IEEE 26th International Parallel and Distributed Processing Symposium. Piscataway: IEEE, 2012: 189-200.
  • 2 JIN H Q, JESPERSEN D, MEHROTRA P, et al. High performance computing using MPI and OpenMP on multi-core parallel systems [J]. Parallel Computing, 2011, 37(9): 562-575.
  • 3 BONDHUGULA U K R. Effective automatic parallelization and locality optimization using the polyhedral model [D]. Ohio: The Ohio State University, 2008.
  • 4 AKHTER S, ROBERTS J. Multi-core programming: increasing performance through software multi-threading [M]. Hillsboro: Intel Corporation, 2006: 13-27.
  • 5 CYTRON R. Doacross: beyond vectorization for multiprocessors [C]// Proceedings of the 1986 International Conference on Parallel Processing. Piscataway: IEEE, 1986: 836-844.
  • 6 CHEN D-K, YEW P-C. An empirical study on DOACROSS loops [C]// Proceedings of Supercomputing '91. New York: ACM, 1991: 620-632.
  • 7 HURSON A R, LIM J T, KAVI K M, et al. Parallelization of DOALL and DOACROSS loops - a survey [J]. Advances in Computers, 1997, 45: 53-103.
  • 8 LIN Y-T, WANG S-C, SHIH W-L, et al. Enable OpenCL compiler with Open64 infrastructures [C]// Proceedings of the 2011 IEEE 13th International Conference on High Performance Computing and Communications. Piscataway: IEEE, 2011: 863-868.
  • 9 FU H Y, DING Y, SONG W, YANG X J. An OpenMP fault-tolerance mechanism implemented with parallel recomputation [J]. Journal of Software, 2012, 23(2): 411-427. (cited by: 7)
  • 10 THOMAN P, JORDAN H, PELLEGRINI S, et al. Automatic OpenMP loop scheduling: a combined compiler and runtime approach [C]// IWOMP '12: Proceedings of the 8th International Conference on OpenMP in a Heterogeneous World. Berlin: Springer-Verlag, 2012: 88-101.

Secondary References (11)

  • 1 TOP500 supercomputing site. http://www.top500.org.
  • 2 Reed DA, Lu CD, Mendes CL. Reliability challenges in large systems. Future Generation Computer Systems, 2006, 22(3): 293-302. [doi: 10.1016/j.future.2004.11.015].
  • 3 Sorin DJ, Martin MMK, Hill MD, Wood DA. SafetyNet: Improving the availability of shared memory multiprocessors with global checkpoint/recovery. In: Proc. of the Int'l Symp. on Computer Architecture (ISCA 2002). Anchorage, 2002. 123-134. [doi: 10.1109/ISCA.2002.1003568].
  • 4 Prvulovic M, Zhang Z, Torrellas J. ReVive: Cost-effective architectural support for rollback recovery in shared-memory multiprocessors. In: Proc. of the Int'l Symp. on Computer Architecture (ISCA 2002). Anchorage, 2002. 111-122. [doi: 10.1109/ISCA.2002.1003567].
  • 5 Dieter WR, Lumpp JE. A user-level checkpointing library for POSIX threads programs. In: Proc. of the '99 Symp. on Fault-Tolerant Computing Systems (FTCS '99). Madison, 1999. 224-227. [doi: 10.1109/FTCS.1999.781054].
  • 6 Bronevetsky G, Marques D, Pingali K, Szwed P, Schulz M. Application-level checkpointing for shared memory programs. In: Proc. of the 11th Int'l Conf. on Architectural Support for Programming Languages and Operating Systems (ASPLOS 2004). New York, 2004. 235-247. [doi: 10.1145/1024393.1024421].
  • 7 Bronevetsky G, Pingali K, Stodghill P. Experimental evaluation of application-level checkpointing for OpenMP programs. In: Proc. of the 20th Annual Int'l Conf. on Supercomputing (SC 2006). Cairns, 2006. 2-13. [doi: 10.1145/1183401.1183405].
  • 8 Bronevetsky G, Marques D, Pingali K, Stodghill P. C3: A system for automating application-level checkpointing of MPI programs. In: Proc. of the 16th Int'l Workshop on Languages and Compilers for Parallel Computing (LCPC 2003). 2003.
  • 9 Yang XJ, Du YF, Wang PF, Fu HY, Jia J, Wang ZY, Suo G. The fault tolerant parallel algorithm: The parallel recomputing based failure recovery. In: Proc. of the 16th Int'l Conf. on Parallel Architectures and Compilation Techniques (PACT 2007). Brasov, 2007. 199-212. [doi: 10.1109/PACT.2007.4336212].
  • 10 Bailey DH, Harris T, Saphir W, Wijngaart RVD, Woo A, Yarrow M. The NAS parallel benchmarks 2.0. Technical Report, NAS-95-020, NASA Ames Research Center, 1995.

Shared References (6)

Co-cited Literature (6)

Citing Literature (1)

Secondary Citing Literature (1)
