Journal Article

Pipelining granularity optimization algorithm based on loop tiling (cited by: 1)
Abstract  When the loop level chosen for computation partitioning has a large number of iterations, or a single iteration of the loop body carries a heavy workload, while only a small number of parallel threads is available, the work a thread performs between two synchronizations becomes so large that the achieved degree of parallelism is low; the traditional loop-tiling-based method for choosing pipelining granularity cannot handle this situation. To solve the problem, a method that decreases pipelining granularity through loop tiling was proposed. The optimal pipelining granularity was derived from a cost model of pipelined parallel loops, and a pipelining-granularity optimization algorithm was designed and implemented. Tests on the wavefront loops of Finite Difference Relaxation (FDR) and on representative loops of the Finite Difference Time Domain (FDTD) method show that the proposed algorithm selects better tile sizes than the traditional granularity-selection method.
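The wavefront dependence pattern the abstract refers to can be made concrete with a small sketch. In an FDR-style sweep, each point depends on its upper and left neighbors, so once the iteration space is tiled, all tiles on the same anti-diagonal of the tile grid are mutually independent and can be pipelined across threads; the tile size is the pipelining granularity that the paper's cost model selects. The serial Python below is an illustration only (the function names and the `0.5` averaging stencil are my own, not taken from the paper); it shows that executing tiles in anti-diagonal order is a legal reordering of the naive sweep:

```python
import numpy as np

def sweep_naive(a):
    """Untiled FDR-style wavefront sweep: a[i,j] depends on a[i-1,j] and a[i,j-1]."""
    n, m = a.shape
    for i in range(1, n):
        for j in range(1, m):
            a[i, j] = 0.5 * (a[i - 1, j] + a[i, j - 1])
    return a

def sweep_tiled(a, tile):
    """Same sweep, but executed tile by tile along anti-diagonals of the tile grid.
    Tiles on one diagonal have no dependences on each other, so a runtime could
    assign them to different threads; `tile` is the pipelining granularity."""
    n, m = a.shape
    nti = -(-(n - 1) // tile)            # ceil((n-1)/tile): tile rows
    ntj = -(-(m - 1) // tile)            # ceil((m-1)/tile): tile columns
    for d in range(nti + ntj - 1):       # anti-diagonals, executed in order
        for ti in range(max(0, d - ntj + 1), min(d, nti - 1) + 1):
            tj = d - ti                  # independent of the other tiles on d
            for i in range(1 + ti * tile, min(1 + (ti + 1) * tile, n)):
                for j in range(1 + tj * tile, min(1 + (tj + 1) * tile, m)):
                    a[i, j] = 0.5 * (a[i - 1, j] + a[i, j - 1])
    return a
```

A smaller tile lowers the granularity, exposing more concurrent tiles per diagonal at the cost of more frequent synchronization; balancing these two effects is exactly the trade-off the paper's cost model resolves.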
Source  Journal of Computer Applications (《计算机应用》), CSCD, Peking University Core Journal, 2013, No. 8: 2171-2176 (6 pages)
基金 "核高基"国家科技重大专项(2009ZX01036-001-001-2)
Keywords  automatic parallelization; pipelining parallelization; pipelining granularity; loop tiling; cost model
Related Literature

References (12)

  • 1 BENOIT A, MELHEM R, RENAUD-GOUD P, et al. Power-aware Manhattan routing on chip multiprocessors [C]// Proceedings of the 2012 IEEE 26th International Parallel and Distributed Processing Symposium. Piscataway: IEEE, 2012: 189-200.
  • 2 JIN H Q, JESPERSEN D, MEHROTRA P, et al. High performance computing using MPI and OpenMP on multi-core parallel systems [J]. Parallel Computing, 2011, 37(9): 562-575.
  • 3 BONDHUGULA U K R. Effective automatic parallelization and locality optimization using the polyhedral model [D]. Ohio: The Ohio State University, 2008.
  • 4 AKHTER S, ROBERTS J. Multi-core programming: increasing performance through software multi-threading [M]. Hillsboro: Intel Corporation, 2006: 13-27.
  • 5 CYTRON R. Doacross: beyond vectorization for multiprocessors [C]// Proceedings of the 1986 International Conference on Parallel Processing. Piscataway: IEEE, 1986: 836-844.
  • 6 CHEN D-K, YEW P-C. An empirical study on DOACROSS loops [C]// Proceedings of Supercomputing '91. New York: ACM, 1991: 620-632.
  • 7 HURSON A R, LIM J T, KAVI K M, et al. Parallelization of DOALL and DOACROSS loops - a survey [J]. Advances in Computers, 1997, 45: 53-103.
  • 8 LIN Y-T, WANG S-C, SHIH W-L, et al. Enable OpenCL compiler with Open64 infrastructures [C]// Proceedings of the 2011 IEEE 13th International Conference on High Performance Computing and Communications. Piscataway: IEEE, 2011: 863-868.
  • 9 FU H Y, DING Y, SONG W, YANG X J. An OpenMP fault-tolerance mechanism implemented with parallel recomputation [J]. Journal of Software, 2012, 23(2): 411-427. (cited by: 7)
  • 10 THOMAN P, JORDAN H, PELLEGRINI S, et al. Automatic OpenMP loop scheduling: a combined compiler and runtime approach [C]// IWOMP '12: Proceedings of the 8th International Conference on OpenMP in a Heterogeneous World. Berlin: Springer-Verlag, 2012: 88-101.

Secondary References (11)

  • 1 TOP500 supercomputing site. http://www.top500.org.
  • 2 Reed DA, Lu CD, Mendes CL. Reliability challenges in large systems. Future Generation Computer Systems, 2006, 22(3): 293-302. [doi: 10.1016/j.future.2004.11.015].
  • 3 Sorin DJ, Martin MMK, Hill MD, Wood DA. SafetyNet: Improving the availability of shared memory multiprocessors with global checkpoint/recovery. In: Proc. of the Int'l Symp. on Computer Architecture (ISCA 2002). Anchorage, 2002. 123-134. [doi: 10.1109/ISCA.2002.1003568].
  • 4 Prvulovic M, Zhang Z, Torrellas J. ReVive: Cost-effective architectural support for rollback recovery in shared-memory multiprocessors. In: Proc. of the Int'l Symp. on Computer Architecture (ISCA 2002). Anchorage, 2002. 111-122. [doi: 10.1109/ISCA.2002.1003567].
  • 5 Dieter WR, Lumpp JE. A user-level checkpointing library for POSIX threads programs. In: Proc. of the '99 Symp. on Fault-Tolerant Computing Systems (FTCS '99). Madison, 1999. 224-227. [doi: 10.1109/FTCS.1999.781054].
  • 6 Bronevetsky G, Marques D, Pingali K, Szwed P, Schulz M. Application-level checkpointing for shared memory programs. In: Proc. of the 11th Int'l Conf. on Architectural Support for Programming Languages and Operating Systems (ASPLOS 2004). New York, 2004. 235-247. [doi: 10.1145/1024393.1024421].
  • 7 Bronevetsky G, Pingali K, Stodghill P. Experimental evaluation of application-level checkpointing for OpenMP programs. In: Proc. of the 20th Annual Int'l Conf. on Supercomputing (SC 2006). Cairns, 2006. 2-13. [doi: 10.1145/1183401.1183405].
  • 8 Bronevetsky G, Marques D, Pingali K, Stodghill P. C3: A system for automating application-level checkpointing of MPI programs. In: Proc. of the 16th Int'l Workshop on Languages and Compilers for Parallel Computing (LCPC 2003). 2003.
  • 9 Yang XJ, Du YF, Wang PF, Fu HY, Jia J, Wang ZY, Suo G. The fault tolerant parallel algorithm: The parallel recomputing based failure recovery. In: Proc. of the 16th Int'l Conf. on Parallel Architectures and Compilation Techniques (PACT 2007). Brasov, 2007. 199-212. [doi: 10.1109/PACT.2007.4336212].
  • 10 Bailey DH, Harris T, Saphir W, Wijngaart RVD, Woo A, Yarrow M. The NAS parallel benchmarks 2.0. Technical Report, NAS-95-020, NASA Ames Research Center, 1995.

Shared References (6)

Co-cited Literature (6)

Citing Literature (1)

Secondary Citing Literature (1)
