SLP Optimization Technique Based on Cross-Basic-Block Transformation and Loop Distribution

SLP Optimization Algorithm Using Across Basic Block Transformation and Loop Distribution
Abstract Existing SLP optimization algorithms cannot handle dependence cycles and reductions in inner loops, and they generate a large number of redundant unpacking and assignment statements at basic block boundaries, which lowers vectorization efficiency. To address this problem, this paper proposes an SLP optimization algorithm based on cross-basic-block transformation and loop distribution. Building on the control flow graph, the algorithm performs vectorization transformations across basic blocks according to the Define-Use relationships of array variables between basic blocks and the data dependences that span basic blocks. It applies cross-basic-block transformation and loop distribution in order, exposing as much parallelism as possible among the statements in the basic blocks of the innermost loop, so that an SLP auto-vectorizing compiler generates vectorized code containing more SIMD instructions. Experimental results show that the algorithm hides more of the overhead of redundant operations across basic blocks and uses cross-basic-block data dependences to generate better SIMD instructions, effectively improving the speedup of vectorized programs.
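As a rough illustration of the loop distribution step mentioned in the abstract, the sketch below splits a loop whose first statement carries a dependence cycle from a second statement that does not. It is not the paper's algorithm: the function names `original` and `distributed`, the array names, and the four-lane `WIDTH` are all assumptions made for this example, and the "pack" is only simulated with list slices.

```python
WIDTH = 4  # assumed superword width (four lanes), chosen only for illustration


def original(a, b, d):
    """Single loop: S1 carries a dependence cycle, S2 does not."""
    a = list(a)
    c = [0] * len(a)
    for i in range(1, len(a)):
        a[i] = a[i - 1] + b[i]  # S1: reads the value written one iteration earlier
        c[i] = a[i] * d[i]      # S2: uses S1's result from the same iteration
    return a, c


def distributed(a, b, d):
    """Same computation after loop distribution (hypothetical sketch)."""
    a = list(a)
    n = len(a)
    c = [0] * n

    # Loop 1: the recurrence S1 is isolated and stays scalar.
    for i in range(1, n):
        a[i] = a[i - 1] + b[i]

    # Loop 2: S2 alone has no loop-carried dependence, so WIDTH
    # consecutive iterations can be grouped into one packed operation.
    i = 1
    while i + WIDTH <= n:
        pack_a = a[i:i + WIDTH]
        pack_d = d[i:i + WIDTH]
        c[i:i + WIDTH] = [x * y for x, y in zip(pack_a, pack_d)]
        i += WIDTH

    # Scalar epilogue for the iterations that do not fill a pack.
    for j in range(i, n):
        c[j] = a[j] * d[j]
    return a, c
```

Distribution is legal here because the only dependence between the two statements runs forward from S1 to S2, and running the S1 loop first preserves it. A real SLP compiler would make the same grouping decision on its intermediate representation and emit SIMD instructions rather than slice lists.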
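Source Computer Science (《计算机科学》), indexed in CSCD and the Peking University Core Journals list, 2013, No. 10, pp. 24-28, 60 (6 pages)
Funding Supported by the National Science and Technology Major Project on Core Electronic Devices, High-end Generic Chips and Basic Software (2009ZX01036-001-001-2)
Keywords SLP, cross-basic-block transformation, loop distribution, data dependence, control flow graph, Define-Use relationship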