
Insufficient SLP in GCC (GCC非满载SLP向量化)
Abstract: As vector lengths grow, SIMD extensions can exploit more data-level parallelism, but the amount of parallelism a program must expose before it can be vectorized grows as well. In existing auto-vectorizing compilers, if the analysis phase cannot find enough data-level parallelism in the serial code to fill a vector register completely, the compiler never enters the vector code transformation phase and the code is left unvectorized. Longer vectors therefore cost some programs with insufficient parallelism their chance of being vectorized, which degrades performance. To make fuller use of SIMD units, this paper introduces ISLP, a basic-block-oriented vectorization method that does not require vector registers to be fully filled. Based on the open-source GCC compiler, the design and implementation of ISLP are described in detail from three aspects: parallelism detection, code generation, and the cost model. Experiments on a standard test set show that the method can effectively vectorize programs with insufficient superword level parallelism and improve execution efficiency. The selected test cases reach an average speedup of 1.14 after vectorization, an 11.8% performance improvement over the conventional SLP method.
Authors: LIU Hao-Hao (刘浩浩); HAN Lin (韩林); CUI Ping-Fei (崔平非) (Research Institute of Frontier Information Technology, Zhongyuan University of Technology, Zhengzhou 450007, China)
Source: Computer Systems & Applications (《计算机系统应用》), 2022, No. 9, pp. 265-271 (7 pages)
Keywords: GNU compiler collection (GCC); SIMD extension; insufficient vectorization; superword level parallelism; code generation; SLP
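
The situation the abstract describes, a basic block whose isomorphic statements are too few to fill a vector register, can be illustrated with a small sketch. This is not the paper's ISLP implementation and not GCC's internal transformation; it is a hand-written illustration that assumes a 128-bit vector of four `float` lanes and uses the GCC vector extension. The names `v4sf`, `add2_scalar`, and `add2_partial` are hypothetical.

```c
/* Four float lanes in one 128-bit vector (GCC/Clang vector extension). */
typedef float v4sf __attribute__((vector_size(16)));

/* Scalar basic block: only two isomorphic statements, so a 4-lane
   vector register cannot be filled completely. */
void add2_scalar(float *a, const float *b, const float *c)
{
    a[0] = b[0] + c[0];
    a[1] = b[1] + c[1];
}

/* Partially filled vector version (illustrative assumption):
   lanes 0-1 carry data, lanes 2-3 are zero padding. */
void add2_partial(float *a, const float *b, const float *c)
{
    v4sf vb = { b[0], b[1], 0.0f, 0.0f };
    v4sf vc = { c[0], c[1], 0.0f, 0.0f };
    v4sf va = vb + vc;          /* one vector add covers both statements */

    /* Only the two live lanes are written back; padding is discarded. */
    a[0] = va[0];
    a[1] = va[1];
}
```

Whether such a partially filled vector is actually faster than the scalar code depends on the packing and unpacking overhead, which is the kind of trade-off the cost model mentioned in the abstract would have to weigh.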