Insufficient SLP in GCC


Abstract: As vector lengths keep growing, SIMD extensions can exploit more data-level parallelism, but the amount of parallelism a program must expose before it can be vectorized rises as well. In existing auto-vectorizing compilers, if the analysis phase cannot find enough data-level parallelism in the scalar code to completely fill a vector register, the corresponding vector code transformation phase is never entered and the code is not vectorized. Longer vectors therefore cause some programs with limited parallelism to lose the opportunity to be vectorized, which degrades performance. To make fuller use of SIMD units, this paper introduces ISLP, a basic-block-oriented insufficient (non-full-load) vectorization method. Building on the open-source GCC compiler, it describes the design and implementation of ISLP in detail from three aspects: parallelism detection, code generation, and the cost model. Experiments on a standard test set show that the method can effectively vectorize programs with insufficient superword-level parallelism and improve their execution efficiency. The selected test cases reach an average speedup of 1.14 after vectorization, and performance is 11.8% higher than with the conventional SLP method.
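The situation the abstract describes can be pictured with a small hand-written sketch. It is only an illustration under stated assumptions, not the ISLP implementation from the paper: the function names `add3_scalar` and `add3_partial` are hypothetical, the vector type uses the GCC vector extension, and a 4-lane register stands in for the longer vectors the abstract refers to. The basic block has three isomorphic statements, so a 4-lane register cannot be filled completely; the second function shows what a partially filled vector version of the same block could look like.

```c
/* Illustrative only: a 4-lane float vector (128 bits) declared with the
   GCC vector extension. The type name v4f is an assumption. */
typedef float v4f __attribute__((vector_size(16)));

/* Scalar basic block with only three isomorphic statements.
   A 4-lane vector register cannot be completely filled from it. */
void add3_scalar(float *a, const float *b, const float *c)
{
    a[0] = b[0] + c[0];
    a[1] = b[1] + c[1];
    a[2] = b[2] + c[2];
}

/* Hand-written picture of a partially filled ("insufficient") vector
   version: three lanes carry data, the fourth lane is padding. */
void add3_partial(float *a, const float *b, const float *c)
{
    v4f vb = { b[0], b[1], b[2], 0.0f };  /* lane 3 is unused padding */
    v4f vc = { c[0], c[1], c[2], 0.0f };
    v4f va = vb + vc;                     /* one vector add, 3 useful lanes */

    /* Only the three meaningful lanes are written back. */
    a[0] = va[0];
    a[1] = va[1];
    a[2] = va[2];
}
```

Whether such a partially filled form pays off depends on the cost of packing and unpacking the lanes, which is the kind of question the abstract assigns to the cost model.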

Authors: LIU Hao-Hao; HAN Lin; CUI Ping-Fei (Research Institute of Frontier Information Technology, Zhongyuan University of Technology, Zhengzhou 450007, China)

Affiliation: Research Institute of Frontier Information Technology, Zhongyuan University of Technology

Source: Computer Systems & Applications, 2022, No. 9, pp. 265-271 (7 pages)

Keywords: GNU Compiler Collection (GCC); SIMD extension; insufficient vectorization; superword-level parallelism; code generation; SLP

Classification: TP314 [Automation and Computer Technology: Computer Software and Theory]


