Abstract
As vector lengths grow, SIMD extensions can exploit larger amounts of data-level parallelism, but the amount of parallelism a program must expose before it can be vectorized grows as well. In existing auto-vectorizing compilers, if the analysis stage cannot find enough data-level parallelism in the scalar code to fill a vector register completely, the vector code transformation stage is never entered and the code is not vectorized. Longer vectors therefore cost some programs with insufficient parallelism their chance of being vectorized, which degrades performance. To make fuller use of SIMD units, this study presents ISLP, a basic-block-oriented method for insufficient (partially filled) vectorization. ISLP is built on the open-source GCC compiler, and its design and implementation are described in detail in terms of three aspects: parallelism detection, code generation, and the cost model. Experiments on a standard benchmark suite show that the method can effectively vectorize programs with insufficient superword-level parallelism and improve their execution efficiency. The selected test cases reach an average speedup of 1.14 after vectorization, a performance improvement of 11.8% over the conventional SLP method.
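To make the "insufficient parallelism" case concrete, the following is a hand-written sketch, not ISLP's actual output or any GCC-internal code from the paper. It shows a basic block containing only two isomorphic additions: with 128-bit vectors of 4 `int` lanes, there are too few statements to fill a register, which is the situation in which the abstract says a conventional vectorizer gives up. The second function shows what a partially filled vector looks like, written with GCC's generic `vector_size` extension. The function names `add2_scalar`, `add2_partial_vector`, and `check_equivalent` are hypothetical.

```c
#include <assert.h>

/* GCC/Clang vector extension: one 128-bit vector holding 4 ints. */
typedef int v4si __attribute__((vector_size(16)));

/* Scalar basic block with only two isomorphic statements: the group
   size (2) is smaller than the 4 int lanes of a 128-bit register. */
void add2_scalar(int *a, const int *b, const int *c)
{
    a[0] = b[0] + c[0];
    a[1] = b[1] + c[1];
}

/* Hypothetical partially filled vector: lanes 0-1 hold the live values,
   lanes 2-3 are zero padding whose results are simply discarded. */
void add2_partial_vector(int *a, const int *b, const int *c)
{
    v4si vb = {b[0], b[1], 0, 0};
    v4si vc = {c[0], c[1], 0, 0};
    v4si va = vb + vc;   /* a single 4-lane add; only 2 lanes are useful */
    a[0] = va[0];        /* write back only the live lanes */
    a[1] = va[1];
}

/* Runs both versions, asserts that they agree, returns a[0] + a[1]. */
int check_equivalent(const int *b, const int *c)
{
    int s[2], v[2];
    add2_scalar(s, b, c);
    add2_partial_vector(v, b, c);
    assert(s[0] == v[0] && s[1] == v[1]);
    return v[0] + v[1];
}
```

Whether such a half-empty vector actually pays off depends on the cost of packing the operands and extracting the results compared with the saved scalar operations; weighing that trade-off is the job of a cost model. This sketch makes no claim about how ISLP's own cost model decides it.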
Authors
LIU Hao-Hao
HAN Lin
CUI Ping-Fei
Research Institute of Frontier Information Technology, Zhongyuan University of Technology, Zhengzhou 450007, China
Source
Computer Systems & Applications (《计算机系统应用》)
2022, No. 9, pp. 265-271 (7 pages)
Keywords
GNU Compiler Collection (GCC)
SIMD extension
insufficient vectorization
superword-level parallelism
code generation
SLP