
SLP Vectorization Method Based on Multiple Isomorphic Transformations
Abstract: Superword level parallelism (SLP) is an automatic vectorization method that targets the single instruction multiple data (SIMD) extensions of processors, and it is widely used in mainstream compilers. SLP works by first finding sequences of isomorphic instructions within the same basic block and then vectorizing them. One current research trend is to convert sequences of non-isomorphic instructions into equivalent sequences of isomorphic instructions, which extends the scope of SLP. This paper proposes SLP-M, an extension of SLP. SLP-M introduces a new isomorphic transformation based on binary expression replacement and, guided by condition checks and benefit evaluation, selects among multiple isomorphic transformations to convert non-isomorphic instruction sequences that satisfy specific conditions into isomorphic ones, which are then vectorized. This broadens the applicability of SLP and increases its performance benefit. SLP-M is implemented in LLVM and evaluated on standard benchmarks, including SPEC CPU 2017. The experimental results show that, compared with existing methods, SLP-M improves performance by 21.8% in kernel-function tests and by 4.1% in whole-benchmark tests.
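The idea of turning non-isomorphic statements into isomorphic ones can be illustrated with a toy model. This page does not reproduce the paper's actual transformations, so the sketch below is only an assumed example: it models each scalar statement ("lane") as a nested tuple, treats two lanes as isomorphic when their opcode skeletons match, and rewrites `x + y` and `x - y` into the common form `x + k*y` with `k` equal to 1 or -1. All function names are hypothetical and are not taken from SLP-M or LLVM.

```python
# Toy model, not the SLP-M algorithm: a lane is (opcode, operand, operand),
# where an operand is a variable name, an integer constant, or another tuple.

def shape(expr):
    """Return the opcode skeleton of an expression, ignoring leaf operands."""
    if not isinstance(expr, tuple):
        return "leaf"
    return (expr[0],) + tuple(shape(arg) for arg in expr[1:])

def is_isomorphic(lanes):
    """Lanes are isomorphic when they all share one opcode skeleton."""
    return len({shape(lane) for lane in lanes}) == 1

def normalize(expr):
    """Rewrite x + y as x + 1*y and x - y as x + (-1)*y (top level only)."""
    op, x, y = expr
    if op == "add":
        return ("add", x, ("mul", 1, y))
    if op == "sub":
        return ("add", x, ("mul", -1, y))
    return expr

def evaluate(expr, env):
    """Evaluate an expression tree with integer variables taken from env."""
    if isinstance(expr, tuple):
        op, x, y = expr
        a, b = evaluate(x, env), evaluate(y, env)
        return {"add": a + b, "sub": a - b, "mul": a * b}[op]
    return env[expr] if isinstance(expr, str) else expr

# Two adjacent statements: a0 = b0 + c0 and a1 = b1 - c1.
lanes = [("add", "b0", "c0"), ("sub", "b1", "c1")]
rewritten = [normalize(lane) for lane in lanes]
env = {"b0": 5, "c0": 2, "b1": 7, "c1": 3}
```

Here `is_isomorphic(lanes)` is false because one lane adds and the other subtracts, while `is_isomorphic(rewritten)` is true and both versions evaluate to the same values. A real compiler would also have to weigh the cost of the extra multiplications against the gain from vectorizing, which is the kind of benefit evaluation the abstract mentions.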
Authors: Feng Jingge (冯竞舸); He Yeping (贺也平); Tao Qiuming (陶秋铭); Ma Hengtai (马恒太). Affiliations: National Engineering Research Center for Fundamental Software, Institute of Software, Chinese Academy of Sciences, Beijing 100190; State Key Laboratory of Computer Science, Institute of Software, Chinese Academy of Sciences, Beijing 100190
Source: Journal of Computer Research and Development (《计算机研究与发展》), indexed in EI, CSCD, and the Peking University Core Journals list; 2023, No. 12, pp. 2907-2927 (21 pages)
Funding: Strategic Priority Research Program of the Chinese Academy of Sciences (XDA-Y01-01, XDC02010600).
Keywords: SIMD extension; auto-vectorization; superword level parallelism (SLP); sequence of non-isomorphic instructions; isomorphic transformation
