
A New Algorithm for Auto-Vectorization Compilation
Abstract The SIMD (single instruction, multiple data) architecture plays an important role in high-performance computing and embedded multimedia computing, and auto-vectorization for SIMD instructions is an active research topic in the compiler field. Building on the super-word level parallelism (SLP) algorithm, this paper proposes a new auto-vectorization algorithm, GSLP (global super-word level parallelism), which consists of two parts: statement grouping and statement scheduling. Statement grouping analyzes super-word reuse information from a global perspective and exploits both direct and indirect super-word reuse within a basic block, increasing the opportunities to execute the statements of a basic block in parallel. Statement scheduling schedules all statements in a basic block and adjusts the order of the single words inside each super-word so that the generated code contains as few pack/unpack operations as possible. GSLP was evaluated on 16 test programs. The experimental results show that it reduces the number of pack/unpack operations by 41.6% on average and improves on the speedup achieved by the SLP algorithm by 4.7% on average.
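Source Journal of Wuhan University (Natural Science Edition) [武汉大学学报(理学版)], indexed in CAS, CSCD, and the Peking University Core Journals list, 2016, No. 5, pp. 456-463 (8 pages)
Funding National Science and Technology Major Project "Core Electronic Devices, High-End Generic Chips and Basic Software" (2014ZX01020-003); National Natural Science Foundation of China (61136002); National High Technology Research and Development Program of China (863 Program) (2015AA7015028)
Keywords SIMD instruction; compiling technique; auto-vectorization; SLP (super-word level parallelism); super-word reuse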
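
The abstract states that reordering the single words inside a super-word lowers the number of pack/unpack operations. The sketch below illustrates that idea only; it is not the paper's GSLP implementation. The statement encoding, the cost model (one pack for every source super-word that is not already available in exactly the same lane order; unpacks and data dependences are ignored), and the names `count_packs` and `best_lane_order` are all assumptions made for illustration.

```python
from itertools import permutations, product

# Hypothetical toy encoding: a statement is (dest, op, src1, src2).
# A group is a tuple of isomorphic statements that become one SIMD operation;
# the position of a statement inside the tuple is its lane.

def count_packs(groups):
    """Count pack operations under the simplified cost model.

    A source super-word is free when an identical super-word (same words,
    same lane order) is already available; otherwise it costs one pack.
    """
    available = set()
    packs = 0
    for group in groups:
        for pos in (2, 3):  # the two source operand positions
            superword = tuple(stmt[pos] for stmt in group)
            if superword not in available:
                packs += 1
                available.add(superword)
        # The result super-word can be reused by later groups.
        available.add(tuple(stmt[0] for stmt in group))
    return packs

def best_lane_order(groups):
    """Brute-force every lane order of every group; keep the cheapest."""
    candidates = [list(permutations(group)) for group in groups]
    return min((list(choice) for choice in product(*candidates)),
               key=count_packs)

s1 = ("a", "+", "x", "y")
s2 = ("b", "+", "z", "w")
s3 = ("c", "*", "b", "p")
s4 = ("d", "*", "a", "q")

# Grouping in source order: the second group reads <b, a>,
# but the first group produced <a, b>, so the result cannot be reused.
naive = [(s1, s2), (s3, s4)]
tuned = best_lane_order(naive)

print(count_packs(naive), count_packs(tuned))
```

In this example the source-order grouping needs four packs, while swapping the lanes of the second group lets it reuse the result `<a, b>` directly and needs three. Brute force is workable only for tiny blocks; a real compiler would need a heuristic and would also account for unpacks, shuffles, and dependences.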
