A New Algorithm for Auto-Vectorization Compilation


Abstract: The SIMD (single instruction, multiple data) architecture plays an important role in high-performance computing and embedded multimedia computing, and auto-vectorizing compilation for SIMD instructions is an active research topic in the compiler field. Building on the super-word level parallelism (SLP) algorithm, this paper proposes a new auto-vectorization algorithm, GSLP (global super-word level parallelism), which consists of two parts: statement grouping and statement scheduling. Statement grouping analyzes super-word reuse from a global perspective, fully exploiting direct and indirect super-word reuse within a basic block to increase the opportunities for executing its statements in parallel. Statement scheduling schedules all statements in the basic block and adjusts the order of the single words inside each super-word so that the generated code contains as few pack/unpack operations as possible. GSLP was evaluated on 16 test programs. The experimental results show that it reduces the number of pack/unpack operations by 41.6% on average and improves on the speedup obtained by the SLP algorithm by 4.7% on average.
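To make the role of super-word reuse and lane ordering concrete, here is a minimal Python sketch. It is not the GSLP algorithm from the paper: the cost model, the `Stmt` tuple layout, and the `count_packs` helper are all assumptions made for illustration. The sketch assumes every scalar is assigned at most once, treats a source super-word as free when an identical ordered tuple is already held in a vector register, and ignores unpack operations.

```python
from typing import List, Set, Tuple

# Hypothetical statement encoding: (dest, op, src1, src2)
Stmt = Tuple[str, str, str, str]


def count_packs(groups: List[List[Stmt]]) -> int:
    """Count pack operations needed to run the groups, in order, as SIMD ops.

    Assumed cost model: a source super-word (an ordered tuple of scalars)
    costs one pack unless the same tuple, in the same lane order, is
    already live in a vector register.
    """
    live: Set[Tuple[str, ...]] = set()  # super-words held in vector registers
    packs = 0
    for group in groups:
        # Only isomorphic statements (same operator) can share one SIMD op.
        assert len({s[1] for s in group}) == 1, "group must be isomorphic"
        for idx in (2, 3):
            src = tuple(s[idx] for s in group)
            if src not in live:
                packs += 1
                live.add(src)
        # The result of the SIMD op is itself a reusable super-word.
        live.add(tuple(s[0] for s in group))
    return packs


s1 = ("a0", "+", "b0", "c0")
s2 = ("a1", "+", "b1", "c1")
s3 = ("d0", "*", "a0", "e0")
s4 = ("d1", "*", "a1", "e1")

# Same grouping, different lane order inside the second group.
aligned = [[s1, s2], [s3, s4]]      # reuses the super-word (a0, a1)
misaligned = [[s1, s2], [s4, s3]]   # needs (a1, a0), which must be repacked

print(count_packs(aligned))      # 3
print(count_packs(misaligned))   # 4
```

In this toy model the two schedules group the same statements, yet the lane order inside the second group decides whether the result `(a0, a1)` is reused directly or has to be repacked. That is the kind of cost that reordering the single words within a super-word is meant to avoid.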

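Authors: Lü Pengwei (吕鹏伟), Liu Congxin (刘从新), Shen Xubang (沈绪榜)

Affiliation: Xi'an Microelectronics Technology Institute (西安微电子技术研究所)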

Source: Journal of Wuhan University: Natural Science Edition (《武汉大学学报（理学版）》), 2016, No. 5, pp. 456-463 (8 pages). Indexed in CAS, CSCD, and the Peking University Core Journals list.

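Funding: National Science and Technology Major Project on Core Electronic Devices, High-End Generic Chips, and Basic Software (2014ZX01020-003); National Natural Science Foundation of China (61136002); National High Technology Research and Development Program of China (863 Program) (2015AA7015028)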

Keywords: SIMD (single instruction, multiple data) instruction; compiling technique; auto-vectorization; SLP (super-word level parallelism); super-word reuse

Classification number: TP314 [Automation and Computer Technology: Computer Software and Theory]
