New improved algorithm for superword level parallelism

Abstract: The Superword Level Parallelism (SLP) algorithm handles large programs poorly when only a small fraction of the code is parallel, and code that can be vectorized may still contain parts that make vectorization unprofitable. To address both problems, an improved SLP algorithm named NSLPO was proposed. First, non-isomorphic statements that could not be vectorized were transformed into isomorphic statements wherever possible, locating the vectorization opportunities that SLP missed. Second, a Max Common Subgraph (MCS) was built by adding redundant nodes, and optimizations such as redundancy deletion then produced the supplementary SLP graph (supplement diagram) of the isomorphized statements, which increased the parallelism of the program. Finally, a cutting (throttling) method excluded code harmful to vectorization, which was executed as scalar code, so that only code that benefits from vectorization was vectorized, improving program efficiency as far as possible. Experiments were conducted on a set of widely used kernel benchmarks. The results show that NSLPO outperforms the SLP algorithm, reducing execution time by 9.1% on average.
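The first and last steps of the abstract can be illustrated with a toy sketch. This is not the NSLPO implementation: the expression encoding, the `IDENTITY` table, the helper names `pad_pair` and `throttle`, and all cost numbers are assumptions made for illustration. The sketch pads two statements into the same shape by inserting redundant identity operands, then keeps for vectorization only the groups whose assumed vector cost is below their scalar cost.

```python
# Toy sketch (assumed encoding): an expression is either a leaf name (str)
# or a tuple (op, left, right). None of this comes from the paper itself.
IDENTITY = {"+": "0", "*": "1"}  # redundant operands that leave a value unchanged


def is_leaf(e):
    return isinstance(e, str)


def isomorphic(a, b):
    """Two expressions are isomorphic if they have the same operator shape."""
    if is_leaf(a) and is_leaf(b):
        return True
    if is_leaf(a) or is_leaf(b):
        return False
    return (a[0] == b[0] and len(a) == len(b)
            and all(isomorphic(x, y) for x, y in zip(a[1:], b[1:])))


def pad_pair(a, b):
    """Return (a', b') made isomorphic by adding redundant nodes, or None."""
    if is_leaf(a) and is_leaf(b):
        return a, b
    if is_leaf(a):
        op = b[0]
        if op not in IDENTITY or len(b) != 3:
            return None
        a = (op, a, IDENTITY[op])  # e.g. e  ->  e + 0
    elif is_leaf(b):
        op = a[0]
        if op not in IDENTITY or len(a) != 3:
            return None
        b = (op, b, IDENTITY[op])
    if a[0] != b[0] or len(a) != len(b):
        return None  # different operators: this sketch gives up
    new_a, new_b = [a[0]], [b[0]]
    for x, y in zip(a[1:], b[1:]):
        padded = pad_pair(x, y)
        if padded is None:
            return None
        new_a.append(padded[0])
        new_b.append(padded[1])
    return tuple(new_a), tuple(new_b)


def throttle(groups):
    """Split (name, scalar_cost, vector_cost) groups by assumed profitability."""
    vectorize = [n for n, s, v in groups if v < s]
    keep_scalar = [n for n, s, v in groups if v >= s]
    return vectorize, keep_scalar


# a = b + c and d = e are not isomorphic until d is rewritten as e + 0.
s1, s2 = ("+", "b", "c"), "e"
padded = pad_pair(s1, s2)
# Hypothetical costs: g2 pays more for packing than it saves.
chosen, scalar = throttle([("g1", 4, 2), ("g2", 2, 5)])
```

A real SLP pass would work on the compiler's intermediate representation and derive costs from a target-specific model; the sketch only shows why padding can expose a missed opportunity and why some candidate groups are better left scalar.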

Authors: ZHANG Suping (张素平), HAN Lin (韩林), DING Lili (丁丽丽), WANG Pengxiang (王鹏翔) (State Key Laboratory of Mathematical Engineering and Advanced Computing (Information Engineering University), Zhengzhou, Henan 450001, China)

Affiliation: State Key Laboratory of Mathematical Engineering and Advanced Computing (Information Engineering University)

Source: Journal of Computer Applications (《计算机应用》), indexed in CSCD and the Peking University Core Journals list, 2017, No. 2, pp. 450-456, 462 (8 pages)

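Funding: National Science and Technology Major Project of China, "Core Electronic Devices, High-end Generic Chips and Basic Software" (核高基) (2009ZX01036-001-001-2)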

Keywords: isomorphism; cutting method; vectorization; Superword Level Parallelism (SLP); supplement diagram

Classification: TP314 [Automation and Computer Technology: Computer Software and Theory]
