期刊文献+

新型超字级并行改进算法

New improved algorithm for superword level parallelism
下载PDF
导出
摘要 对于超字级并行(SLP)算法不能有效地处理大型程序中并行代码率较小,且可向量化的代码中可能存在对向量化不利的代码的问题,提出了一种新型的SLP改进算法NSLPO。首先,将程序中不能向量化的非同构语句进行同构化处理,定位SLP丢失的向量化机会;然后,通过冗余节点添加构建最大通用子图,通过冗余删除等优化过程得到同构化之后的补充SLP图,提高程序中代码的并行性;最后,运用节流法将对向量化有害的代码摒除在向量化之外,仅对它们进行标量处理,通过只向量化处理那些向量化有收益的代码以尽可能地提升程序效率。在一组广泛使用的内核测试集中进行实验,结果显示,与SLP算法相比,NSLPO算法性能更优,其执行时间比SLP平均减少9.1%。 For SLP (Superword Level Parallelism) algorithm cannot effectively process the large-scale applications covered with few parallel codes, and the codes which can be vectorized may be adverse to veetorization. A new improved algorithm for SLP was proposed, namely NSLPO. First of all, the non-isomorphic statements which cannot be vectorized were transformed to isomorphic statements as far as possible, thus locating the opportunities of vectorization which SLP has lost. Secondly, the Max Common Subgraph (MCS) was built by adding redundant nodes, and the supplement diagram of SLP was got by using some optimization such as redundancy deleting, which can greatly increase the parallelism of program. At last, the codes which are harmful to veetorization were exclued out of veetorization by using cutting method and executed in serial, only the valuable codes for vectorization were vectorized to improve the efficiency of programs as far as possible. Experiments were conducted on widely used kernel test sets. The experimental results show that compared with the SLP algorithm, the proposed NSLPO algorithm has better performance and its running time was reduced by 9.1%.
作者 张素平 韩林 丁丽丽 王鹏翔 ZHANG Suping HAN Lin DING Lili WANG Pengxiang(State Key Laboratory of Mathematical Engineering and Advanced Computing ( Information Engineering University), Zhengzhou Henan 450001, China)
出处 《计算机应用》 CSCD 北大核心 2017年第2期450-456,462,共8页 journal of Computer Applications
基金 "核高基"国家科技重大专项(2009ZX01036-001-001-2)~~
关键词 同构 节流法 向量化 超字级并行 补充图 isomorphism cutting method vectorization Superword Level Parallelism (SLP) supplement diagram
  • 相关文献

参考文献5

二级参考文献57

  • 1AllenR,KennedyK现代体系结构的优化编译器[M].张兆庆,乔如良,冯晓兵,等,译.北京:机械工业出版社,2004.
  • 2Stewart J. An investigation of SIMD instruction sets. University of Ballarat School of Information Technology and Mathematical Sciences, 2005. http://noisymime.org/blogimages/SIMD.pdf.
  • 3Nuzman D, Rosen I, Zaks A. Auto-Vectorization of interleaved data for SIMD, In: Proc. of the ACM SIGPLAN Conf. on Programming Language Design and Implementation. Ottawa: ACM Press, 2006. 132-143. [doi: 10.1145/1133981.1133996].
  • 4Zheng WM, Tang ZZ. Compiler Archtecture. Beijing: Tsinghua University Press, 1998 (in Chinese).
  • 5Allen R, Kennedy K. Optimizing Compilers for Modern Architectures--A Dependence-Based Approach. San Francisco: Morgan Kaufmann Publishers, 2001.
  • 6Shen ZY, Hu ZA, Liao XK, Wu HP, Zhao KJ, Lu YT. Methods of Parallel Compilation. Beijing: National Defence Industry Press, 2000 (in Chinese).
  • 7Bik AJC. The Software Vectorization Handbook--Applying Multimedia Extensions for Maximum Performance. Intel Press, 2004.
  • 8Hampton M, Asanovic K. Compiling for vector-thread architectures. In: Proc. of the 6th Annual IEEE/ACM Int'l Symp. on Code Generation and Optimization. Boston: ACM Press, 2008.205-215. [doi: 10.1145/1356058.1356085].
  • 9Naishlos D, Biberstein M, Ben-David S, Zaks A. Vectorizing for a SIMdD DSP architecture. In: Proc. of the 2003 Int'l ConL on Compilers, Architecture and Synthesis for Embedded Systems. San Jose: ACM Press, 2003.2-11. [doi: 10.1145/951710.951714].
  • 10Bik AJC, GirKar M, Grey PM, Tian XM. Automatic intra-register vectorization for the Intel architecture. Int'l Journal of Parallel Programming, 2002,30(2):65-98. [doi: 10.1023/A:1014230429447].

共引文献43

相关作者

内容加载中请稍等...

相关机构

内容加载中请稍等...

相关主题

内容加载中请稍等...

浏览历史

内容加载中请稍等...
;
使用帮助 返回顶部