Research on Vectorization Technology for Insufficient SIMD (cited by 4)

Abstract: Some programs cannot supply enough parallelism to fill a vector register, either because a loop has too few iterations or because a basic block contains too few statements that can be executed in vector-parallel form. To further improve execution efficiency, this paper studies insufficient vectorization, i.e. vectorization that uses only part of a vector register. It examines how vector registers are used, proposes two methods for implementing insufficient vectorization based on partial use of vector registers, and analyzes the performance benefits of each method. Experimental results show that the approach can effectively vectorize loops and basic blocks with insufficient parallelism and improve program execution efficiency.
Authors: Wang Qi, Han Lin, Yao Jinyang, Tao Xiaohan (State Key Laboratory of Mathematical Engineering and Advanced Computing, Zhengzhou 450001, Henan, China)
Source: Computer Applications and Software (《计算机应用与软件》), Peking University core journal, 2018, No. 9, pp. 108-112 (5 pages)
Funding: "High-Performance Computing" Key Special Project of the National Key Research and Development Program of China (2016YFB0200503)
Keywords: SIMD parallelism; vector register; insufficient vectorization
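The abstract does not spell out the two implementation methods, so the following is only a rough illustration of the underlying idea, not the authors' technique. This minimal Python sketch emulates a vector register with an assumed width of 8 lanes (for example, a 256-bit register holding 32-bit floats) and covers both cases the abstract names: a loop whose trip count is smaller than the register width, and a basic block with fewer isomorphic statements than lanes. In each case a single, partially filled register does the work and the idle lanes are discarded. The helper names `masked_load`, `vadd`, `masked_store`, `short_loop_add`, `basic_block_add`, and `lane_utilization` are hypothetical.

```python
# Minimal emulation of partial ("insufficient") use of a SIMD vector register.
# Assumption: an 8-lane register; all helper names are hypothetical and are
# not taken from the paper.
VL = 8  # number of lanes in the emulated vector register


def masked_load(src, count, fill=0.0):
    """Load src[0:count] into lanes 0..count-1; idle lanes hold `fill`."""
    assert 0 < count <= VL, "sketch only handles 1 <= count <= VL"
    return list(src[:count]) + [fill] * (VL - count)


def vadd(a, b):
    """Lane-wise add over all VL lanes, standing in for one SIMD add."""
    return [x + y for x, y in zip(a, b)]


def masked_store(reg, count):
    """Keep only the first `count` lanes; idle-lane results are discarded."""
    return reg[:count]


def short_loop_add(a, b):
    """Loop case: for i in range(n): out[i] = a[i] + b[i], with n < VL.
    One partially filled register covers the whole loop."""
    n = len(a)
    return masked_store(vadd(masked_load(a, n), masked_load(b, n)), n)


def basic_block_add(b0, b1, b2, c0, c1, c2):
    """Basic-block case: the isomorphic statements
         a0 = b0 + c0; a1 = b1 + c1; a2 = b2 + c2
    are packed into lanes 0..2 of one register (5 of 8 lanes stay idle)."""
    r = vadd(masked_load([b0, b1, b2], 3), masked_load([c0, c1, c2], 3))
    a0, a1, a2 = masked_store(r, 3)
    return a0, a1, a2


def lane_utilization(used_lanes):
    """Fraction of the register's lanes doing useful work."""
    return used_lanes / VL


if __name__ == "__main__":
    print(short_loop_add([1.0, 2.0, 3.0, 4.0, 5.0, 6.0],
                         [10.0, 20.0, 30.0, 40.0, 50.0, 60.0]))
    print(basic_block_add(1.0, 2.0, 3.0, 0.5, 0.5, 0.5))
    print(lane_utilization(6), lane_utilization(3))
```

Idle lanes are padded with 0.0, which is harmless for addition. For an operation such as division, a neutral value like 1.0 would be the safer padding, because idle lanes are still computed even though their results are thrown away. Real SIMD instruction sets offer comparable mechanisms (for instance, masked loads and stores on some x86 extensions), but this sketch does not model any particular one.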