
Vector Regrouping for the Domestic CPU SW-1600

DOMESTIC PRODUCED CPU SW-1600 ORIENTED VECTOR REGROUP
Abstract: Vector regroup instructions are complex, and different instructions have different latencies, so it is hard to find a single efficient vector regrouping algorithm. This paper analyzes the shift and insert/extract instructions provided by the domestic CPU SW-1600. It presents optimal algorithms that implement vector regrouping using only shift instructions or only insert/extract instructions, and an efficient algorithm that combines the two types of instructions. Experiments show that the algorithm vectorizes programs well: the speedup reaches 7.31 for integer data and 1.83 for complicated double-precision floating-point programs.
Source: Computer Applications and Software (《计算机应用与软件》), CSCD, 2011, No. 11, pp. 230-233, 275 (5 pages)
Keywords: SIMD (Single Instruction Multiple Data); SW-1600; vector regroup; SLP
