SIMD Algorithm Based on Tree Matching for Multi-cluster and VLIW DSP
Abstract: BWDSP is a new processor designed for high-performance computing. It combines a multi-cluster very long instruction word (VLIW) architecture with SIMD support and offers a rich instruction set. To make full use of the vectorization resources that BWDSP provides, a vectorization algorithm is needed. This paper studies and implements, on top of Open64, a SIMD compiler optimization algorithm for multi-cluster VLIW DSPs. The algorithm operates on WHIRL, the intermediate language of Open64, and exploits the hardware resources and vector instructions of BWDSP. Experimental results show that, for loops that can be packed into double-word and single-word operations, the optimization achieves average speedups of 6x and 4x, respectively.
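Source: Computer Systems & Applications (《计算机系统应用》), 2015, No. 10, pp. 142-147 (6 pages)
Funding: National Science and Technology Major Project "Core Electronic Devices, High-end Generic Chips and Basic Software" (核高基), grant 2012ZX01034-00-001
Keywords: SIMD; WHIRL tree; multi-cluster; VLIW; instruction-level parallelism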
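
The abstract does not spell out the matching procedure, so the following is only a rough illustration of the general idea behind tree-matching vectorization, not the paper's algorithm and not Open64's WHIRL API. It assumes a toy expression tree (the hypothetical `Load` and `Op` classes) and treats the statements of an unrolled loop body as packable into one SIMD operation when their trees have the same shape and operators and every array access moves to the next consecutive element.

```python
from dataclasses import dataclass
from typing import Tuple, Union


@dataclass(frozen=True)
class Load:
    """Hypothetical leaf node: reads array[i + offset]."""
    array: str
    offset: int


@dataclass(frozen=True)
class Op:
    """Hypothetical interior node: an operator applied to child trees."""
    name: str
    kids: Tuple["Node", ...]


Node = Union[Load, Op]


def packable(a: Node, b: Node) -> bool:
    """True if b has the same shape and operators as a, with each load shifted by one element."""
    if isinstance(a, Load) and isinstance(b, Load):
        return a.array == b.array and b.offset == a.offset + 1
    if isinstance(a, Op) and isinstance(b, Op):
        return (a.name == b.name
                and len(a.kids) == len(b.kids)
                and all(packable(x, y) for x, y in zip(a.kids, b.kids)))
    return False


def can_vectorize(stmts) -> bool:
    """True if every adjacent pair of statement trees matches, so the group could be packed."""
    return len(stmts) > 1 and all(packable(x, y) for x, y in zip(stmts, stmts[1:]))


def stmt(k: int, top: str = "add") -> Node:
    """Tree for the k-th unrolled copy of: a[i+k] * b[i+k] <top> a[i+k]."""
    return Op(top, (Op("mul", (Load("a", k), Load("b", k))), Load("a", k)))


print(can_vectorize([stmt(k) for k in range(4)]))             # True
print(can_vectorize([stmt(0), stmt(1, "sub"), stmt(2)]))      # False: operators differ
```

A real vectorizer would also have to check data dependences, alignment, and the vector widths the target supports; this sketch covers only the structural match.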