
SIMD optimization for clustered VLIW DSP (cited by: 3)
Abstract: A general method for identifying SIMD instructions is proposed, based on the characteristics of digital signal processing applications. New cluster assignment and register allocation algorithms are then given, tailored to the characteristics of SIMD instructions on clustered architectures. Finally, these optimization methods are implemented in the compiler for the BWDSP100 chip. Experimental results show that the optimizations fully exploit the advantages of SIMD instructions on clustered architectures and improve the efficiency of the compiler.
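Source: Journal of University of Science and Technology of China (《中国科学技术大学学报》, JUSTC), 2011, No. 8, pp. 708-714 (7 pages). Indexed in CAS, CSCD, and the Peking University Core Journals list.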
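Funding: Supported by the National "Core Electronic Devices, High-end Generic Chips and Basic Software" Major Project (2009ZX01028-002-003-005) and the National High Technology Research and Development Program of China (863 Program) (2008AA010902).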
Keywords: VLIW (very long instruction word); clustered architecture; SIMD (single instruction, multiple data); compiler optimization