
Instruction scheduling optimization for clustered VLIW BWDSP (cited by: 2)
Abstract: BWDSP is a 32-bit static superscalar VLIW processor with a clustered architecture and SIMD support. The BWDSP chip has four execution clusters and three memory blocks, but inter-cluster data transfers and addressing consume bus bandwidth. Each cluster contains a large number of computing units. However, the instruction scheduling algorithms in existing compiler frameworks target non-clustered architectures, so they cannot fully exploit BWDSP's clustered structure to generate efficient instruction-level parallel (ILP) code. Building on the clustered characteristics of the BWDSP architecture, this paper proposes heuristic algorithms for instruction clustering (assigning instructions to clusters) and instruction scheduling on BWDSP, and implements them in the open-source Open64 compiler framework. Experimental results show that this implementation in the BWDSP compiler significantly improves the performance of some compute-intensive DSP programs.
Source: Microcomputer & Its Applications (《微型机与应用》), 2017, No. 11, pp. 23-26, 30 (5 pages)
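Funding: National Science and Technology Major Project "Core Electronic Devices, High-End Generic Chips and Basic Software" (核高基), grant 2012ZX01034-001-001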
Keywords: multi-cluster DSP; instruction-level parallelism (ILP); instruction partitioning; instruction scheduling; Open64 compiler
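The abstract does not describe the heuristic itself. To illustrate the problem it addresses, here is a minimal sketch of one common approach to combined cluster assignment and list scheduling on a clustered VLIW machine, in the spirit of unified assign-and-schedule heuristics. It is not the authors' algorithm. Only the cluster count of four comes from the abstract. The issue width, the copy latency, and every name in the code (`Op`, `heights`, `assign_and_schedule`) are hypothetical. The model also treats inter-cluster copies as a fixed latency and ignores the bus bandwidth they consume.

```python
from collections import defaultdict
from dataclasses import dataclass, field

NUM_CLUSTERS = 4   # BWDSP has four execution clusters (from the abstract)
ISSUE_WIDTH = 2    # hypothetical: ops one cluster can issue per cycle
COPY_LATENCY = 1   # hypothetical: extra cycles to move a value between clusters


@dataclass
class Op:
    name: str
    latency: int
    preds: list = field(default_factory=list)  # names of ops producing operands


def heights(ops):
    """Critical-path height: longest latency path from each op to a sink."""
    succs = defaultdict(list)
    for op in ops.values():
        for p in op.preds:
            succs[p].append(op.name)
    memo = {}

    def h(n):
        if n not in memo:
            memo[n] = ops[n].latency + max((h(s) for s in succs[n]), default=0)
        return memo[n]

    return {n: h(n) for n in ops}


def assign_and_schedule(op_list):
    """Greedy list scheduling that picks a cluster and a cycle for each op.

    Returns a dict mapping op name -> (cluster, start_cycle).
    Assumes the dependence graph is acyclic.
    """
    ops = {op.name: op for op in op_list}
    prio = heights(ops)
    placed = {}                # name -> (cluster, start_cycle)
    usage = defaultdict(int)   # (cluster, cycle) -> ops already issued there
    remaining = set(ops)
    while remaining:
        # An op is ready once all its producers have been placed.
        ready = [n for n in remaining if all(p in placed for p in ops[n].preds)]
        # Highest critical-path height first; the name breaks ties deterministically.
        n = sorted(ready, key=lambda x: (-prio[x], x))[0]
        best = None
        for c in range(NUM_CLUSTERS):
            # Earliest cycle at which every operand is available on cluster c.
            t = 0
            for p in ops[n].preds:
                pc, ps = placed[p]
                avail = ps + ops[p].latency
                if pc != c:
                    avail += COPY_LATENCY  # pay for an inter-cluster transfer
                t = max(t, avail)
            # Skip cycles whose issue slots on cluster c are already full.
            while usage[(c, t)] >= ISSUE_WIDTH:
                t += 1
            if best is None or t < best[1]:
                best = (c, t)
        placed[n] = best
        usage[best] += 1
        remaining.remove(n)
    return placed


if __name__ == "__main__":
    demo = [Op("a", 1), Op("b", 1), Op("c", 2, ["a", "b"]),
            Op("d", 1, ["c"]), Op("e", 1)]
    print(assign_and_schedule(demo))
```

In this toy model, dependent ops tend to stay on the cluster holding their operands because a remote operand adds `COPY_LATENCY`. Independent work such as `e` spills to another cluster once the local issue slots fill up. A real BWDSP pass would also need to model the bus and the three memory blocks that the abstract mentions.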
