Hyperblock-Based Unified Cluster Assignment and Modulo Scheduling
Abstract To exploit instruction-level parallelism (ILP), very long instruction word (VLIW) processors often use multiple functional units backed by a multiported register file. As the number of functional units rises, the number of register file ports grows accordingly, the register file becomes more complex, and at some point the multiplexing logic on the register ports can come to dominate the processor's cycle time, making the register file a system bottleneck. A reasonable solution is to partition the register file into independent clusters. Clustering reduces the number of register file ports per cluster without degrading the processor's ILP, but it presents new challenges to the compiler, which must assign every operation and operand to a specific cluster and coordinate data movement between clusters to achieve good ILP. This paper proposes a scheduling algorithm for clustered VLIW architectures: hyperblock-based unified cluster assignment and modulo scheduling (HBUCAMS). Compared with a basic block, a hyperblock provides a larger scheduling region for exploiting ILP. Furthermore, because loop bodies with control flow can be converted into hyperblocks, there are more opportunities to apply modulo scheduling. Instead of performing cluster assignment and modulo scheduling sequentially, HBUCAMS performs them in a single phase. This unified approach is more effective than a phase-ordered approach, since it optimizes the code generation problem globally instead of searching for a locally optimal solution to each individual step. Experiments with the YHFT-DSP/700 compiler show that the proposed algorithm produces better-optimized results than the ITSS algorithm.
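The idea of choosing a cluster and a modulo-schedule slot in one step can be illustrated with a toy sketch. This is not the HBUCAMS algorithm from the paper. It is a minimal greedy illustration under stated assumptions: the loop body is an acyclic dependence graph with no loop-carried dependences, every functional unit can run every operation, an inter-cluster copy costs a fixed `copy_latency` and uses no bus resources, and all names (`unified_modulo_schedule`, `units_per_cluster`, and so on) are hypothetical.

```python
from collections import defaultdict


def unified_modulo_schedule(ops, deps, n_clusters=2, units_per_cluster=1,
                            copy_latency=1, max_ii=16):
    """Toy unified cluster assignment + modulo scheduling (illustrative only).

    ops:  dict mapping operation name -> latency, listed in topological order.
    deps: dict mapping operation name -> list of same-iteration predecessors.
    Returns (ii, placed) with placed[name] = (cluster, start_cycle), or None.
    """
    # Resource-bound lower limit on the initiation interval (II).
    total_units = n_clusters * units_per_cluster
    min_ii = -(-len(ops) // total_units)  # ceiling division

    for ii in range(min_ii, max_ii + 1):
        mrt = defaultdict(int)  # modulo reservation table: (cluster, cycle % ii) -> units used
        placed = {}
        feasible = True
        for op in ops:
            best = None
            # Cluster and time slot are chosen together: evaluate every cluster
            # and keep the (cluster, cycle) pair with the earliest start.
            for c in range(n_clusters):
                earliest = 0
                for p in deps.get(op, []):
                    p_cluster, p_start = placed[p]
                    # Assumed cost model: a cross-cluster operand adds copy_latency.
                    penalty = copy_latency if p_cluster != c else 0
                    earliest = max(earliest, p_start + ops[p] + penalty)
                # Only ii consecutive cycles need checking: slots repeat modulo ii.
                for t in range(earliest, earliest + ii):
                    if mrt[(c, t % ii)] < units_per_cluster:
                        if best is None or t < best[1]:
                            best = (c, t)
                        break
            if best is None:
                feasible = False  # no free slot at this II; retry with a larger II
                break
            placed[op] = best
            mrt[(best[0], best[1] % ii)] += 1
        if feasible:
            return ii, placed
    return None


# Hypothetical four-operation loop body: c needs a and b, d needs c.
example_ops = {"a": 1, "b": 1, "c": 1, "d": 1}
example_deps = {"c": ["a", "b"], "d": ["c"]}
result = unified_modulo_schedule(example_ops, example_deps)
```

A phase-ordered scheduler would fix the cluster of each operation first and only then look for a time slot. In the sketch, the copy penalty and the reservation table are consulted in the same decision, which is the property the abstract attributes to the unified approach.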
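Source Journal of Computer Research and Development (《计算机研究与发展》), indexed in EI, CSCD, and the Peking University Core Journals list, 2007, No. 8, pp. 1429-1438 (10 pages)
Funding National Natural Science Foundation of China (60473079); Doctoral Program Foundation of Higher Education Institutions, Ministry of Education of China (20059998026)
Keywords VLIW; compiler; hyperblock; cluster assignment; modulo scheduling; ILP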