Hyperblock-Based Unified Cluster Assignment and Modulo Scheduling

Abstract: To exploit instruction-level parallelism (ILP), very long instruction word (VLIW) processors often use multiple functional units backed by a multi-ported register file. As the number of functional units rises, the number of register file ports grows accordingly; the register file becomes more complex, its clock frequency becomes harder to raise, and at some point the multiplexing logic on the register ports can come to dominate the processor's cycle time. Partitioning the register file into independent clusters is an effective solution. Clustering reduces the number of register file ports per cluster without reducing the processor's ILP, but it presents a new challenge to the compiler, which must assign every operation and operand to a specific cluster and coordinate data movement between clusters to achieve good ILP. This paper proposes a compilation method for clustered VLIW architectures: hyperblock-based unified cluster assignment and modulo scheduling (HBUCAMS). Compared with a basic block, a hyperblock provides a larger scheduling region for exploiting ILP. Furthermore, because loop bodies that contain control flow can be converted into hyperblocks, modulo scheduling can be applied to more loops. Instead of performing cluster assignment and modulo scheduling sequentially, HBUCAMS combines them into a single phase. This unified approach is more effective than a phase-ordered approach because it optimizes the code generation problem globally rather than seeking a local optimum at each individual step. Experiments in the YHFT-DSP/700 compiler show that the proposed algorithm achieves better-optimized results than the ITSS algorithm.
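The abstract does not give the details of HBUCAMS, so the following is only a toy sketch of the general idea of choosing a cluster and a modulo-schedule slot in the same step. Everything in it is an assumption made for illustration: the names `Op` and `unified_schedule`, the two-cluster machine with one functional unit per cluster, the fixed one-cycle inter-cluster copy delay, and the greedy "earliest slot wins" rule. It ignores hyperblock formation, predication, loop-carried dependences, and register pressure.

```python
from dataclasses import dataclass, field


@dataclass
class Op:
    """A hypothetical loop-body operation (not a structure from the paper)."""
    name: str
    latency: int = 1
    preds: list = field(default_factory=list)  # producers in the same iteration


def unified_schedule(ops, n_clusters=2, units_per_cluster=1,
                     copy_delay=1, max_ii=32):
    """Greedy sketch: pick cluster and cycle together for each operation.

    `ops` must be listed in topological order. Returns (ii, placement),
    where placement maps an operation name to (cluster, cycle).
    """
    lat = {op.name: op.latency for op in ops}
    # Resource-based lower bound on the initiation interval (II).
    capacity = n_clusters * units_per_cluster
    min_ii = max(1, -(-len(ops) // capacity))

    # With only intra-iteration dependences the first II always succeeds;
    # the retry loop is kept to show the usual iterative shape.
    for ii in range(min_ii, max_ii + 1):
        # mrt[c][s]: units already busy on cluster c at cycle s modulo ii.
        mrt = [[0] * ii for _ in range(n_clusters)]
        placed = {}
        for op in ops:
            best = None
            for c in range(n_clusters):
                # An operand produced on another cluster arrives copy_delay later.
                earliest = 0
                for p in op.preds:
                    pc, pt = placed[p]
                    ready = pt + lat[p] + (copy_delay if pc != c else 0)
                    earliest = max(earliest, ready)
                # Scanning ii consecutive cycles visits every modulo slot once.
                for t in range(earliest, earliest + ii):
                    if mrt[c][t % ii] < units_per_cluster:
                        if best is None or t < best[1]:
                            best = (c, t)
                        break
            if best is None:
                break  # no free slot on any cluster: retry with a larger II
            c, t = best
            mrt[c][t % ii] += 1
            placed[op.name] = (c, t)
        else:
            return ii, placed
    raise ValueError("no schedule found up to max_ii")


# Hypothetical loop body: two loads feeding an add, followed by a store.
ops = [
    Op("ld_a"),
    Op("ld_b"),
    Op("add", preds=["ld_a", "ld_b"]),
    Op("st", preds=["add"]),
]
ii, placed = unified_schedule(ops)
print("II =", ii)
for name, (cluster, cycle) in placed.items():
    print(f"{name}: cluster {cluster}, cycle {cycle}")
```

A phase-ordered variant would fix the cluster of every operation first and only then search for cycles. Here the cluster choice sees the current state of the modulo reservation table, which is the property the abstract credits for the better results of the unified approach.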

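Authors: Hu Dinglei (胡定磊), Chen Shuming (陈书明), Liu Chunlin (刘春林)

Affiliation: School of Computer Science, National University of Defense Technology

Source: Journal of Computer Research and Development (《计算机研究与发展》), indexed in EI, CSCD, and the Peking University Core Journals list; 2007, No. 8, pp. 1429-1438 (10 pages)

Funding: National Natural Science Foundation of China (60473079); Doctoral Program Foundation of Higher Education Institutions, Ministry of Education of China (20059998026)

Keywords: VLIW; compiler; hyperblock; cluster assignment; modulo scheduling; ILP

Classification: TP314 [Automation and Computer Technology: Computer Software and Theory]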


