Global Instruction Scheduling Technique in ORC (ORC的全局指令调度技术)

Abstract: IA-64 is a new architecture that provides rich hardware support for exploiting instruction-level parallelism, including a large register file, control and data speculation, and predication. Itanium is one implementation of IA-64. The authors adapt Bernstein's global instruction scheduling algorithm, originally designed for superscalar processors, to the explicitly parallel (EPIC) Itanium processor. While taking Itanium's features into account, they make two improvements to Bernstein's algorithm. (1) Hierarchical regions. Compared with traditional flat regions, hierarchical regions are more flexible and give the scheduler a suitably sized scheduling scope, so that it can make full use of hardware resources while keeping the time and space cost of scheduling under control. (2) Integrated P-Ready instruction scheduling. P-Ready was proposed in a context very different from Bernstein's framework. It lets a high-priority instruction be scheduled as early as possible even when its data dependences have not been resolved on every execution path through it. The algorithm is implemented in ORC, the open-source Open Research Compiler for Itanium processors. Experiments show that the global instruction scheduler yields an average runtime speedup of 8.4% on the CPU2000int benchmarks. As an indication of the benefit of hierarchical regions, scheduling instructions across nested loops achieves a runtime speedup of up to 12.9%. P-Ready instruction scheduling yields a speedup of up to 7.6%, and 1.37% on average, across the CPU2000int benchmarks.
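The P-Ready idea in the abstract can be illustrated with a toy model. The sketch below is not ORC's implementation. It only assumes that each candidate instruction carries, per execution path, the set of instructions it depends on along that path. The names `classify_readiness`, `pick_next`, the path labels, and the priority values are all hypothetical. A candidate counts as "ready" when its dependences are satisfied on every path, and as "p-ready" when they are satisfied on at least one path but not all.

```python
from typing import Dict, Optional, Set

# Hypothetical representation: path name -> instructions the candidate
# depends on along that path.
PathDeps = Dict[str, Set[str]]


def classify_readiness(preds_on_path: PathDeps, scheduled: Set[str]) -> str:
    """Return "ready", "p-ready", or "not-ready" for one candidate."""
    ready_paths = [
        path for path, preds in preds_on_path.items() if preds <= scheduled
    ]
    if len(ready_paths) == len(preds_on_path):
        return "ready"        # dependences resolved on every path
    if ready_paths:
        return "p-ready"      # resolved on some paths only
    return "not-ready"


def pick_next(
    candidates: Dict[str, PathDeps],
    priority: Dict[str, int],
    scheduled: Set[str],
    allow_p_ready: bool = True,
) -> Optional[str]:
    """Pick the highest-priority eligible candidate, or None if none is eligible."""
    eligible = []
    for name, preds_on_path in candidates.items():
        state = classify_readiness(preds_on_path, scheduled)
        if state == "ready" or (allow_p_ready and state == "p-ready"):
            eligible.append(name)
    if not eligible:
        return None
    # Ties on priority are broken by name so the choice is deterministic.
    return max(eligible, key=lambda n: (priority[n], n))


# Toy data (assumed): "load" has high priority but still waits on "store"
# along path_b, while "add" has no unresolved dependences.
candidates = {
    "load": {"path_a": {"cmp"}, "path_b": {"cmp", "store"}},
    "add": {"path_a": set(), "path_b": set()},
}
priority = {"load": 10, "add": 3}
scheduled = {"cmp"}

with_p_ready = pick_next(candidates, priority, scheduled, allow_p_ready=True)
without_p_ready = pick_next(candidates, priority, scheduled, allow_p_ready=False)
```

With P-Ready candidates allowed, the high-priority `load` is chosen even though it is not ready on `path_b`; without them, the scheduler falls back to `add`. A real scheduler would also have to keep the remaining paths correct, for example with compensation code, which this sketch leaves out.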

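Authors: Yang Shuxin (杨书鑫), Zhang Zhaoqing (张兆庆)

Affiliation: Institute of Computing Technology, Chinese Academy of Sciences

Source: Chinese Journal of Computers (《计算机学报》), 2004, No. 5, pp. 577-586 (10 pages). Indexed in EI, CSCD, and the Peking University Core Journals list.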

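Funding: Supported by the National High Technology Research and Development Program of China ("863" Program) (2001AA11061) and the National Natural Science Foundation of China (69933020).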

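Keywords: IA-64; global instruction scheduling; Itanium processor; Bernstein's algorithm; P-Ready; hierarchical region; compiler; algorithms; computer architecture; program processors; scheduling

Classification: TP314 [Automation and Computer Technology: Computer Software and Theory]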

References (19 in total; the first 10 are listed)

1. Fisher J.A. Trace scheduling: A technique for global microcode compaction. IEEE Transactions on Computers, 1981, 30(7): 478-490
2. Hwu W.W. et al. The superblock: An effective technique for VLIW and superscalar compilation. Journal of Supercomputing, 1993, 7(1): 229-248
3. Mahlke S.A., Lin D.C., Chen W.Y., Hank R.E., Bringmann R.A. Effective compiler support for predicated execution using the hyperblock. In: Proceedings of the 25th Annual International Symposium on Microarchitecture, Portland, Oregon, 1992, 45-54
4. Bernstein D., Rodeh M. Global instruction scheduling for superscalar machines. In: Proceedings of the SIGPLAN Annual Symposium, Toronto, Ontario, 1991, 241-255
5. Bernstein D., Cohen D., Krawczyk H. Code duplication: An assist for global instruction scheduling. In: Proceedings of the 24th Annual International Symposium on Microarchitecture (MICRO-24), Albuquerque, New Mexico, 1991, 103-113
6. Bharadwaj J., Menezes K., McKinsey C. Wavefront scheduling: Path based data representation and scheduling of subgraphs. In: Proceedings of the International Symposium on Microarchitecture (MICRO-32), Haifa, 1999, 262-271
7. Fisher J.A. Global code generation for instruction-level parallelism: Trace scheduling-2. Hewlett-Packard Laboratories, Palo Alto, USA: Technical Report HPL-93-43, 1993
8. Schlansker M.S., Rau B.R. EPIC: Explicitly parallel instruction computing. IEEE Computer, 2000, 33(2): 37-45
9. ORC team. ORC suite. http://sourceforge.net/projects/ipf-orc, 2001-2004
10. Liu Yang, Zhang Zhaoqing, Qiao Ruliang. A region-based compilation framework (基于域的编译框架). Chinese Journal of Computers, 2003, 26(2): 188-194
