Global Instruction Scheduling Technique in ORC
Abstract: IA-64 is a new architecture that provides rich hardware support for exploiting the instruction-level parallelism latent in programs, including a large register file, control and data speculation, and predication. Itanium is one concrete implementation of IA-64. The authors adapt Bernstein's global instruction scheduling algorithm, originally designed for superscalar processors, to the explicitly parallel (EPIC) Itanium processor. Besides taking Itanium-specific features into account, they make two improvements to Bernstein's algorithm. (1) Hierarchical regions. Compared with traditional flat regions, hierarchical regions are more flexible and give the scheduler a suitably sized scheduling scope, so that it can make full use of hardware resources while keeping the time and space cost of scheduling under control. (2) Integrated P-Ready instruction scheduling. P-Ready was proposed in a context that differs greatly from Bernstein's algorithmic framework. It allows a high-priority instruction to be scheduled as early as possible, even when its data dependences have not yet been resolved on every execution path that passes through it, which makes integrating it into Bernstein's framework worthwhile. The algorithm is implemented in ORC (Open Research Compiler), an open-source compiler for the Itanium processor. Experimental results show that the global instruction scheduler achieves an average runtime speedup of 8.4% on the CPU2000int benchmarks. As an indicator of the advantage of hierarchical regions, scheduling instructions across nested loops yields a runtime speedup of up to 12.9%. Integrated P-Ready instruction scheduling yields a speedup of up to 7.6%, and 1.37% on average, across the CPU2000int benchmarks.
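The P-Ready idea in the abstract can be illustrated with a toy model. The sketch below is not ORC's implementation: the function names, the path labels, and the priorities are all hypothetical, and the only assumption taken from the abstract is that an instruction may be picked early even when its data dependences are resolved on some, but not all, execution paths through it.

```python
from typing import Dict, Optional, Set, Tuple

# Hypothetical toy model: a candidate is (priority, unresolved dependences per path).
Candidate = Tuple[int, Dict[str, Set[str]]]


def classify_readiness(unresolved_by_path: Dict[str, Set[str]]) -> str:
    """Return 'ready', 'p-ready', or 'not-ready' for one candidate instruction."""
    if not unresolved_by_path:
        raise ValueError("candidate must lie on at least one path")
    # A path is clear when no dependence of the candidate is still pending on it.
    clear_paths = [p for p, deps in unresolved_by_path.items() if not deps]
    if len(clear_paths) == len(unresolved_by_path):
        return "ready"          # dependences resolved on every path
    return "p-ready" if clear_paths else "not-ready"


def pick_next(candidates: Dict[str, Candidate], allow_p_ready: bool) -> Optional[str]:
    """Pick the highest-priority eligible candidate, or None if there is none."""
    allowed = {"ready", "p-ready"} if allow_p_ready else {"ready"}
    eligible = [
        (priority, name)
        for name, (priority, paths) in candidates.items()
        if classify_readiness(paths) in allowed
    ]
    return max(eligible)[1] if eligible else None


# Illustrative data only: three instructions on two paths through a region.
candidates: Dict[str, Candidate] = {
    "load_a": (5, {"path1": set(), "path2": {"store_x"}}),       # clear on path1 only
    "add_b": (3, {"path1": set(), "path2": set()}),               # clear everywhere
    "mul_c": (9, {"path1": {"load_a"}, "path2": {"load_a"}}),     # blocked everywhere
}

with_p_ready = pick_next(candidates, allow_p_ready=True)
without_p_ready = pick_next(candidates, allow_p_ready=False)
```

With P-Ready candidates allowed, the higher-priority `load_a` is chosen ahead of `add_b`; without them, only the fully ready `add_b` is eligible. A real scheduler would also have to keep the program correct on the paths where the dependence is still unresolved. The abstract does not say how that is done, so the sketch leaves it out.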
Source: Chinese Journal of Computers (《计算机学报》), indexed in EI, CSCD, and the Peking University Core Journals list; 2004, Issue 5, pp. 577-586 (10 pages)
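Funding: Supported by the National High Technology Research and Development Program of China (863 Program) (2001AA110 61) and the National Natural Science Foundation of China (69933020)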
Keywords: IA-64; global instruction scheduling; Itanium processor; Bernstein's algorithm; P-Ready; hierarchical region; compiler; algorithms; computer architecture; program processors; scheduling
References (19)

  • 1. Fisher J.A. Trace scheduling: A technique for global microcode compaction. IEEE Transactions on Computers, 1981, 30(7): 478-490
  • 2. Hwu W.W. et al. The superblock: An effective technique for VLIW and superscalar compilation. Journal of Supercomputing, 1993, 7(1): 229-248
  • 3. Mahlke S.A., Lin D.C., Chen W.Y., Hank R.E., Bringmann R.A. Effective compiler support for predicated execution using the hyperblock. In: Proceedings of the 25th Annual International Symposium on Microarchitecture, Portland, Oregon, 1992, 45-54
  • 4. Bernstein D., Rodeh M. Global instruction scheduling for superscalar machines. In: Proceedings of the SIGPLAN Annual Symposium, Toronto, Ontario, 1991, 241-255
  • 5. Bernstein D., Cohen D., Krawczyk H. Code duplication: An assist for global instruction scheduling. In: Proceedings of the 24th Annual International Symposium on Microarchitecture (MICRO24), Albuquerque, New Mexico, 1991, 103-113
  • 6. Bharadwaj J., Menezes K., McKinsey C. Wavefront scheduling: Path based data representation and scheduling of subgraphs. In: Proceedings of the International Symposium on Microarchitecture (MICRO32), Haifa, 1999, 262-271
  • 7. Fisher J.A. Global code generation for instruction-level parallelism: Trace scheduling-2. Hewlett-Packard Laboratories, Palo Alto, USA: Technical Report HPL-93-43, 1993
  • 8. Schlansker M.S., Rau B.R. EPIC: Explicitly parallel instruction computing. IEEE Computer, 2000, 33(2): 37-45
  • 9. ORC team. ORC suite. http://sourceforge.net/projects/ipf-orc, 2001-2004
  • 10. Liu Yang, Zhang Zhaoqing, Qiao Ruliang. Region-based compilation framework (基于域的编译框架; in Chinese). Chinese Journal of Computers, 2003, 26(2): 188-194
