
Inter-Procedural Register Allocation for RSE Optimization
Abstract: The Itanium® architecture introduces a hardware-managed register stack. The register stack engine (RSE) automatically adjusts the register stack frame pointers and spills and fills stacked registers, which effectively reduces the saving and reloading of register values across procedure calls. The number of stacked registers a procedure uses can be specified explicitly with the alloc instruction. Conventional intra-procedural register allocation gives each procedure the maximum number of stacked registers it needs, up to the total number of stacked registers available. Excessive use of stacked registers, however, causes register stack spills and fills, and when these occur frequently, program performance suffers severely. This paper proposes a new algorithm that effectively reduces the RSE cost. The algorithm has been implemented in the open-source compiler ORC. Experiments show that SPECint2000 performance generally improves when the algorithm is applied: perlbmk improves by 14% and crafty by 3.2%.
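The spill/fill cost described in the abstract can be illustrated with a toy model. The sketch below is not the paper's algorithm, which the abstract does not specify. It only counts how many registers a simplified register stack would spill and fill for a chain of nested calls. It assumes 96 physical stacked registers (r32 to r127 on Itanium), spills the oldest registers first, and refills a caller's frame fully on return. The names `RegisterStackModel` and `simulate_call_chain` are hypothetical.

```python
PHYSICAL_STACKED_REGS = 96  # Itanium stacked registers r32-r127


class RegisterStackModel:
    """Toy model of a register stack; counts spilled and filled registers."""

    def __init__(self, capacity=PHYSICAL_STACKED_REGS):
        self.capacity = capacity
        self.frames = []    # requested frame sizes, oldest first
        self.resident = []  # registers of each frame still in the register file
        self.spills = 0
        self.fills = 0

    def call(self, frame_size):
        """Enter a procedure that requests frame_size stacked registers."""
        if not 0 <= frame_size <= self.capacity:
            raise ValueError("frame does not fit in the register file")
        self.frames.append(frame_size)
        self.resident.append(frame_size)
        overflow = sum(self.resident) - self.capacity
        i = 0
        while overflow > 0:  # spill from the oldest frames first
            take = min(self.resident[i], overflow)
            self.resident[i] -= take
            self.spills += take
            overflow -= take
            i += 1

    def ret(self):
        """Return to the caller and refill any of its spilled registers."""
        self.frames.pop()
        self.resident.pop()
        if self.frames:
            missing = self.frames[-1] - self.resident[-1]
            self.resident[-1] = self.frames[-1]
            self.fills += missing


def simulate_call_chain(depth, frame_size):
    """Nest `depth` calls of equal frame size, then unwind; return (spills, fills)."""
    model = RegisterStackModel()
    for _ in range(depth):
        model.call(frame_size)
    for _ in range(depth):
        model.ret()
    return model.spills, model.fills


print(simulate_call_chain(4, 40))  # (64, 64)
print(simulate_call_chain(4, 24))  # (0, 0)
```

In this model, four nested frames of 40 registers exceed the 96-register file by 64 registers, all of which are spilled and later filled, while frames of 24 registers fit entirely and cause no RSE traffic. This is the kind of trade-off an allocator that limits per-procedure stacked register usage would aim to exploit.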
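Authors: Liu Yang (刘旸), Zhang Zhaoqing (张兆庆)
Source: Chinese Journal of Computers (《计算机学报》), 2004, No. 9, pp. 1198-1206 (9 pages). Indexed in EI, CSCD, and the Peking University Core Journals list.
Funding: National Natural Science Foundation of China (69933020); Intel Corporation
Keywords: register stack; register stack engine; register stack spill/fill; optimization; resource allocation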

