Inter-Procedural Register Allocation for RSE Optimization

Abstract: The Itanium® architecture introduces a hardware-managed register stack. Its register stack engine (RSE) automatically adjusts the register stack frame pointers and spills and fills stacked registers, which efficiently reduces the saving and reloading of register values across procedure calls. The number of stacked registers a procedure uses can be specified explicitly with the alloc instruction. A traditional intra-procedural register allocator gives each procedure the maximum number of stacked registers it needs, bounded only by the total number of stacked registers. Using too many stacked registers, however, causes register stack spills/fills, and when these occur frequently they seriously degrade program performance. This paper proposes a new algorithm that effectively reduces the RSE cost. The algorithm has been implemented in the open-source compiler ORC. Experiments on SPECint2000 show that performance generally improves when the algorithm is applied: perlbmk improves by 14% and crafty by 3.2%.
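To make the cost argument concrete, here is a minimal Python sketch of a toy RSE model. It is not the paper's allocation algorithm, which the abstract does not describe. It assumes a pool of 96 physical stacked registers (the architectural r32-r127), ignores the overlap between a caller's output registers and the callee's input registers, and treats spills and fills as eager, register-at-a-time transfers. The names `Frame`, `RegisterStackModel`, and `run_recursion` are hypothetical. The model counts how many registers are spilled and filled along one recursive call chain under two per-procedure allocation sizes.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class Frame:
    size: int      # stacked registers the procedure requests via alloc (simplified)
    resident: int  # how many of those registers are still in the physical file


@dataclass
class RegisterStackModel:
    """Toy model of RSE spill/fill traffic along a single call chain.

    Assumptions (hypothetical, not from the paper): a fixed pool of
    `capacity` physical stacked registers, no caller-output/callee-input
    overlap, and eager spill/fill of whole registers.
    """
    capacity: int = 96
    frames: List[Frame] = field(default_factory=list)
    in_use: int = 0
    spilled: int = 0
    filled: int = 0

    def call(self, size: int) -> None:
        if not 0 <= size <= self.capacity:
            raise ValueError("frame does not fit in the register stack")
        self.frames.append(Frame(size, size))
        self.in_use += size
        # Spill registers of the oldest frames to the backing store until
        # the new frame fits; the new (top) frame itself is never spilled.
        i = 0
        while self.in_use > self.capacity:
            victim = self.frames[i]
            n = min(victim.resident, self.in_use - self.capacity)
            victim.resident -= n
            self.in_use -= n
            self.spilled += n
            i += 1

    def ret(self) -> None:
        callee = self.frames.pop()
        self.in_use -= callee.resident
        if self.frames:
            # The caller's frame must be fully resident again: fill the
            # registers that were spilled while deeper calls were active.
            caller = self.frames[-1]
            missing = caller.size - caller.resident
            caller.resident = caller.size
            self.in_use += missing
            self.filled += missing


def run_recursion(depth: int, frame_size: int, capacity: int = 96) -> Tuple[int, int]:
    """Descend `depth` levels, each frame using `frame_size` registers, then return."""
    model = RegisterStackModel(capacity)
    for _ in range(depth):
        model.call(frame_size)
    for _ in range(depth):
        model.ret()
    return model.spilled, model.filled


if __name__ == "__main__":
    # Same call chain, two allocation sizes per procedure.
    print("24 registers per frame:", run_recursion(10, 24))  # (144, 144)
    print(" 8 registers per frame:", run_recursion(10, 8))   # (0, 0)
```

With 10 nested frames of 24 registers the chain needs 240 registers, so the model spills 144 on the way down and fills the same 144 on the way back up. With 8 registers per frame the whole chain (80 registers) fits and the RSE moves nothing. This is the kind of spill/fill traffic the paper's inter-procedural allocation aims to reduce.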

Authors: Liu Yang (刘旸), Zhang Zhaoqing (张兆庆)

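Affiliations: Institute of Computing Technology, Chinese Academy of Sciences; Graduate School of the Chinese Academy of Sciences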

Source: Chinese Journal of Computers (《计算机学报》), 2004, No. 9, pp. 1198-1206 (9 pages). Indexed in EI, CSCD, and the Peking University Core Journals list.

Funding: National Natural Science Foundation of China (69933020); Intel Corporation

Keywords: register stack; register stack engine; register stack spill/fill; optimization; resource allocation

CLC number: TP302 [Automation and Computer Technology: Computer System Architecture]
