Abstract
The Itanium® architecture introduces a hardware-managed register stack. The register stack engine (RSE) automatically adjusts the register stack frame pointers and spills and fills stacked registers, which effectively reduces the saving and reloading of register values across procedure calls. The number of stacked registers a procedure uses can be specified explicitly with the alloc instruction. Traditional intra-procedural register allocation gives each procedure the maximum number of stacked registers it requires, up to the total number of stacked registers available. Using too many stacked registers, however, causes register stack spills and fills, and when these occur frequently they seriously degrade program performance. This paper proposes a new algorithm that effectively reduces RSE cost. The algorithm has been implemented in the open-source compiler ORC. Experiments show that performance on SPECint2000 generally improves when the algorithm is applied: perlbmk improves by 14% and crafty by 3.2%.
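To illustrate why the number of stacked registers per procedure matters, here is a minimal Python sketch of a toy register stack model. It is not the algorithm proposed in the paper, which the abstract does not describe. The class `RegisterStackModel` and the function `rse_traffic` are hypothetical names introduced for illustration. The model assumes 96 physical stacked registers (r32 to r127 on Itanium), spills whole registers from the oldest frames first, and ignores the overlap between a caller's output registers and the callee's frame.

```python
PHYSICAL_STACKED_REGS = 96  # assumption of this toy model: r32-r127


class RegisterStackModel:
    """Toy model: counts registers spilled to / filled from memory."""

    def __init__(self, capacity=PHYSICAL_STACKED_REGS):
        self.capacity = capacity
        self.frames = []  # [frame_size, resident_count], oldest frame first
        self.spills = 0
        self.fills = 0

    def call(self, frame_size):
        """Enter a procedure that requests frame_size stacked registers."""
        if frame_size > self.capacity:
            raise ValueError("frame larger than the physical register stack")
        self.frames.append([frame_size, frame_size])
        excess = sum(resident for _, resident in self.frames) - self.capacity
        # Spill registers of the oldest frames until the new frame fits.
        for frame in self.frames[:-1]:
            if excess <= 0:
                break
            moved = min(frame[1], excess)
            frame[1] -= moved
            self.spills += moved
            excess -= moved

    def ret(self):
        """Return to the caller, refilling any of its spilled registers."""
        self.frames.pop()
        if self.frames:
            caller = self.frames[-1]
            self.fills += caller[0] - caller[1]
            caller[1] = caller[0]


def rse_traffic(frame_size, depth, repeats=1):
    """Descend and unwind a call chain; return (spills, fills)."""
    model = RegisterStackModel()
    for _ in range(repeats):
        for _ in range(depth):
            model.call(frame_size)
        for _ in range(depth):
            model.ret()
    return model.spills, model.fills


if __name__ == "__main__":
    print(rse_traffic(frame_size=8, depth=10))   # (0, 0): chain fits in 96 registers
    print(rse_traffic(frame_size=24, depth=10))  # (144, 144): deep chain overflows
```

In this model a ten-deep call chain with 8-register frames never overflows, while the same chain with 24-register frames spills and refills 144 registers on every descent and unwind, which is the kind of cost a smaller per-procedure allocation avoids.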
Source
Chinese Journal of Computers (《计算机学报》)
Indexed in: EI, CSCD, PKU Core Journals (北大核心)
2004, No. 9, pp. 1198-1206 (9 pages)
Funding
National Natural Science Foundation of China (69933020)
Supported by Intel Corporation