A Low-Power Instruction Cache Design for Superscalar Microprocessors

Abstract: Three low-power optimization strategies are proposed for the multi-bank, pipelined instruction cache of a superscalar processor. The first is conditional amplifying based on cache ways, which uses the tag-match result to stop the sense amplifiers in non-matching ways from driving data out of the memory arrays. The second is dynamic voltage scaling based on cache lines, which supplies the normal operating voltage only to the cache line currently being accessed and keeps all other lines drowsy at a lower voltage. The third is instruction recycling for short loops, which reuses expired instructions to reduce redundant cache accesses. Experimental results show that the design reduces the total power of the instruction cache by 72.4% and 84.3% on the SPEC and PowerStone benchmarks, respectively, while the processor IPC loss is only 1.1% and 0.8%, respectively, with no timing overhead.
Source: Microelectronics & Computer (《微电子学与计算机》), CSCD, Peking University Core Journal, 2015, No. 7, pp. 103-106, 111 (5 pages)
Funding: National "863" Program of China (2011AA120204); a civil aerospace pre-research project of the 12th Five-Year Plan (YY2011-012(D020201))
Keywords: superscalar; pipelined instruction cache; conditional amplifying; dynamic voltage scaling; instruction recycling
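
The instruction recycling idea in the abstract (reusing already-fetched instructions in short loops so the cache is not accessed again) can be illustrated with a toy model. This is only a sketch under stated assumptions, not the paper's design: the abstract gives neither the buffer size nor the replacement policy, so the 8-entry FIFO and the function name `count_cache_accesses` are hypothetical.

```python
from collections import deque


def count_cache_accesses(pc_trace, buffer_entries=8):
    """Count instruction-cache accesses when recent fetches can be reused.

    Toy model (assumption): a small FIFO of recently fetched PCs stands in
    for the recycling buffer. A fetch that hits the FIFO skips the cache.
    """
    recent = deque(maxlen=buffer_entries)  # hypothetical recycling buffer
    cache_accesses = 0
    for pc in pc_trace:
        if pc in recent:
            continue  # instruction is recycled; the cache is not accessed
        cache_accesses += 1
        recent.append(pc)  # oldest entry is dropped when the FIFO is full
    return cache_accesses


# A 4-instruction loop body executed 10 times fits in the buffer.
short_loop = [0x100, 0x104, 0x108, 0x10C] * 10
baseline = len(short_loop)                  # every fetch accesses the cache
with_recycling = count_cache_accesses(short_loop)

# A 16-instruction loop does not fit in 8 entries, so nothing is reused.
long_loop = list(range(16)) * 2
long_loop_accesses = count_cache_accesses(long_loop)

print(baseline, with_recycling, long_loop_accesses)
```

In this model the short loop needs cache accesses only on its first iteration, while a loop body larger than the buffer gains nothing, which is consistent with the abstract restricting the technique to short loops.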