A Low-Power Instruction Cache Design for Superscalar Microprocessors (一种面向超标量处理器的低功耗指令Cache设计)

Abstract: Three low-power optimization strategies are proposed for the multi-bank, pipelined instruction cache of a superscalar processor. The first is conditional amplifying at cache-way granularity: based on the tag-match result, it stops the sense amplifiers in non-matching ways from driving data out of the memory arrays. The second is dynamic voltage scaling at cache-line granularity: only the cache line currently being accessed is supplied with the normal operating voltage, while all other cache lines stay in a low-voltage drowsy state. The third is instruction recycling for short loops: it reuses expired instructions to avoid redundant cache accesses. Experimental results show that, on the SPEC and PowerStone benchmarks, this design reduces the total power of the instruction cache by 72.4% and 84.3% respectively, with a processor IPC loss of only 1.1% and 0.8% respectively, and without any timing overhead.
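The instruction recycling idea in the abstract can be illustrated with a toy model. The sketch below is not the authors' hardware design. It assumes a small FIFO buffer of recently fetched instruction addresses (the name `recycle_buffer`, the function name, and the default size of 8 entries are all hypothetical) and simply counts how many fetches would still have to access the instruction cache.

```python
from collections import deque


def count_cache_accesses(fetch_addresses, buffer_entries=8):
    """Count instruction cache accesses for a fetch trace.

    Assumption (not from the paper): recently fetched instructions are
    kept in a small FIFO buffer, and a fetch that finds its address in
    the buffer does not access the cache.
    """
    recycle_buffer = deque(maxlen=buffer_entries)  # oldest entry is evicted first
    cache_accesses = 0
    for addr in fetch_addresses:
        if addr in recycle_buffer:
            continue  # instruction is reused, so the cache is not accessed
        cache_accesses += 1
        recycle_buffer.append(addr)
    return cache_accesses


# A 4-instruction loop body executed 10 times (40 fetches in total).
loop_trace = [0x100, 0x104, 0x108, 0x10C] * 10

print(count_cache_accesses(loop_trace))     # 4: only the first iteration reads the cache
print(count_cache_accesses(loop_trace, 2))  # 40: the buffer is too small to hold the loop
```

In this model the saving appears only when the loop body fits in the buffer, which is consistent with the abstract restricting the technique to short loops.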

Authors: Xiao Jianqing (肖建青), Li Wei (李伟), Zhang Xunying (张洵颖), Shen Xubang (沈绪榜)

Affiliation: Xi'an Microelectronics Technology Institute (西安微电子技术研究所)

Source: Microelectronics & Computer (《微电子学与计算机》), CSCD, Peking University Core Journal, 2015, No. 7, pp. 103-106, 111 (5 pages)

Funding: National "863" Program of China (2011AA120204); a "Twelfth Five-Year Plan" civil aerospace pre-research project (YY2011-012(D020201))

Keywords: superscalar; pipelined instruction cache; conditional amplifying; dynamic voltage scaling; instruction recycling

Classification: TP302.2 [Automation and Computer Technology: Computer System Architecture]
