Power optimization and VLSI design of CPU based on adaptive clock-gating

Cited by: 3
Abstract: A power optimization method for CPUs is proposed that uses adaptive clock gating to eliminate the dynamic power wasted by pipeline stalls and by idle periods of the floating-point unit (FPU) and the multimedia coprocessor. First, a module-level adaptive clock-gating cell is designed; on-chip hardware automatically monitors whether each of these modules is idle and turns off the module's clock while it is idle, removing the dynamic power consumed inside the module by unnecessary clock toggling. Then, the adaptive clock-gating cell is applied to the domestic processor Unicore-2 to optimize the power consumed during pipeline stalls and during FPU and multimedia coprocessor idle periods. Finally, based on the netlist and parasitic parameter files of a taped-out chip in the TSMC 65 nm process, circuit toggle rates are obtained by back-annotating the chip's waveforms, and power is simulated with the PrimeTime PX tool. The simulation results show that the method yields a power reduction of 18% to 28% when running three typical benchmarks, Dhrystone, Whetstone, and Stream, with negligible area overhead and no impact on CPU performance.
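The gating idea in the abstract can be illustrated with a toy cycle-level model. This is only a sketch under stated assumptions, not the paper's circuit. The per-cycle `busy` trace, the function names, and the trace values are hypothetical. The model assumes a module's clock pulse is delivered only on cycles where the idle-detection logic reports the module as busy, and it counts clock pulses only. It does not model data-path switching, leakage, or the gating cell's own overhead, so the printed fraction is not comparable to the 18% to 28% reported above.

```python
from typing import Sequence


def gated_clock_pulses(busy: Sequence[bool]) -> int:
    """Count clock pulses that reach a module when its clock is
    enabled only on cycles where the module is busy (assumed model)."""
    return sum(1 for b in busy if b)


def clock_savings(busy: Sequence[bool]) -> float:
    """Fraction of clock pulses removed by gating, relative to a
    free-running clock that pulses on every cycle."""
    total = len(busy)
    if total == 0:
        return 0.0
    return 1.0 - gated_clock_pulses(busy) / total


# Hypothetical activity trace for one module (e.g. an FPU):
# True = busy this cycle, False = idle (clock gated off).
fpu_busy = [True, True, False, False, False, True, False, False]

print(f"clock pulses removed: {clock_savings(fpu_busy):.1%}")
```

In real hardware the enable signal would typically pass through a latch-based integrated clock-gating cell to avoid glitches on the gated clock. The model above abstracts that away and treats the enable as a clean per-cycle decision.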
Source: Journal of Southeast University: Natural Science Edition, 2015, No. 2, pp. 219-223 (5 pages). Indexed in EI, CAS, CSCD, and the Peking University Core Journals list.
Funding: Supported by the "Qing Lan Project" of Jiangsu Province
Keywords: low power; adaptive clock-gating; pipeline stall