Abstract
A power optimization method for embedded processors based on self-adaptive clock gating is proposed. It reduces the dynamic power wasted when the pipeline stalls or when the floating-point unit (FPU) and the multimedia co-processor are idle. First, a module-level adaptive clock-gating cell is designed; on-chip hardware automatically detects whether each of these modules is idle and turns the module's clock off while it is, eliminating the dynamic power consumed inside the module by unneeded clock toggling. Then, the adaptive clock-gating cell is applied to the domestic processor Unicore-2 to optimize the power consumed during pipeline stalls and during FPU and multimedia co-processor idle periods. Finally, based on the netlist and parasitic-parameter files of a taped-out chip in the TSMC 65 nm process, the chip's waveforms are back-annotated to obtain the circuit toggle rates, and power simulations are performed with the PrimeTime PX tool. The simulation results show that the method yields a power reduction of 18% to 28% when running three typical benchmarks (Dhrystone, Whetstone and Stream), with negligible area overhead and no impact on CPU performance.
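The idle-driven gating idea in the abstract can be illustrated with a small behavioral model. The following is only a sketch under stated assumptions, not the paper's circuit: it assumes a conventional latch-plus-AND gating cell, the names `gated_clock_edges` and `clock_activity_saving` are hypothetical, and the activity trace is invented for illustration. It counts how many clock edges reach a module and does not reproduce the 18% to 28% figure, which comes from gate-level power simulation.

```python
from typing import Sequence


def gated_clock_edges(busy: Sequence[bool]) -> int:
    """Count rising clock edges delivered to a module behind a gating cell.

    Assumed model (not taken from the paper): a latch captures the enable
    while the clock is low, and the gated clock is clk AND latched_enable.
    """
    latched_enable = False
    edges = 0
    for module_busy in busy:
        # Clock-low phase: the latch is transparent and captures the enable,
        # which here is simply "module is not idle".
        latched_enable = bool(module_busy)
        # Clock-high phase: the module sees a rising edge only if enabled.
        if latched_enable:
            edges += 1
    return edges


def clock_activity_saving(busy: Sequence[bool]) -> float:
    """Fraction of clock edges suppressed compared with a free-running clock."""
    total = len(busy)
    if total == 0:
        return 0.0
    return 1.0 - gated_clock_edges(busy) / total


# Hypothetical per-cycle activity trace for one module (True = busy).
fpu_busy = [True, True, False, False, False, True, False, False]
saving = clock_activity_saving(fpu_busy)
print(f"clock edges suppressed: {saving:.1%}")
```

The suppressed-edge fraction is only a rough proxy for the saved clock-tree and register clocking activity; the actual power saving depends on how much of the module's dynamic power is clock-driven.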
Source
《东南大学学报(自然科学版)》
EI
CAS
CSCD
PKU Core (Peking University Core Journals)
2015, No. 2, pp. 219-223 (5 pages)
Journal of Southeast University: Natural Science Edition
Funding
Supported by the Jiangsu Province "Qing Lan Project"