Abstract: Energy-proportional computing is one of the foremost constraints in the design of next-generation exascale systems. To be sustainable, these systems must achieve a very high FLOP-per-watt ratio, which requires substantial improvements in the power efficiency of modern computing systems. This paper focuses on the processor, which remains the largest contributor to power usage, and considers both its core and uncore power subsystems. The uncore comprises the processor functions not handled by the cores, such as the L3 cache and the on-chip interconnect, and contributes significantly to total system power. Uncore frequency scaling (UFS) has been available to users since the Intel Haswell processor generation. In this paper, performance and power models are proposed that use both UFS and dynamic voltage and frequency scaling (DVFS) to reduce the energy consumption of parallel applications. These models are then incorporated into a runtime strategy that scales processor frequencies during parallel application execution. The strategy can be implemented at the kernel/firmware level, which makes it suitable for improving the energy efficiency of exascale designs. Experiments on a 20-core Haswell-EP machine using the quantum chemistry application GAMESS and the NAS Parallel Benchmarks yielded up to 24% energy savings with as little as 2% performance loss.
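The abstract does not state the models' equations. As a rough illustration of choosing core and uncore frequencies jointly, the Python sketch below assumes a hypothetical time model in which the core-bound fraction of runtime scales with 1/f_core, the uncore-bound fraction with 1/f_uncore, and the remainder is frequency-insensitive, together with a cubic dynamic-power model. The frequency steps, power coefficients, and the 5% slowdown bound are illustrative assumptions, not the paper's models or measured Haswell-EP values.

```python
from itertools import product

# Hypothetical frequency steps in GHz (illustrative, not measured Haswell-EP values).
CORE_FREQS = [1.2, 1.6, 2.0, 2.3]
UNCORE_FREQS = [1.2, 1.8, 2.4, 3.0]


def exec_time(fc, fu, t_base, core_frac, uncore_frac):
    """Assumed time model: the core-bound fraction scales with 1/fc, the
    uncore-bound fraction with 1/fu, and the rest does not depend on frequency."""
    fc_max, fu_max = max(CORE_FREQS), max(UNCORE_FREQS)
    other = 1.0 - core_frac - uncore_frac
    return t_base * (core_frac * fc_max / fc + uncore_frac * fu_max / fu + other)


def power(fc, fu, p_static=40.0, c_core=4.0, c_uncore=1.5):
    """Assumed power model: dynamic power grows roughly as f^3 when voltage
    scales with frequency; coefficients are hypothetical."""
    return p_static + c_core * fc ** 3 + c_uncore * fu ** 3


def pick_frequencies(t_base, core_frac, uncore_frac, max_loss=0.05):
    """Return the (fc, fu) pair with the lowest modeled energy whose modeled
    slowdown relative to (max core, max uncore) stays within max_loss."""
    t_ref = exec_time(max(CORE_FREQS), max(UNCORE_FREQS), t_base, core_frac, uncore_frac)
    best = None
    for fc, fu in product(CORE_FREQS, UNCORE_FREQS):
        t = exec_time(fc, fu, t_base, core_frac, uncore_frac)
        if t > (1.0 + max_loss) * t_ref:
            continue  # violates the performance-loss bound
        energy = power(fc, fu) * t
        if best is None or energy < best[0]:
            best = (energy, fc, fu)
    # The (max, max) pair always satisfies the bound, so best is never None.
    return best[1], best[2]
```

For a compute-bound phase (`core_frac=0.9`, `uncore_frac=0.0`) the sketch keeps the core at its top step and drops the uncore to its lowest step; a memory-bound phase yields the opposite choice.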
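Abstract: When allocating resources to the browser, Android cannot perceive web page content, which leads to resource over-allocation and unnecessary battery drain. Meanwhile, as the number of available CPU frequency levels grows, achieving energy optimization through dynamic voltage and frequency scaling (DVFS) becomes increasingly difficult. In addition, the system's default governing policy ignores the role of the graphics processing unit (GPU) in browser execution. To address these problems, a method that coordinates CPU and GPU regulation to optimize power consumption is proposed. First, web pages are classified with logistic regression based on the processor's runtime characteristics during page loading, and page complexity is quantified by weighting page features; then, according to the category and complexity, DVFS is used to cap the CPU frequency while adjusting the GPU frequency. The method was applied to the Chromium browser on a Google Pixel 2 XL and tested on the top 500 Chinese websites, saving 12% power on average while reducing page load time by 5%.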
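The classify-then-scale pipeline described above can be sketched as follows. The feature names, logistic-regression weights, complexity weights, thresholds, and frequency tables are all hypothetical placeholders; a real deployment would fit the classifier on profiled page loads and draw frequencies from the device's supported DVFS levels.

```python
import math

# Hypothetical logistic-regression weights separating "heavy" from "light" pages,
# using processor runtime features sampled during page loading.
LR_WEIGHTS = {"cpu_util": 3.0, "gpu_util": 2.0, "dom_nodes": 0.8}
LR_BIAS = -3.5

# Hypothetical feature weights for the page-complexity score.
COMPLEXITY_WEIGHTS = {"dom_nodes": 0.5, "image_count": 0.3, "script_kb": 0.2}

# Hypothetical CPU frequency caps and GPU frequencies (MHz), indexed by complexity level 0..2.
CPU_CAP = {"light": [1100, 1400, 1700], "heavy": [1700, 2000, 2450]}
GPU_FREQ = {"light": [257, 342, 414], "heavy": [414, 515, 710]}


def classify(features):
    """Logistic regression: P(heavy) = sigmoid(bias + w . x)."""
    z = LR_BIAS + sum(w * features[k] for k, w in LR_WEIGHTS.items())
    p_heavy = 1.0 / (1.0 + math.exp(-z))
    return "heavy" if p_heavy >= 0.5 else "light"


def complexity_level(features, thresholds=(1.0, 2.0)):
    """Weighted feature sum, bucketed into levels 0, 1, or 2."""
    score = sum(w * features[k] for k, w in COMPLEXITY_WEIGHTS.items())
    return sum(score >= t for t in thresholds)


def choose_frequencies(features):
    """Return (cpu_cap_mhz, gpu_freq_mhz) for the page's class and complexity."""
    cls = classify(features)
    level = complexity_level(features)
    return CPU_CAP[cls][level], GPU_FREQ[cls][level]
```

A lightly loaded page (low utilization, few DOM nodes) maps to the lowest CPU cap and GPU frequency, while a script- and image-heavy page maps to the top entries.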
Abstract: Because low power consumption is the main design concern for a network on chip (NoC), researchers are increasingly focusing on both algorithmic and architectural approaches. Conventional dynamic frequency scaling (DFS) and history-based DFS (HDFS) algorithms are used to handle energy-constrained data traffic. Although these algorithms achieve higher energy efficiency, they degrade performance because of the additional latency between clock domains. In this paper, we present a variable power optimization interface for NoC based on a finite state machine (FSM) to improve performance. The parameters are estimated using TSMC 45 nm CMOS technology. Compared with a DFS system, the evaluation shows that the FSM-DFS link achieves 81.55% dynamic power savings on the on-chip network links and 37.5% savings in link leakage power. The proposed design is also evaluated on various performance parameters and compared with conventional designs, and the simulation results show that it outperforms them.
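The abstract does not list the FSM's states or transition conditions. A minimal sketch, assuming three hypothetical link-frequency states and hypothetical utilization thresholds with a hysteresis band, shows how an FSM-driven DFS controller can move a link between frequency levels based on observed utilization:

```python
from enum import Enum


class LinkState(Enum):
    LOW = 0
    MID = 1
    HIGH = 2


# Hypothetical fraction of the maximum link clock used in each state.
FREQ_SCALE = {LinkState.LOW: 0.25, LinkState.MID: 0.5, LinkState.HIGH: 1.0}

# Hypothetical thresholds; utilization between them holds the current state (hysteresis).
UP_THRESHOLD = 0.7
DOWN_THRESHOLD = 0.3


def next_state(state, utilization):
    """Step one level up when the link is busy, one level down when it is idle."""
    if utilization > UP_THRESHOLD and state != LinkState.HIGH:
        return LinkState(state.value + 1)
    if utilization < DOWN_THRESHOLD and state != LinkState.LOW:
        return LinkState(state.value - 1)
    return state


def link_clock(state, f_max_mhz):
    """Link clock frequency for a given state."""
    return f_max_mhz * FREQ_SCALE[state]


def run(trace, state=LinkState.HIGH):
    """Apply the FSM to a sequence of per-interval link utilizations."""
    states = []
    for u in trace:
        state = next_state(state, u)
        states.append(state)
    return states
```

Stepping one level at a time and holding state inside the threshold band keeps the controller from oscillating when utilization hovers near a single cut-off.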
Funding: the National Key Research and Development Program of China (2019YFB1803600), the Key Scientific Research Program of Shaanxi Provincial Department of Education (22JY059), and the China Civil Aviation Airworthiness Center Open Foundation (SH2021111903).
Abstract: To apply quasi-cyclic low-density parity-check (QC-LDPC) codes to different scenarios, a data-stream-driven pipelined macro instruction set and a reconfigurable processor architecture are proposed for typical QC-LDPC algorithms. Data-level parallelism is improved by using instructions to dynamically configure the multi-core computing units. In addition, an intelligent adjustment strategy based on a programmable wake-up controller (WuC) is designed so that the computing mode, operating voltage, and frequency used for the QC-LDPC algorithm can be adjusted, which improves the computing efficiency of the processor. The QC-LDPC processor is verified on a Xilinx ZCU102 field-programmable gate array (FPGA) board, and its computing efficiency is measured. The experimental results indicate that the processor supports two encoding lengths of three typical QC-LDPC algorithms and 20 adaptive operating modes of voltage and frequency. Its maximum efficiency reaches 12.18 Gbit/(s·W), and it is more flexible than existing state-of-the-art QC-LDPC processors.
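The efficiency figure above is throughput divided by power, expressed in Gbit/s per watt. The sketch below, assuming a hypothetical table of voltage/frequency operating points (not the processor's 20 measured modes), shows one way a controller such as the WuC might pick the most efficient mode that still meets a throughput requirement; the selection policy itself is an assumption.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Mode:
    voltage_v: float
    freq_mhz: float
    throughput_gbps: float  # coding throughput in Gbit/s
    power_w: float

    @property
    def efficiency(self):
        """Computing efficiency in Gbit/(s*W): throughput divided by power."""
        return self.throughput_gbps / self.power_w


# Hypothetical operating points, for illustration only.
MODES = [
    Mode(0.72, 100.0, 0.8, 0.20),
    Mode(0.80, 200.0, 1.6, 0.45),
    Mode(0.90, 300.0, 2.4, 0.80),
]


def select_mode(required_gbps, modes=MODES):
    """Pick the most efficient mode that meets the required throughput;
    fall back to the fastest mode if none does."""
    feasible = [m for m in modes if m.throughput_gbps >= required_gbps]
    if not feasible:
        return max(modes, key=lambda m: m.throughput_gbps)
    return max(feasible, key=lambda m: m.efficiency)
```

With these placeholder values, a 1.0 Gbit/s requirement selects the 200 MHz mode, because it is the more efficient of the two modes fast enough to meet it.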
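Abstract: This work studies the dynamic component of chip power consumption. Conventional dynamic voltage and frequency scaling (DVFS) does not predict the CPU's behavior in the next phase; to address this shortcoming, a BP-DVFS energy-saving strategy is proposed. To predict next-phase CPU utilization more accurately, and thereby scale the CPU frequency more precisely and reduce its runtime power, an FPU-CPU (forward predict utilization CPU) model is constructed. The model assumes a nonlinear functional relationship between CPU utilization in the next time interval and event features related to CPU runtime resources. Five features closely tied to CPU resources are extracted from the processor's runtime environment and measured, and a back-propagation (BP) neural network is trained to fit this relationship. The trained network predicts next-phase CPU utilization, and power simulation experiments are run with the CPU processing different types of task programs. The results are compared with those of three commonly used CPU frequency-scaling strategies under the same experimental conditions. For every type of task program, CPU power under BP-DVFS is lower than under the other three strategies. The experiments show that the proposed method improves the accuracy of CPU utilization prediction and reduces CPU runtime power, and they support both the validity of the hypothesis and the effectiveness of the method for low-power CPU operation.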
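A minimal sketch of the predict-then-scale loop, assuming five generic runtime features scaled to [0, 1], a synthetic training target, and a hypothetical frequency table. The network is a one-hidden-layer back-propagation (BP) network with sigmoid units trained by plain stochastic gradient descent; the layer size, learning rate, and the rule of choosing the lowest frequency that covers the predicted demand are illustrative assumptions, not the paper's FPU-CPU configuration.

```python
import math
import random


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


class BPNet:
    """One hidden layer, sigmoid units, squared-error loss, trained by back-propagation."""

    def __init__(self, n_in=5, n_hidden=8, seed=0):
        rng = random.Random(seed)
        self.w1 = [[rng.uniform(-0.5, 0.5) for _ in range(n_in)] for _ in range(n_hidden)]
        self.b1 = [0.0] * n_hidden
        self.w2 = [rng.uniform(-0.5, 0.5) for _ in range(n_hidden)]
        self.b2 = 0.0

    def forward(self, x):
        h = [sigmoid(sum(w * xi for w, xi in zip(row, x)) + b)
             for row, b in zip(self.w1, self.b1)]
        y = sigmoid(sum(w * hj for w, hj in zip(self.w2, h)) + self.b2)
        return h, y

    def predict(self, x):
        """Predicted next-interval utilization in (0, 1)."""
        return self.forward(x)[1]

    def train_step(self, x, target, lr=0.5):
        h, y = self.forward(x)
        d_out = (y - target) * y * (1.0 - y)  # output delta
        # Hidden deltas use the output weights before they are updated.
        d_hid = [d_out * w * hj * (1.0 - hj) for w, hj in zip(self.w2, h)]
        for j, hj in enumerate(h):
            self.w2[j] -= lr * d_out * hj
        self.b2 -= lr * d_out
        for j, dj in enumerate(d_hid):
            for i, xi in enumerate(x):
                self.w1[j][i] -= lr * dj * xi
            self.b1[j] -= lr * dj
        return 0.5 * (y - target) ** 2


def make_dataset(n=200, seed=1):
    """Synthetic stand-in for the five runtime features (hypothetical);
    only the first three influence next-interval utilization."""
    rng = random.Random(seed)
    data = []
    for _ in range(n):
        x = [rng.random() for _ in range(5)]
        data.append((x, 0.1 + 0.5 * x[0] * x[1] + 0.3 * x[2] ** 2))
    return data


def mse(net, data):
    return sum((net.predict(x) - t) ** 2 for x, t in data) / len(data)


def pick_frequency(predicted_util, freqs_mhz, f_max_mhz):
    """Lowest frequency whose capacity covers the predicted demand."""
    demand = predicted_util * f_max_mhz
    for f in sorted(freqs_mhz):
        if f >= demand:
            return f
    return max(freqs_mhz)
```

In use, the network is trained with repeated `train_step` calls over logged samples, and at each scheduling interval `net.predict(features)` is passed to `pick_frequency` to select the next CPU frequency.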