Abstract: Based on the fundamental relationship among circuit power, circuit delay, and supply voltage, four theorems on the application of dynamic voltage scaling (DVS) policies are proposed and proved. First, the existence characteristics of the optimal supply voltage for a single task are proved, which suggests that the optimal supply voltage for a single task should be selected only within a one-dimensional term, and that the task completion time at the optimal supply voltage should coincide with its deadline. Then, it is shown that the minimum energy consumption a DVS policy can achieve when completing a single task is lower than that of the dynamic power management (DPM) policy or the combined DVS+DPM policy under the same conditions. Finally, a theorem on energy minimization for a multi-task group is proposed, which states that the processor must be kept in the execution state during the whole task period to obtain the minimum energy consumption while satisfying the deadline constraint of every task.
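The single-task comparison between DVS and DPM can be illustrated with a minimal sketch. It assumes a simplified model that is not taken from the paper: switching energy per cycle is C_eff * V^2, clock frequency is linear in supply voltage (f = k * V), and a DPM processor runs at full speed and then sleeps at zero power. The constants `c_eff` and `k` and all function names are hypothetical.

```python
def dynamic_energy(cycles, voltage, c_eff=1e-9):
    """Switching energy in joules under the assumed model E = C_eff * V^2 * cycles."""
    return c_eff * voltage ** 2 * cycles


def voltage_for_frequency(freq_hz, k=1e9):
    """Assumed linear model f = k * V, so V = f / k (threshold voltage ignored)."""
    return freq_hz / k


def dvs_energy(cycles, deadline_s, c_eff=1e-9, k=1e9):
    """DVS: run just fast enough that the task finishes exactly at its deadline."""
    freq = cycles / deadline_s
    return dynamic_energy(cycles, voltage_for_frequency(freq, k), c_eff)


def dpm_energy(cycles, f_max_hz, c_eff=1e-9, k=1e9):
    """DPM: run at maximum frequency, then sleep (sleep power assumed to be zero)."""
    return dynamic_energy(cycles, voltage_for_frequency(f_max_hz, k), c_eff)


if __name__ == "__main__":
    cycles, deadline_s, f_max_hz = 5e8, 1.0, 1e9
    print("DVS energy (J):", dvs_energy(cycles, deadline_s))
    print("DPM energy (J):", dpm_energy(cycles, f_max_hz))
```

Under these assumptions the energy depends only on the square of the voltage, so stretching the task to its deadline at a lower voltage uses less energy than finishing early at full voltage, even when sleeping costs nothing.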
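Abstract: When allocating resources to the browser, the Android system is unaware of web page content, which leads to resource over-allocation and unnecessary battery drain. Meanwhile, as the density of adjustable CPU frequencies grows, energy optimization through dynamic voltage and frequency scaling (DVFS) becomes more difficult. In addition, the system's default governing policy overlooks the role of the graphics processing unit (GPU) in browser operation. To address these problems, a method that coordinates CPU and GPU regulation for power optimization is proposed. First, web pages are classified by logistic regression according to processor runtime characteristics during page loading; web page features are then weighted to quantify complexity; and, based on the class and complexity, DVFS is used to cap the CPU frequency while adjusting the GPU frequency. The method was applied to the Chromium browser on a Google Pixel 2 XL and tested on the top 500 Chinese websites, saving 12% power on average while reducing page load time by 5%.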
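The pipeline described above (logistic-regression classification, weighted complexity score, frequency cap) can be sketched roughly as follows. This is an illustration only, not the authors' implementation: the frequency table, feature weights, bias, and the rule mapping class and complexity to a cap are all hypothetical, and the sketch covers only the CPU cap, not the GPU adjustment.

```python
import math

# Hypothetical CPU frequency table in kHz (not taken from the paper).
CPU_FREQS_KHZ = [300_000, 825_600, 1_401_600, 1_900_800, 2_457_600]


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def classify_page(features, weights, bias):
    """Logistic-regression decision: 1 (heavy page) if P >= 0.5, else 0 (light page)."""
    z = bias + sum(w * x for w, x in zip(weights, features))
    return 1 if sigmoid(z) >= 0.5 else 0


def complexity(features, feature_weights):
    """Weighted sum of normalized features, clamped to [0, 1]."""
    score = sum(w * x for w, x in zip(feature_weights, features))
    return min(1.0, max(0.0, score))


def cpu_freq_cap(page_class, score):
    """Assumed rule: light pages pick from the lower part of the table, heavy from the upper."""
    lo, hi = (0, 2) if page_class == 0 else (2, len(CPU_FREQS_KHZ) - 1)
    return CPU_FREQS_KHZ[lo + round(score * (hi - lo))]


if __name__ == "__main__":
    lr_weights, lr_bias = [2.0, 1.5, 1.0], -2.0   # hypothetical trained parameters
    cx_weights = [0.5, 0.3, 0.2]                  # hypothetical complexity weights
    page = [0.9, 0.8, 0.7]                        # normalized load-time features
    cls = classify_page(page, lr_weights, lr_bias)
    print(cls, cpu_freq_cap(cls, complexity(page, cx_weights)))
```

In a real system the chosen cap would be written to the platform's frequency-scaling interface; that step is omitted here because it is device specific.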
Funding: the National Key Research and Development Program of China (2019YFB1803600), the Key Scientific Research Program of Shaanxi Provincial Department of Education (22JY059), and the China Civil Aviation Airworthiness Center Open Foundation (SH2021111903).
Abstract: To apply a quasi-cyclic low-density parity-check (QC-LDPC) code to different scenarios, a data-stream-driven pipelined macro instruction set and a reconfigurable processor architecture are proposed for the typical QC-LDPC algorithm. Data-level parallelism is improved by instructions that dynamically configure the multi-core computing units. In addition, an intelligent adjustment strategy based on a programmable wake-up controller (WuC) is designed so that the computing mode, operating voltage, and frequency of the QC-LDPC algorithm can be adjusted, which improves the computing efficiency of the processor. The QC-LDPC processors are verified on the Xilinx ZCU102 field programmable gate array (FPGA) board, and the computing efficiency is measured. The experimental results indicate that the QC-LDPC processor supports two encoding lengths of three typical QC-LDPC algorithms and 20 adaptive operating modes of operating voltage and frequency. The maximum efficiency reaches 12.18 Gbit/(s·W), and the processor is more flexible than existing state-of-the-art processors for QC-LDPC.
Abstract: Energy-proportional computing is one of the foremost constraints in the design of next-generation exascale systems. These systems must have a very high FLOP-per-watt ratio to be sustainable, which requires tremendous improvements in the power efficiency of modern computing systems. This paper focuses on the processor, still the biggest contributor to power usage, by considering both its core and uncore power subsystems. The uncore comprises those processor functions that are not handled by the core, such as the L3 cache and on-chip interconnect, and contributes significantly to total system power. Uncore frequency scaling (UFS) has been available to the user since the Intel Haswell processor generation. In this paper, performance and power models are proposed that use both UFS and dynamic voltage and frequency scaling (DVFS) to reduce energy consumption in parallel applications. These models are then incorporated into a runtime strategy that performs processor frequency scaling during parallel application execution. The strategy can be implemented at the kernel/firmware level, which makes it suitable for improving the energy efficiency of exascale designs. Experiments on a 20-core Haswell-EP machine using the quantum chemistry application GAMESS and the NAS benchmarks resulted in up to 24% energy savings with as little as 2% performance loss.
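A model-driven runtime choice of core and uncore frequencies can be sketched as a small search: evaluate modeled time and power for each frequency pair, discard pairs that exceed an allowed slowdown, and keep the pair with the lowest modeled energy. The time and power models below, their coefficients, and the function names are placeholders chosen for illustration; they are not the models proposed in the paper.

```python
from itertools import product


def exec_time(f_core, f_uncore, cpu_work=10.0, mem_work=4.0):
    """Placeholder performance model: compute part scales with core frequency,
    memory part with uncore frequency (frequencies in GHz, time in seconds)."""
    return cpu_work / f_core + mem_work / f_uncore


def power(f_core, f_uncore, p_static=20.0, a=3.0, b=1.5):
    """Placeholder power model in watts: static power plus cubic frequency terms."""
    return p_static + a * f_core ** 3 + b * f_uncore ** 3


def best_setting(core_freqs, uncore_freqs, max_slowdown=0.02):
    """Return (energy, f_core, f_uncore) minimizing modeled energy while keeping
    the slowdown relative to the maximum frequencies within max_slowdown."""
    t_ref = exec_time(max(core_freqs), max(uncore_freqs))
    best = None
    for fc, fu in product(core_freqs, uncore_freqs):
        t = exec_time(fc, fu)
        if t > t_ref * (1.0 + max_slowdown):
            continue  # violates the performance-loss bound
        energy = power(fc, fu) * t
        if best is None or energy < best[0]:
            best = (energy, fc, fu)
    return best


if __name__ == "__main__":
    print(best_setting([1.2, 1.8, 2.4], [1.2, 2.0, 3.0], max_slowdown=0.2))
```

The maximum-frequency pair always satisfies the bound, so the search never returns `None`; a tighter bound simply leaves fewer candidate pairs to choose from.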
Funding: supported by the National Natural Science Foundation of China (Grant No. 61872002) and the Anhui Province Key Research and Development Program Project (Grant No. 201904a05020091).
Abstract: Task offloading is an important concept for edge computing and the Internet of Things (IoT) because computation-intensive tasks must be offloaded to more resource-powerful remote devices. Task offloading has several advantages, including increased battery life, lower latency, and better application performance. A task offloading method determines whether sections of the full application should be run locally or offloaded for remote execution. The offloading decision is influenced by several factors, including application properties, network conditions, hardware features, and mobility, all of which shape the offloading system’s operational environment. This study provides a thorough examination of current task offloading and resource allocation in edge computing, covering offloading strategies, algorithms, and factors that influence offloading. Offloading strategies are of two types: full offloading and partial offloading. The algorithms for task offloading and resource allocation are then categorized into two groups: machine learning algorithms and non-machine learning algorithms. Under machine learning, we examine and elaborate on algorithms such as Supervised Learning, Unsupervised Learning, and Reinforcement Learning (RL). Under non-machine learning, we elaborate on algorithms such as (non)convex optimization, Lyapunov optimization, Game theory, Heuristic Algorithm, Dynamic Voltage Scaling, Gibbs Sampling, and Generalized Benders Decomposition (GBD). Finally, we highlight and discuss some research challenges and issues in edge computing.