Funding: supported by funding from the National Key Research and Development Program of China (No. 2018YFB1003203), the Advanced Research Project of China (No. 31511010203), the Open Fund from the State Key Laboratory of High Performance Computing (No. 201503-02), and the Research Program of NUDT (No. ZK18-03-10).
Abstract: Component overclocking, such as processor overclocking or memory overclocking, is an effective way to speed up the components of a system and thereby achieve higher program performance. However, overclocking unavoidably increases power consumption. Our goal is to optimally improve the performance of scientific computing applications without increasing the total power consumption of a processor-memory system. We built a processor-memory energy-efficiency model for multicore-based systems that coordinates the performance and power of the processor and memory. Depending on the scientific application, the model exploits performance-boost opportunities in a processor-memory system by applying processor overclocking, processor Dynamic Voltage and Frequency Scaling (DVFS), memory active ratio adjustment, and memory overclocking. The model also provides a total-power control method based on the same four factors. We propose a processor and memory Coordination-based holistic Energy-Efficient (CEE) algorithm, which improves performance without increasing total power consumption. Experimental results show an average performance improvement of 9.3% across all 14 benchmarks, with no increase in total power consumption. The maximum improvement, 13.1%, was obtained on the dedup benchmark. These experiments validate the effectiveness of our holistic energy-efficiency model and technique.
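The abstract does not give the equations of the CEE model, so the following is only a minimal sketch of the general idea under stated assumptions: enumerate processor frequencies (DVFS below nominal, overclocking above), memory frequencies, and memory active ratios, and keep the fastest configuration whose modeled total power does not exceed that of the nominal configuration. The performance and power models, every constant, and the names `Config`, `exec_time`, `total_power`, and `best_config` are hypothetical toy choices, not the authors' model.

```python
from dataclasses import dataclass
from itertools import product

@dataclass(frozen=True)
class Config:
    cpu_ghz: float     # processor frequency: below nominal = DVFS, above = overclocking
    mem_mhz: float     # memory frequency: above nominal = memory overclocking
    mem_active: float  # fraction of memory ranks kept active (the "active ratio")

# Hypothetical nominal operating point; its total power is used as the power budget.
NOMINAL = Config(cpu_ghz=2.2, mem_mhz=1600.0, mem_active=1.0)

def exec_time(c: Config, cpu_frac: float, base_time: float = 100.0) -> float:
    """Hypothetical toy model: the CPU-bound fraction of the run scales with
    1/cpu_ghz and the memory-bound fraction with 1/(mem_mhz * mem_active)."""
    cpu_part = cpu_frac * base_time * NOMINAL.cpu_ghz / c.cpu_ghz
    bw_ratio = (NOMINAL.mem_mhz * NOMINAL.mem_active) / (c.mem_mhz * c.mem_active)
    mem_part = (1.0 - cpu_frac) * base_time * bw_ratio
    return cpu_part + mem_part

def total_power(c: Config) -> float:
    """Hypothetical toy model (watts): CPU power = static part + dynamic part ~ f^3
    (voltage assumed to track frequency); memory power scales with the active
    ratio and with memory frequency."""
    cpu = 20.0 + 40.0 * (c.cpu_ghz / NOMINAL.cpu_ghz) ** 3
    mem = c.mem_active * (15.0 + 25.0 * c.mem_mhz / NOMINAL.mem_mhz)
    return cpu + mem

def best_config(cpu_frac, cpu_levels, mem_levels, active_levels) -> Config:
    """Return the fastest configuration whose total power stays within the
    nominal configuration's power (NOMINAL must be among the levels)."""
    budget = total_power(NOMINAL)
    feasible = [
        Config(f, m, a)
        for f, m, a in product(cpu_levels, mem_levels, active_levels)
        if total_power(Config(f, m, a)) <= budget + 1e-9
    ]
    return min(feasible, key=lambda c: exec_time(c, cpu_frac))

# Hypothetical discrete settings exposed by the hardware.
CPU_LEVELS = (1.8, 2.0, 2.2, 2.4)
MEM_LEVELS = (1333.0, 1600.0, 1866.0)
ACTIVE_LEVELS = (0.5, 0.75, 1.0)

if __name__ == "__main__":
    print(best_config(0.3, CPU_LEVELS, MEM_LEVELS, ACTIVE_LEVELS))  # mostly memory-bound
    print(best_config(0.9, CPU_LEVELS, MEM_LEVELS, ACTIVE_LEVELS))  # mostly compute-bound
```

With these made-up numbers, the mostly memory-bound profile lowers the processor frequency to pay for memory overclocking, while the mostly compute-bound profile overclocks the processor and funds it by lowering memory frequency and active ratio. Both stay within the nominal power budget.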
Funding: supported in part by the Advanced Research Project of China (No. 31511010203) and the Research Program of NUDT (No. ZK18-03-10).
Abstract: Achieving faster performance without increasing the power and energy consumption of computing systems is an outstanding challenge. This paper develops a novel resource allocation scheme for memory-bound applications running on High-Performance Computing (HPC) clusters, aiming to improve application performance without breaching peak power constraints or increasing total energy consumption. Our scheme estimates how the number of processor cores and the CPU frequency setting affect application performance. It then uses this estimate to provide additional compute nodes to memory-bound applications when doing so is profitable. We implement our algorithm, apply it to 12 representative benchmarks from the NAS Parallel Benchmarks and HPC Challenge (HPCC) benchmark suites, and evaluate it on a representative HPC cluster. Experimental results show that our approach effectively mitigates memory contention to improve application performance, and that it does so without significantly increasing peak power or overall energy consumption. On average, our approach achieves a 12.69% performance improvement over the default resource allocation strategy while using 7.06% less total power, which translates into 17.77% energy savings.
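The abstract does not specify the scheme's estimator or decision rule. The following minimal sketch, built entirely on hypothetical toy models, shows one way such a decision could look: starting from the default allocation (fewest nodes at nominal frequency), try spreading the same processes over extra nodes at each available frequency, and accept the fastest option whose modeled power and energy do not exceed the default's. The contention model, the cluster constants, and the names `Allocation`, `runtime`, `power`, and `allocate` are assumptions for illustration only.

```python
import math
from dataclasses import dataclass

@dataclass(frozen=True)
class Allocation:
    nodes: int
    freq_ghz: float

# Hypothetical cluster parameters (not taken from the paper).
CORES_PER_NODE = 16
F_NOMINAL = 2.4          # GHz
IDLE_NODE_W = 60.0       # static power per powered-on node
CORE_W = 10.0            # dynamic power per busy core at F_NOMINAL
SAT_CORES = 6            # busy cores per node before memory bandwidth saturates
FREQS = (1.8, 2.0, 2.2, 2.4)

def runtime(procs: int, a: Allocation, compute_s: float, memory_s: float) -> float:
    """Toy estimate: compute time scales with 1/frequency; memory time grows
    once more than SAT_CORES processes share one node's memory bandwidth."""
    per_node = math.ceil(procs / a.nodes)
    contention = max(1.0, per_node / SAT_CORES)
    return compute_s * F_NOMINAL / a.freq_ghz + memory_s * contention

def power(procs: int, a: Allocation) -> float:
    """Toy estimate: idle power per node plus cubic-in-frequency core power."""
    return a.nodes * IDLE_NODE_W + procs * CORE_W * (a.freq_ghz / F_NOMINAL) ** 3

def allocate(procs: int, compute_s: float, memory_s: float,
             max_extra_nodes: int = 4) -> Allocation:
    """Return the fastest allocation whose power and energy do not exceed those
    of the default allocation (fewest nodes, nominal frequency)."""
    default = Allocation(math.ceil(procs / CORES_PER_NODE), F_NOMINAL)
    p_cap = power(procs, default)
    best_t = runtime(procs, default, compute_s, memory_s)
    e_cap = p_cap * best_t
    best = default
    for n in range(default.nodes, default.nodes + max_extra_nodes + 1):
        for f in FREQS:
            a = Allocation(n, f)
            t = runtime(procs, a, compute_s, memory_s)
            p = power(procs, a)
            # Extra nodes are used only if they are profitable within both caps.
            if p <= p_cap + 1e-9 and p * t <= e_cap + 1e-9 and t < best_t:
                best, best_t = a, t
    return best

if __name__ == "__main__":
    print(allocate(32, compute_s=20.0, memory_s=80.0))  # memory-bound profile
    print(allocate(32, compute_s=95.0, memory_s=5.0))   # compute-bound profile
```

With these made-up parameters, the memory-bound profile is spread over five nodes at 1.8 GHz, because the reduced memory contention outweighs the slower cores, while the compute-bound profile keeps the default two-node allocation at nominal frequency.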
Funding: supported by funding from the National Key Research and Development Program of China (2017YFB0202200), the Advanced Research Project of China (31511010203), the Open Fund (201503-02) from the State Key Laboratory of High Performance Computing, and the Research Program of NUDT (ZK18-03-10).
Abstract: Exascale computing is one of the major challenges of this decade, and several studies have shown that communication is becoming one of the bottlenecks in scaling parallel applications. Analyzing communication characteristics can effectively help improve the performance of scientific applications. In this paper, we focus on the statistical regularity of time-dimension communication characteristics in representative scientific applications on supercomputer systems, and we prove that the distribution of communication-event intervals has a power-law decay, a pattern common in many phenomena of scientific interest and in human activities. We verify that the distribution of communication-event intervals indeed exhibits power-law decay on the Tianhe-2 supercomputer and on six other parallel systems with three different network topologies and two routing policies. To study the power-law distribution quantitatively, we use two groups of statistics: burstiness vs. memory, and periodicity vs. dispersion. Our results indicate that the communication events show a “strong-bursty and weak-memory” characteristic and that the communication-event intervals exhibit both periodicity and dispersion. Finally, our research provides insight into the relationship between communication optimizations and time-dimension communication characteristics.
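The abstract names the statistics but not their formulas. As an illustration only, the sketch below assumes the widely used burstiness and memory coefficients of Goh and Barabási, B = (σ − μ)/(σ + μ) over the inter-event intervals and M = correlation coefficient of consecutive intervals, together with the continuous maximum-likelihood estimate of the power-law exponent, α̂ = 1 + n / Σ ln(τᵢ/τ_min), over intervals τᵢ ≥ τ_min. The function names are hypothetical, and the paper's own definitions (as well as its periodicity and dispersion statistics, which are not sketched here) may differ.

```python
import math
import statistics

def intervals(timestamps):
    """Inter-event intervals from sorted communication-event timestamps."""
    return [b - a for a, b in zip(timestamps, timestamps[1:])]

def burstiness(taus):
    """Assumed definition B = (sigma - mu) / (sigma + mu):
    -1 for perfectly periodic intervals, about 0 for Poisson, toward 1 when bursty."""
    mu = statistics.fmean(taus)
    sigma = statistics.pstdev(taus)
    return (sigma - mu) / (sigma + mu)

def memory_coefficient(taus):
    """Assumed definition M = Pearson correlation of consecutive intervals
    (tau_i, tau_{i+1}). Needs at least three non-constant intervals."""
    x, y = taus[:-1], taus[1:]
    mx, my = statistics.fmean(x), statistics.fmean(y)
    sx, sy = statistics.pstdev(x), statistics.pstdev(y)
    cov = statistics.fmean([(a - mx) * (b - my) for a, b in zip(x, y)])
    return cov / (sx * sy)

def powerlaw_alpha(taus, tau_min):
    """Continuous MLE of alpha in p(tau) ~ tau^(-alpha), using only tau >= tau_min."""
    tail = [t for t in taus if t >= tau_min]
    return 1.0 + len(tail) / sum(math.log(t / tau_min) for t in tail)

if __name__ == "__main__":
    # Hypothetical event timestamps (seconds), for illustration only.
    ts = [0.0, 0.01, 0.02, 0.5, 0.51, 0.53, 2.0, 2.01, 4.5, 4.52]
    taus = intervals(ts)
    print(burstiness(taus), memory_coefficient(taus), powerlaw_alpha(taus, 0.01))
```

Under these assumed definitions, a “strong-bursty and weak-memory” characteristic would correspond to B well above 0 together with M close to 0.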