HW/SW Co-optimization for Stencil Computation: Beginning with a Customizable Core


Abstract: Energy efficiency is one of the most important issues in High Performance Computing (HPC) today. Heterogeneous HPC platforms with energy-efficient customizable cores (used as application-specific accelerators) are believed to be one of the promising solutions for meeting ever-increasing computing needs and overcoming power density limitations. In this paper, we focus on using customizable processor cores to optimize typical stencil computations, the kernel of many high-performance applications. We develop a series of effective software/hardware co-optimization strategies to exploit instruction-level and memory-computation parallelism, as well as to decrease energy consumption. These optimizations include loop tiling, prefetching, cache customization, Single Instruction Multiple Data (SIMD), and Direct Memory Access (DMA), as well as the necessary ISA extensions. Detailed power-efficiency tests are given to evaluate the effect of all these optimizations comprehensively. The results are impressive: the combination of these optimizations improved application performance by 341% while decreasing energy consumption by 35%; a preliminary comparison with X86, GPU, and FPGA platforms also showed that the design could achieve an order of magnitude higher performance efficiency. We believe this work can help in understanding the sources of inefficiency in general-purpose chips and can serve as a starting point for customizing an energy-efficient CMP for further improvement.
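
To make the loop tiling mentioned in the abstract concrete, here is a minimal sketch of a 5-point Jacobi stencil sweep, untiled and tiled. It is an illustration only, not the authors' implementation: the stencil shape, the function names `jacobi_step` and `jacobi_step_tiled`, and the `tile` parameter are all assumptions introduced here, and the paper's hardware-side optimizations (cache customization, SIMD, DMA, ISA extensions) are not modeled.

```python
def jacobi_step(grid):
    """One untiled 5-point stencil sweep; boundary cells are copied unchanged."""
    n, m = len(grid), len(grid[0])
    out = [row[:] for row in grid]
    for i in range(1, n - 1):
        for j in range(1, m - 1):
            out[i][j] = 0.25 * (grid[i - 1][j] + grid[i + 1][j]
                                + grid[i][j - 1] + grid[i][j + 1])
    return out


def jacobi_step_tiled(grid, tile=4):
    """Same sweep, but the interior is visited in tile x tile blocks.

    `tile` is a hypothetical tuning parameter; in practice it would be
    chosen so that a block's working set fits in the local cache.
    """
    n, m = len(grid), len(grid[0])
    out = [row[:] for row in grid]
    for ii in range(1, n - 1, tile):          # block row start
        for jj in range(1, m - 1, tile):      # block column start
            for i in range(ii, min(ii + tile, n - 1)):
                for j in range(jj, min(jj + tile, m - 1)):
                    out[i][j] = 0.25 * (grid[i - 1][j] + grid[i + 1][j]
                                        + grid[i][j - 1] + grid[i][j + 1])
    return out
```

Both functions perform the same arithmetic per cell, so they return identical grids; tiling only changes the order in which cells are visited, which is what improves data reuse on a cache-based core.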

Authors: Yanhua Li, Youhui Zhang, Weiming Zheng

Affiliation: Department of Computer Science

Source: Tsinghua Science and Technology (SCIE, EI, CAS, CSCD), 2016, Issue 5, pp. 570-580 (11 pages). Journal of Tsinghua University (Natural Science Edition, English Edition)

Funding: Supported by the National High-Tech Research and Development (863) Program of China (No. 2013AA01A215) and the Brain Inspired Computing Research of Tsinghua University (No. 20141080934).

Keywords: energy efficiency; customizable processor; stencil computation; software and hardware co-optimization

Classification: TP38 [Automation and Computer Technology: Computer System Architecture]

