Funding: Partially supported by the National Natural Science Foundation of China under Grant No. 62225206.
Abstract: Unified programming models can effectively improve program portability across heterogeneous high-performance computers. Existing unified programming models put considerable effort into code portability but are still far from achieving good performance portability. In this paper, we present a preliminary design of a performance-portable unified programming model covering four aspects: programming language, programming abstraction, compilation optimization, and scheduling system. Specifically, domain-specific languages introduce domain knowledge to decouple the optimizations for different applications and architectures. The unified programming abstraction captures the common features of different architectures to support common optimizations. Multi-level compilation optimization enables comprehensive performance optimization based on multi-level intermediate representations. A resource-aware, lightweight runtime scheduling system improves the resource utilization of heterogeneous computers. This is a perspective paper presenting our viewpoints on programming models for emerging heterogeneous systems.
Funding: Supported by the National High Technology Research and Development 863 Program of China under Grant No. 2012AA01A301 and the National Natural Science Foundation of China under Grant No. 61170049.
Abstract: Modern petascale and future exascale systems are massively heterogeneous architectures. Developing productive intra-node programming models is crucial to addressing their programming challenge. We introduce a directive-based intra-node programming model, OpenMC, and show that this new model can achieve ease of programming, high performance, and the degree of portability desired for heterogeneous nodes, especially those in TianHe supercomputers. While existing models are geared towards offloading computations to accelerators (typically one), OpenMC aims to more uniformly and adequately exploit the potential offered by multiple CPUs and accelerators in a compute node. OpenMC achieves this by providing a unified abstraction of hardware resources as workers and facilitating the exploitation of asynchronous task parallelism on the workers. We present an overview of OpenMC, a prototype implementation, and results from some initial comparisons with OpenMP and hand-written code in developing six applications on two types of nodes from TianHe supercomputers.
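The idea of treating every hardware resource as a worker and launching tasks on workers asynchronously can be illustrated with a minimal Python sketch. This is not OpenMC syntax (OpenMC is directive-based, and the abstract gives no directives); the `Worker` class, `run_on_workers`, `scale`, and the worker names are hypothetical, and threads merely stand in for CPUs and accelerators.

```python
from concurrent.futures import ThreadPoolExecutor


class Worker:
    """Hypothetical stand-in for one hardware resource (a CPU or an accelerator)."""

    def __init__(self, name):
        self.name = name
        # One thread per worker keeps tasks on the same worker in submission order.
        self._pool = ThreadPoolExecutor(max_workers=1)

    def submit(self, fn, *args):
        # Returns immediately with a future; the task runs asynchronously.
        return self._pool.submit(fn, *args)

    def shutdown(self):
        self._pool.shutdown(wait=True)


def scale(chunk, factor):
    """Toy task: multiply every element of a chunk by a factor."""
    return [x * factor for x in chunk]


def run_on_workers(data, workers, factor=2):
    """Split data into one chunk per worker, run the chunks concurrently, and join."""
    n = len(workers)
    size = (len(data) + n - 1) // n  # ceiling division
    futures = [
        w.submit(scale, data[i * size:(i + 1) * size], factor)
        for i, w in enumerate(workers)
    ]
    result = []
    for f in futures:  # collect results in submission order
        result.extend(f.result())
    return result


# Assumed node layout: two CPU workers and one accelerator worker.
workers = [Worker("cpu0"), Worker("cpu1"), Worker("acc0")]
output = run_on_workers(list(range(10)), workers)
for w in workers:
    w.shutdown()
print(output)
```

The caller addresses CPUs and accelerators through the same `submit` interface, which is the sense in which the abstraction is uniform; a real runtime would additionally handle data movement and load balancing, which this sketch omits.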
Funding: This work was partially supported by the National High-tech R&D Program of China (863 Program) (2012AA01A301) and the National Natural Science Foundation of China (Grant No. 61120106005). The MilkyWay-2 project is a great team effort and benefits from the cooperation of many individuals at NUDT. We thank all the people who have contributed to the system in a variety of ways.
Abstract: On June 17, 2013, the MilkyWay-2 (Tianhe-2) supercomputer was crowned the fastest supercomputer in the world on the 41st TOP500 list. This paper provides an overview of the MilkyWay-2 project and describes the design of its hardware and software systems. The key architectural features of MilkyWay-2 are highlighted, including neo-heterogeneous compute nodes integrating commodity off-the-shelf processors and accelerators that share a similar instruction set architecture, powerful networks that employ proprietary interconnection chips to support massively parallel message-passing communications, a proprietary 16-core processor designed for scientific computing, and efficient software stacks that provide a high-performance file system, an emerging programming model for heterogeneous systems, and intelligent system administration. We perform extensive evaluation with wide-ranging applications, from the LINPACK and Graph500 benchmarks to massively parallel software deployed on the system.