Funding: National Key Research and Development Program of China (No. 2017YFB0202003).
Abstract: In this paper, a roofline model-guided compilation optimization parameter selection (RMOPS) method is proposed, based on the Roofline model, to maximize the performance of target programs. Orthogonal test design is applied to the compiler optimization parameters to solve the problem of selecting parameters that have complex dependencies on one another. Performance data generated by the Empirical Roofline Tool (ERT) are used to make the parameter selection decision. RMOPS was evaluated on an ARMv8 platform, and its feasibility was verified using SPEC CPU2017 and the NAS Parallel Benchmarks (NPB). Experimental results show that the program performance obtained with the optimal optimization parameters found by the RMOPS search is generally 5%–33% higher than that achieved with the -O3 optimization setting.
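The abstract does not spell out how the orthogonal design and the roofline data are combined. The following Python sketch therefore illustrates only the two generic ingredients, under stated assumptions:

- a standard L8(2^7) orthogonal array, which tests up to seven on/off compiler flags in 8 builds instead of 2^7 = 128;
- range analysis, which estimates each flag's effect from those builds.

The roofline bound min(peak, bandwidth × arithmetic intensity) is then used to judge how close a run gets to the machine ceiling. The flag names (flag_A to flag_G), machine figures and measured GFLOP/s values are hypothetical placeholders, not data from the paper.

```python
# A generic sketch (not the RMOPS implementation): Taguchi-style orthogonal test
# design over two-level compiler flags, with runs scored against a roofline bound.
# Flag names, machine figures and measurements are hypothetical placeholders.

# Standard L8(2^7) orthogonal array: 8 runs, up to 7 two-level factors.
# Level 1 = flag off, level 2 = flag on.
L8 = [
    [1, 1, 1, 1, 1, 1, 1],
    [1, 1, 1, 2, 2, 2, 2],
    [1, 2, 2, 1, 1, 2, 2],
    [1, 2, 2, 2, 2, 1, 1],
    [2, 1, 2, 1, 2, 1, 2],
    [2, 1, 2, 2, 1, 2, 1],
    [2, 2, 1, 1, 2, 2, 1],
    [2, 2, 1, 2, 1, 1, 2],
]


def is_orthogonal(array):
    """True if every pair of two-level columns shows all four level pairs equally often."""
    n_cols = len(array[0])
    for i in range(n_cols):
        for j in range(i + 1, n_cols):
            counts = {}
            for row in array:
                key = (row[i], row[j])
                counts[key] = counts.get(key, 0) + 1
            if len(counts) != 4 or len(set(counts.values())) != 1:
                return False
    return True


def roofline_bound(peak_gflops, bandwidth_gbs, intensity):
    """Roofline model: attainable GFLOP/s = min(peak compute, bandwidth x arithmetic intensity)."""
    return min(peak_gflops, bandwidth_gbs * intensity)


def range_analysis(array, responses):
    """Return (best level, range R) per factor; a larger R means a more influential factor."""
    results = []
    for col in range(len(array[0])):
        means = {}
        for level in sorted({row[col] for row in array}):
            vals = [r for row, r in zip(array, responses) if row[col] == level]
            means[level] = sum(vals) / len(vals)
        best = max(means, key=means.get)  # ties resolve to the lowest level (flag off)
        results.append((best, max(means.values()) - min(means.values())))
    return results


if __name__ == "__main__":
    flags = ["flag_A", "flag_B", "flag_C", "flag_D", "flag_E", "flag_F", "flag_G"]
    # Hypothetical kernel: 0.25 FLOP/byte on a 100 GFLOP/s, 40 GB/s machine.
    bound = roofline_bound(100.0, 40.0, 0.25)  # memory-bound: 10.0 GFLOP/s
    # Hypothetical measured GFLOP/s, one build per row of L8.
    measured = [6.0, 5.5, 7.0, 6.5, 8.0, 7.5, 9.0, 8.5]
    assert is_orthogonal(L8)
    for name, (level, r) in zip(flags, range_analysis(L8, measured)):
        print(f"{name}: {'on' if level == 2 else 'off'} (range {r:.2f} GFLOP/s)")
    print(f"best observed run: {max(measured) / bound:.0%} of the roofline bound")
```

Range analysis picks each flag's best level independently. The recommended combination (here flag_A and flag_B on, the rest off) therefore need not be one of the eight tested rows. In Taguchi-style practice it is usually confirmed with one extra build and run.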
Funding: Supported by the Russian Science Foundation, grant #18-71-10004.
Abstract: A design of a new heterogeneous code for lattice Boltzmann method (LBM) simulations is proposed. By heterogeneous computing we mean a collaborative computation on CPU and GPU with the following features:

- the data is distributed between the CPU and GPU memory spaces, taking advantage of both parallel hierarchies;
- both SIMT GPU and SIMD CPU parallelization are used for calculations;
- the algorithms in use efficiently conceal the CPU-GPU data exchange;
- the computing task is subdivided to account for the strong points of each processing unit: the high performance of the GPU, and the low latency and advanced memory hierarchy of the CPU.

This code continues our work on developing LRnLA codes for LBM. Previous LRnLA codes were efficient for both CPU and GPU computing. They also allowed GPU simulations on data stored in CPU RAM without performance loss from CPU-GPU data transfer. In the new code, we use methods and tools that can be flexibly adapted to both GPU and CPU instruction sets. We present a theoretical study of the performance of the proposed code, suggest implementation techniques, and identify the bottlenecks. As a result, we conclude that larger problems can be simulated with higher efficiency in the heterogeneous system.
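The abstract reports a theoretical performance study without giving its model. As a rough illustration only, the Python sketch below does three things:

- it estimates a bandwidth-bound LBM update rate for each device;
- it splits the lattice so that the CPU and GPU finish a step at the same time;
- it treats the CPU-GPU exchange as hidden behind computation, so a step costs the slowest of the three.

The D3Q19 lattice, the one-read-one-write traffic per distribution, the halo size and all hardware figures are assumptions. The sketch does not model the LRnLA algorithms themselves.

```python
# Back-of-envelope model of a memory-bound LBM step split between CPU and GPU.
# Assumptions (not from the paper): D3Q19, double precision, each distribution read
# and written once per cell update, host-device exchange fully overlapped with
# computation, and hypothetical hardware figures.

Q = 19                                    # D3Q19 velocity set
BYTES_PER_VALUE = 8                       # double precision
BYTES_PER_CELL = 2 * Q * BYTES_PER_VALUE  # one read + one write per distribution = 304


def mlups(bandwidth_gbs):
    """Bandwidth-bound rate in million lattice-cell updates per second (MLUPS)."""
    return bandwidth_gbs * 1e9 / BYTES_PER_CELL / 1e6


def balanced_split(rate_cpu, rate_gpu):
    """Fraction of cells given to the GPU so both devices finish a step together."""
    return rate_gpu / (rate_cpu + rate_gpu)


def transfer_seconds(n_bytes, link_gbs):
    """Time to move n_bytes over a host-device link of link_gbs GB/s."""
    return n_bytes / (link_gbs * 1e9)


def step_seconds(cells, rate_cpu, rate_gpu, gpu_fraction, transfer_s):
    """Step time when the exchange is hidden behind compute: the slowest of the three."""
    t_cpu = cells * (1 - gpu_fraction) / (rate_cpu * 1e6)
    t_gpu = cells * gpu_fraction / (rate_gpu * 1e6)
    return max(t_cpu, t_gpu, transfer_s)


if __name__ == "__main__":
    cells = 512 ** 3
    cpu, gpu = mlups(100.0), mlups(900.0)  # hypothetical 100 GB/s host, 900 GB/s device
    f = balanced_split(cpu, gpu)
    # Hypothetical halo: one 512 x 512 interface, 5 D3Q19 distributions crossing it
    # in each of the two directions, sent over a hypothetical 25 GB/s link.
    halo = transfer_seconds(512 * 512 * 5 * 2 * BYTES_PER_VALUE, 25.0)
    t = step_seconds(cells, cpu, gpu, f, halo)
    print(f"GPU share {f:.2f}, step {t * 1e3:.2f} ms, {cells / t / 1e6:.0f} MLUPS combined")
    print(f"GPU alone: {cells / (gpu * 1e6) * 1e3:.2f} ms per step")
```

In this simple model, the CPU adds its own bandwidth-bound rate to the GPU's throughput. That gain holds only while the exchange time stays below the per-device compute time; otherwise the link becomes the bottleneck.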