Funding: Project supported by the National Natural Science Foundation of China (No. 61272145) and the National High-Tech R&D Program (863) of China (No. 2012AA012706)
Abstract: OpenCL is an open heterogeneous programming framework. Although OpenCL programs are functionally portable, they do not provide performance portability, so code transformation often plays an irreplaceable role. When adapting GPU-specific OpenCL kernels to run on multi-core/many-core CPUs, coarsening the thread granularity is necessary and has therefore been used extensively. However, locality concerns exposed in GPU-specific OpenCL code are usually inherited without analysis, which may have side effects on CPU performance. Typically, the use of OpenCL's local memory on multi-core/many-core CPUs may degrade rather than improve performance, because local-memory arrays no longer match the hardware well and the associated synchronizations are costly. To solve this dilemma, we actively analyze the memory access patterns using array-access descriptors derived from GPU-specific kernels. The kernels can then be adapted for CPUs by (1) removing all the unwanted local-memory arrays together with the obsolete barrier statements and (2) optimizing the coalesced kernel code with vectorization and locality re-exploitation. Moreover, we have developed an automated tool chain that transforms GPU-specific OpenCL kernels into a CPU-friendly form, accompanied by a scheduler that forms a new OpenCL runtime. Experiments show that the automated transformation can improve OpenCL kernel performance on a multi-core CPU by an average factor of 3.24. Satisfactory performance improvements are also achieved on Intel's many-integrated-core coprocessor. The resultant performance on both architectures is better than or comparable with the corresponding OpenMP performance.
Funding: Supported by the National Natural Science Foundation of China (No. 41405097) and the Fundamental Research Funds for the Central Universities of China in 2017
Abstract: In this paper, we find the optimal precursors that can cause double-gyre regime transitions, using the conditional nonlinear optimal perturbation (CNOP) method with the Regional Ocean Modeling System (ROMS). First, we simulate the multiple-equilibria regimes of the double-gyre circulation under different viscosity coefficients and obtain the bifurcation diagram. We then choose two equilibrium states (called the jet-up state and the jet-down state) as reference states, and propose a Principal Component Analysis-based Simulated Annealing (PCASA) algorithm to solve for the CNOP-type initial perturbations that can induce double-gyre regime transitions between the jet-up state and the jet-down state. The PCASA algorithm is an adjoint-free method that searches for the optimal solution randomly in the whole solution space. In addition, we investigate how CNOP-type initial perturbations evolve with time. The results show that: (1) the CNOP-type perturbations present a two-cell structure and gradually evolve into a three-cell structure at the prediction time; (2) by superimposing CNOP-type perturbations on the jet-up state and integrating ROMS, the double-gyre circulation transfers from the jet-up state to the jet-down state, and vice versa, whereas random initial perturbations do not cause the transitions, which means that CNOP-type perturbations are the optimal precursors of double-gyre regime transitions; (3) by analyzing the transition process, we find that CNOP-type initial perturbations obtain energy from the background state through both barotropic and baroclinic instabilities, and that barotropic instability contributes more significantly to the fast growth of the perturbations. The optimal precursors and the dynamic mechanism of double-gyre regime transitions revealed in this paper are important for enhancing the predictability of the double-gyre circulation.
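For readers unfamiliar with CNOP, the optimization problem behind the abstract is commonly written as below. This is the standard formulation from the CNOP literature, not an equation reproduced from this paper, and the symbols $U_0$, $u_0$, $M_T$, $\delta$ and the choice of norm are assumptions made here for illustration.

```latex
J(u_{0\delta}) \;=\; \max_{\|u_0\| \le \delta}
  \bigl\| M_T(U_0 + u_0) - M_T(U_0) \bigr\|
```

Here $U_0$ is the reference state (the jet-up or jet-down state), $u_0$ is an initial perturbation, $M_T$ is the nonlinear propagator of the model (ROMS in this work) from the initial time to the prediction time $T$, and $\delta$ bounds the size of the initial perturbation. The CNOP $u_{0\delta}$ is the admissible perturbation with the largest nonlinear evolution. Evaluating $J$ needs only forward model runs, which is consistent with the abstract's description of PCASA as an adjoint-free search.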