2 articles found
Improving performance portability for GPU-specific OpenCL kernels on multi-core/many-core CPUs by analysis-based transformations
1
Authors: Mei WEN, Da-fei HUANG, Chang-qing XUN, Dong CHEN. Frontiers of Information Technology & Electronic Engineering (SCIE, EI, CSCD), 2015, No. 11, pp. 899-916 (18 pages)
OpenCL is an open heterogeneous programming framework. Although OpenCL programs are functionally portable, they do not provide performance portability, so code transformation often plays an irreplaceable role. When adapting GPU-specific OpenCL kernels to run on multi-core/many-core CPUs, coarsening the thread granularity is necessary and thus has been extensively used. However, locality concerns exposed in GPU-specific OpenCL code are usually inherited without analysis, which may have side effects on CPU performance. Typically, the use of OpenCL's local memory on multi-core/many-core CPUs may degrade rather than improve performance, because local-memory arrays no longer match well with the hardware and the associated synchronizations are costly. To solve this dilemma, we actively analyze the memory access patterns using array-access descriptors derived from GPU-specific kernels, which can thus be adapted for CPUs by (1) removing all the unwanted local-memory arrays together with the obsolete barrier statements and (2) optimizing the coalesced kernel code with vectorization and locality re-exploitation. Moreover, we have developed an automated tool chain that performs this transformation of GPU-specific OpenCL kernels into a CPU-friendly form, accompanied by a scheduler that forms a new OpenCL runtime. Experiments show that the automated transformation can improve OpenCL kernel performance on a multi-core CPU by an average factor of 3.24. Satisfactory performance improvements are also achieved on Intel's many-integrated-core coprocessor. The resultant performance on both architectures is better than or comparable with the corresponding OpenMP performance.
Keywords: OpenCL; Performance portability; Multi-core/many-core CPU; Analysis-based transformation
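The two adaptation steps named in the abstract above (coarsening work-items into a loop, then dropping the local-memory array and its barrier) can be illustrated with a small emulation. This is a minimal sketch in plain Python, not the authors' tool chain or real OpenCL code; the 3-point stencil and the function names `stencil_gpu_style` and `stencil_cpu_style` are assumptions chosen only for illustration.

```python
def stencil_gpu_style(src, group_size):
    """Emulate a GPU-style kernel: each work-group stages a tile (plus halo)
    in a 'local memory' list, then every work-item reads from that tile."""
    n = len(src)
    out = [0.0] * n
    for group_start in range(0, n, group_size):
        group_end = min(group_start + group_size, n)
        # Phase 1: the work-group copies its tile and halo into local memory.
        lo = max(group_start - 1, 0)
        hi = min(group_end + 1, n)
        local_tile = src[lo:hi]
        # In OpenCL a barrier(CLK_LOCAL_MEM_FENCE) would separate the phases.
        # Phase 2: each work-item computes its output from the tile.
        for gid in range(group_start, group_end):
            left = local_tile[max(gid - 1, 0) - lo]
            mid = local_tile[gid - lo]
            right = local_tile[min(gid + 1, n - 1) - lo]
            out[gid] = (left + mid + right) / 3.0
    return out


def stencil_cpu_style(src, group_size):
    """Coarsened form: one loop per work-group reading global memory directly.
    The local tile and the barrier are no longer needed."""
    n = len(src)
    out = [0.0] * n
    for group_start in range(0, n, group_size):
        for gid in range(group_start, min(group_start + group_size, n)):
            left = src[max(gid - 1, 0)]
            right = src[min(gid + 1, n - 1)]
            out[gid] = (left + src[gid] + right) / 3.0
    return out


data = [float((i * i) % 7) for i in range(10)]
same = stencil_gpu_style(data, 4) == stencil_cpu_style(data, 4)
```

Both functions compute the same values; the second simply omits the staging copy, which is the kind of redundancy the abstract says is costly on a CPU where caches already serve that role.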
Optimal precursors of double-gyre regime transitions with an adjoint-free method (cited by: 1)
2
Authors: YUAN Shijin, LI Mi, WANG Qiang, ZHANG Kun, ZHANG Huazhen, MU Bin. Journal of Oceanology and Limnology (SCIE, CAS, CSCD), 2019, No. 4, pp. 1137-1153 (17 pages)
In this paper, we find the optimal precursors that can cause double-gyre regime transitions, using the conditional nonlinear optimal perturbation (CNOP) method with the Regional Ocean Modeling System (ROMS). First, we simulate the multiple-equilibria regimes of the double-gyre circulation under different viscosity coefficients and obtain the bifurcation diagram. We then choose two equilibrium states (called the jet-up state and the jet-down state) as reference states, and propose a Principal Component Analysis-based Simulated Annealing (PCASA) algorithm to solve for CNOP-type initial perturbations that can induce double-gyre regime transitions between the jet-up state and the jet-down state. The PCASA algorithm is an adjoint-free method that searches randomly for the optimal solution in the whole solution space. In addition, we investigate how CNOP-type initial perturbations evolve with time. The results show: (1) the CNOP-type perturbations present a two-cell structure and gradually evolve into a three-cell structure at the prediction time; (2) by superimposing CNOP-type perturbations on the jet-up state and integrating ROMS, the double-gyre circulation transfers from the jet-up state to the jet-down state, and vice versa, whereas random initial perturbations do not cause the transitions, which means CNOP-type perturbations are the optimal precursors of double-gyre regime transitions; (3) by analyzing the process of double-gyre regime transitions, we find that CNOP-type initial perturbations obtain energy from the background state through both barotropic and baroclinic instabilities, and that barotropic instability contributes more significantly to the fast growth of the perturbations.
The optimal precursors and the dynamic mechanism of double-gyre regime transitions revealed in this paper are important for enhancing the predictability of the double-gyre circulation.
Keywords: Optimal precursors; double-gyre regime transitions; conditional nonlinear optimal perturbation (CNOP); Principal Component Analysis-based Simulated Annealing (PCASA); multiple-equilibria regimes
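The adjoint-free search described in the abstract above can be sketched as simulated annealing over coefficients in a reduced basis, subject to a norm constraint on the initial perturbation. This is only a sketch under stated assumptions, not the authors' PCASA implementation: the PCA step is omitted and a hard-coded orthonormal `basis` stands in for the leading principal components, `toy_cost` is a hypothetical stand-in for perturbation growth computed by integrating ROMS, and all step sizes and cooling parameters are arbitrary.

```python
import math
import random


def project_to_ball(coeffs, delta):
    """Scale the coefficients back onto the ball of radius delta if outside."""
    norm = math.sqrt(sum(c * c for c in coeffs))
    if norm <= delta or norm == 0.0:
        return list(coeffs)
    return [c * delta / norm for c in coeffs]


def reconstruct(coeffs, basis):
    """Map reduced coefficients back to a full-space perturbation."""
    dim = len(basis[0])
    return [sum(c * vec[i] for c, vec in zip(coeffs, basis)) for i in range(dim)]


def anneal(cost, basis, delta, steps=2000, t0=1.0, cooling=0.995, seed=0):
    """Maximize cost(perturbation) subject to ||coeffs|| <= delta.
    Only cost evaluations are used, so no adjoint model is required."""
    rng = random.Random(seed)
    current = project_to_ball([rng.uniform(-1.0, 1.0) for _ in basis], delta)
    current_val = cost(reconstruct(current, basis))
    best, best_val = current, current_val
    temp = t0
    for _ in range(steps):
        candidate = project_to_ball(
            [c + rng.gauss(0.0, 0.1 * delta) for c in current], delta)
        val = cost(reconstruct(candidate, basis))
        # Always accept improvements; accept worse moves with a probability
        # that shrinks as the temperature falls.
        if val >= current_val or rng.random() < math.exp((val - current_val) / temp):
            current, current_val = candidate, val
            if val > best_val:
                best, best_val = candidate, val
        temp *= cooling
    return reconstruct(best, basis), best_val


# Hypothetical orthonormal basis: two directions in a 3-D state space.
basis = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]


def toy_cost(p):
    """Hypothetical nonlinear growth measure; a real study would run the model."""
    return (3.0 * p[0] + p[0] ** 2) ** 2 + p[1] ** 2


best_perturbation, best_value = anneal(toy_cost, basis, delta=1.0)
```

Because the basis is orthonormal, the norm of the coefficients equals the norm of the reconstructed perturbation, so the constraint can be enforced in the reduced space.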