Conventional gradient-based full waveform inversion (FWI) is a local optimization, which is highly dependent on the initial model and prone to trapping in local minima. Globally optimal FWI that can overcome this limi...Conventional gradient-based full waveform inversion (FWI) is a local optimization, which is highly dependent on the initial model and prone to trapping in local minima. Globally optimal FWI that can overcome this limitation is particularly attractive, but is currently limited by the huge amount of calculation. In this paper, we propose a globally optimal FWI framework based on GPU parallel computing, which greatly improves the efficiency, and is expected to make globally optimal FWI more widely used. In this framework, we simplify and recombine the model parameters, and optimize the model iteratively. Each iteration contains hundreds of individuals, each individual is independent of the other, and each individual contains forward modeling and cost function calculation. The framework is suitable for a variety of globally optimal algorithms, and we test the framework with particle swarm optimization algorithm for example. Both the synthetic and field examples achieve good results, indicating the effectiveness of the framework. .展开更多
This paper aims to solve large-scale and complex isogeometric topology optimization problems that consumesignificant computational resources. A novel isogeometric topology optimization method with a hybrid parallelstr...This paper aims to solve large-scale and complex isogeometric topology optimization problems that consumesignificant computational resources. A novel isogeometric topology optimization method with a hybrid parallelstrategy of CPU/GPU is proposed, while the hybrid parallel strategies for stiffness matrix assembly, equationsolving, sensitivity analysis, and design variable update are discussed in detail. To ensure the high efficiency ofCPU/GPU computing, a workload balancing strategy is presented for optimally distributing the workload betweenCPU and GPU. To illustrate the advantages of the proposedmethod, three benchmark examples are tested to verifythe hybrid parallel strategy in this paper. The results show that the efficiency of the hybrid method is faster thanserial CPU and parallel GPU, while the speedups can be up to two orders of magnitude.展开更多
This paper presents a parallel method for simulating real-time 3D deformable objects using the volume preservation mass-spring system method on tetrahedron meshes.In general,the conventional mass-spring system is mani...This paper presents a parallel method for simulating real-time 3D deformable objects using the volume preservation mass-spring system method on tetrahedron meshes.In general,the conventional mass-spring system is manipulated as a force-driven method because it is fast,simple to implement,and the parameters can be controlled.However,the springs in traditional mass-spring system can be excessively elongated which cause severe stability and robustness issues that lead to shape restoring,simulation blow-up,and huge volume loss of the deformable object.In addition,traditional method that uses a serial process of the central processing unit(CPU)to solve the system in every frame cannot handle the complex structure of deformable object in real-time.Therefore,the first order implicit constraint enforcement for a mass-spring model is utilized to achieve accurate visual realism of deformable objects with tough constraint error.In this paper,we applied the distance constraint and volume conservation constraints for each tetrahedron element to improve the stability of deformable object simulation using the mass-spring system and behave the same as its real-world counterparts.To reduce the computational complexity while ensuring stable simulation,we applied a method that utilizes OpenGL compute shader,a part of OpenGL Shading Language(GLSL)that executes on the graphic processing unit(GPU)to solve the numerical problems effectively.We applied the proposed methods to experimental volumetric models,and volume percentages of all objects are compared.The average volume percentages of all models during the simulation using the mass-spring system,distance constraint,and the volume constraint method were 68.21%,89.64%,and 98.70%,respectively.The proposed approaches are successfully applied to improve the stability of mass-spring system and the performance comparison from our experimental tests also shows that the GPU-based method is faster than CPU-based implementation for all cases.展开更多
To achieve real-time control of tokamak plasmas, the equilibrium reconstruction has to be completed sufficiently quickly. For the case of an EAST tokamak experiment, real-time equilibrium reconstruction is generally r...To achieve real-time control of tokamak plasmas, the equilibrium reconstruction has to be completed sufficiently quickly. For the case of an EAST tokamak experiment, real-time equilibrium reconstruction is generally required to provide results within 1ms. A graphic processing unit(GPU) parallel Grad–Shafranov(G-S) solver is developed in P-EFIT code,which is built with the CUDA? architecture to take advantage of massively parallel GPU cores and significantly accelerate the computation. Optimization and implementation of numerical algorithms for a block tri-diagonal linear system are presented. The solver can complete a calculation within 16 μs with 65×65 grid size and 27 μs with 129×129 grid size, and this solver supports that P-EFIT can fulfill the time feasibility for real-time plasma control with both grid sizes.展开更多
Personal desktop platform with teraflops peak performance of thousands of cores is realized at the price of conventional workstations using the programmable graphics processing units(GPUs).A GPU-based parallel Euler/N...Personal desktop platform with teraflops peak performance of thousands of cores is realized at the price of conventional workstations using the programmable graphics processing units(GPUs).A GPU-based parallel Euler/Navier-Stokes solver is developed for 2-D compressible flows by using NVIDIA′s Compute Unified Device Architecture(CUDA)programming model in CUDA Fortran programming language.The techniques of implementation of CUDA kernels,double-layered thread hierarchy and variety memory hierarchy are presented to form the GPU-based algorithm of Euler/Navier-Stokes equations.The resulting parallel solver is validated by a set of typical test flow cases.The numerical results show that dozens of times speedup relative to a serial CPU implementation can be achieved using a single GPU desktop platform,which demonstrates that a GPU desktop can serve as a costeffective parallel computing platform to accelerate computational fluid dynamics(CFD)simulations substantially.展开更多
文摘Conventional gradient-based full waveform inversion (FWI) is a local optimization, which is highly dependent on the initial model and prone to trapping in local minima. Globally optimal FWI that can overcome this limitation is particularly attractive, but is currently limited by the huge amount of calculation. In this paper, we propose a globally optimal FWI framework based on GPU parallel computing, which greatly improves the efficiency, and is expected to make globally optimal FWI more widely used. In this framework, we simplify and recombine the model parameters, and optimize the model iteratively. Each iteration contains hundreds of individuals, each individual is independent of the other, and each individual contains forward modeling and cost function calculation. The framework is suitable for a variety of globally optimal algorithms, and we test the framework with particle swarm optimization algorithm for example. Both the synthetic and field examples achieve good results, indicating the effectiveness of the framework. .
基金the National Key R&D Program of China(2020YFB1708300)the National Natural Science Foundation of China(52005192)the Project of Ministry of Industry and Information Technology(TC210804R-3).
文摘This paper aims to solve large-scale and complex isogeometric topology optimization problems that consumesignificant computational resources. A novel isogeometric topology optimization method with a hybrid parallelstrategy of CPU/GPU is proposed, while the hybrid parallel strategies for stiffness matrix assembly, equationsolving, sensitivity analysis, and design variable update are discussed in detail. To ensure the high efficiency ofCPU/GPU computing, a workload balancing strategy is presented for optimally distributing the workload betweenCPU and GPU. To illustrate the advantages of the proposedmethod, three benchmark examples are tested to verifythe hybrid parallel strategy in this paper. The results show that the efficiency of the hybrid method is faster thanserial CPU and parallel GPU, while the speedups can be up to two orders of magnitude.
基金This work was supported by the Basic Science Research Program through the National Research Foundation of Korea(NRF-2019R1F1A1062752)funded by the Ministry of Education+1 种基金was funded by BK21 FOUR(Fostering Outstanding Universities for Research)(No.:5199990914048)and was also supported by the Soonchunhyang University Research Fund.
文摘This paper presents a parallel method for simulating real-time 3D deformable objects using the volume preservation mass-spring system method on tetrahedron meshes.In general,the conventional mass-spring system is manipulated as a force-driven method because it is fast,simple to implement,and the parameters can be controlled.However,the springs in traditional mass-spring system can be excessively elongated which cause severe stability and robustness issues that lead to shape restoring,simulation blow-up,and huge volume loss of the deformable object.In addition,traditional method that uses a serial process of the central processing unit(CPU)to solve the system in every frame cannot handle the complex structure of deformable object in real-time.Therefore,the first order implicit constraint enforcement for a mass-spring model is utilized to achieve accurate visual realism of deformable objects with tough constraint error.In this paper,we applied the distance constraint and volume conservation constraints for each tetrahedron element to improve the stability of deformable object simulation using the mass-spring system and behave the same as its real-world counterparts.To reduce the computational complexity while ensuring stable simulation,we applied a method that utilizes OpenGL compute shader,a part of OpenGL Shading Language(GLSL)that executes on the graphic processing unit(GPU)to solve the numerical problems effectively.We applied the proposed methods to experimental volumetric models,and volume percentages of all objects are compared.The average volume percentages of all models during the simulation using the mass-spring system,distance constraint,and the volume constraint method were 68.21%,89.64%,and 98.70%,respectively.The proposed approaches are successfully applied to improve the stability of mass-spring system and the performance comparison from our experimental tests also shows that the GPU-based method is faster than CPU-based implementation for all cases.
基金supported by the National Magnetic Confinement Fusion Research Program of China(Grant No.2014GB103000)the National Natural Science Foundation of China(Grant No.11575245)the National Natural Science Foundation of China for Youth(Grant No.11205191)
文摘To achieve real-time control of tokamak plasmas, the equilibrium reconstruction has to be completed sufficiently quickly. For the case of an EAST tokamak experiment, real-time equilibrium reconstruction is generally required to provide results within 1ms. A graphic processing unit(GPU) parallel Grad–Shafranov(G-S) solver is developed in P-EFIT code,which is built with the CUDA? architecture to take advantage of massively parallel GPU cores and significantly accelerate the computation. Optimization and implementation of numerical algorithms for a block tri-diagonal linear system are presented. The solver can complete a calculation within 16 μs with 65×65 grid size and 27 μs with 129×129 grid size, and this solver supports that P-EFIT can fulfill the time feasibility for real-time plasma control with both grid sizes.
基金supported by the National Natural Science Foundation of China (No.11172134)the Funding of Jiangsu Innovation Program for Graduate Education (No.CXLX13_132)
文摘Personal desktop platform with teraflops peak performance of thousands of cores is realized at the price of conventional workstations using the programmable graphics processing units(GPUs).A GPU-based parallel Euler/Navier-Stokes solver is developed for 2-D compressible flows by using NVIDIA′s Compute Unified Device Architecture(CUDA)programming model in CUDA Fortran programming language.The techniques of implementation of CUDA kernels,double-layered thread hierarchy and variety memory hierarchy are presented to form the GPU-based algorithm of Euler/Navier-Stokes equations.The resulting parallel solver is validated by a set of typical test flow cases.The numerical results show that dozens of times speedup relative to a serial CPU implementation can be achieved using a single GPU desktop platform,which demonstrates that a GPU desktop can serve as a costeffective parallel computing platform to accelerate computational fluid dynamics(CFD)simulations substantially.