Funding: the National Key R&D Program of China (2020YFB1708300), the National Natural Science Foundation of China (52005192), and the Project of Ministry of Industry and Information Technology (TC210804R-3).
Abstract: This paper aims to solve large-scale and complex isogeometric topology optimization problems that consume significant computational resources. A novel isogeometric topology optimization method with a hybrid CPU/GPU parallel strategy is proposed, and the hybrid parallel strategies for stiffness matrix assembly, equation solving, sensitivity analysis, and design variable update are discussed in detail. To ensure high efficiency of CPU/GPU computing, a workload balancing strategy is presented for optimally distributing the workload between the CPU and the GPU. To illustrate the advantages of the proposed method, three benchmark examples are tested to verify the hybrid parallel strategy. The results show that the hybrid method is faster than both the serial CPU and the parallel GPU implementations, with speedups of up to two orders of magnitude.
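As a rough illustration of the workload balancing idea mentioned in this abstract, the sketch below splits a batch of independent work items between a CPU and a GPU in proportion to their throughputs, so that both are expected to finish at about the same time. This is only one simple balancing rule and not necessarily the strategy used in the paper; the function names and the throughput inputs (`cpu_rate`, `gpu_rate`) are hypothetical.

```python
def split_workload(n_items, cpu_rate, gpu_rate):
    """Split n_items so both devices are expected to finish together.

    cpu_rate and gpu_rate are throughputs in items per second. They are
    hypothetical inputs, e.g. measured in a short calibration run.
    """
    if cpu_rate <= 0 or gpu_rate <= 0:
        raise ValueError("throughputs must be positive")
    # The GPU receives a share proportional to its relative speed.
    gpu_share = gpu_rate / (cpu_rate + gpu_rate)
    n_gpu = round(n_items * gpu_share)
    n_cpu = n_items - n_gpu
    return n_cpu, n_gpu


def estimated_time(n_cpu, n_gpu, cpu_rate, gpu_rate):
    """Wall time when both devices run concurrently (the slower one decides)."""
    return max(n_cpu / cpu_rate, n_gpu / gpu_rate)
```

With a GPU assumed to be nine times faster than the CPU, `split_workload(1000, 1.0, 9.0)` assigns 100 items to the CPU and 900 to the GPU, and both finish in the same estimated time.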
Funding: This work was supported by the National Natural Science Foundation of China (62073155, 62002137, 62106088, 62206113), the High-End Foreign Expert Recruitment Plan (G2023144007L), and the Fundamental Research Funds for the Central Universities (JUSRP221028).
Abstract: Evolutionary algorithms (EAs) have been used in high utility itemset mining (HUIM) to address the problem of discovering high utility itemsets (HUIs) in an exponential search space. EAs have good running and mining performance, but they still require huge computational resources and may miss many HUIs. Because EAs combine well with the graphics processing unit (GPU), we propose a parallel genetic algorithm (GA) on the GPU platform for HUIM (PHUI-GA). The improved evolution steps are performed on the central processing unit (CPU), and the CPU-intensive steps are sent to the GPU to be evaluated with multi-threaded processors. Experiments show that the mining performance of PHUI-GA exceeds that of the existing EAs. When mining 90% of the HUIs, PHUI-GA is up to 188 times better than the existing EAs and up to 36 times better than the CPU parallel approach.
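To make the evaluation step concrete, the sketch below computes the utility of a candidate itemset under the standard HUIM definition: the sum, over all transactions containing the itemset, of quantity times unit profit for its items. Each candidate is scored independently, which is what makes this step a natural fit for parallel evaluation. The sketch is serial Python with hypothetical names and data layout; it is not the PHUI-GA implementation.

```python
def itemset_utility(itemset, transactions, unit_profit):
    """Utility of `itemset` in a transaction database.

    transactions: list of dicts mapping item -> purchased quantity.
    unit_profit:  dict mapping item -> profit per unit.
    """
    items = set(itemset)
    total = 0
    for tx in transactions:
        # Only transactions containing every item of the itemset contribute.
        if items.issubset(tx.keys()):
            total += sum(tx[i] * unit_profit[i] for i in items)
    return total


def is_high_utility(itemset, transactions, unit_profit, min_util):
    """An itemset is a HUI when its utility reaches the user threshold."""
    return itemset_utility(itemset, transactions, unit_profit) >= min_util
```

In a GA setting, a function like `itemset_utility` would play the role of the fitness function applied to every chromosome in the population.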
Abstract: Conventional gradient-based full waveform inversion (FWI) is a local optimization that depends strongly on the initial model and is prone to becoming trapped in local minima. Globally optimal FWI, which can overcome this limitation, is particularly attractive but is currently limited by its huge computational cost. In this paper, we propose a globally optimal FWI framework based on GPU parallel computing, which greatly improves efficiency and is expected to make globally optimal FWI more widely used. In this framework, we simplify and recombine the model parameters and optimize the model iteratively. Each iteration contains hundreds of individuals; each individual is independent of the others and involves forward modeling and a cost function calculation. The framework is suitable for a variety of global optimization algorithms, and we test it with the particle swarm optimization algorithm as an example. Both the synthetic and field examples achieve good results, indicating the effectiveness of the framework.
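The structure described in this abstract (many independent individuals per iteration, each needing one cost evaluation) can be sketched with a basic particle swarm optimizer. The version below is a generic textbook PSO in serial Python, not the authors' framework. The coefficient values, bounds, and function names are assumptions, and in FWI the `cost` callable would wrap forward modeling plus a misfit calculation rather than a toy function.

```python
import random


def pso_minimize(cost, dim, n_particles=20, n_iters=50, bounds=(-5.0, 5.0), seed=0):
    """Minimal particle swarm optimization; returns (best, best_cost, history)."""
    rng = random.Random(seed)
    lo, hi = bounds
    w, c1, c2 = 0.7, 1.5, 1.5  # inertia and acceleration coefficients (assumed)
    pos = [[rng.uniform(lo, hi) for _ in range(dim)] for _ in range(n_particles)]
    vel = [[0.0] * dim for _ in range(n_particles)]
    pbest = [p[:] for p in pos]
    pbest_cost = [cost(p) for p in pos]
    g = min(range(n_particles), key=pbest_cost.__getitem__)
    gbest, gbest_cost = pbest[g][:], pbest_cost[g]
    history = [gbest_cost]
    for _ in range(n_iters):
        for i in range(n_particles):
            for d in range(dim):
                r1, r2 = rng.random(), rng.random()
                vel[i][d] = (w * vel[i][d]
                             + c1 * r1 * (pbest[i][d] - pos[i][d])
                             + c2 * r2 * (gbest[d] - pos[i][d]))
                # Keep particles inside the search box.
                pos[i][d] = min(hi, max(lo, pos[i][d] + vel[i][d]))
        # Independent evaluations: this is the part a GPU could run in parallel.
        costs = [cost(p) for p in pos]
        for i, c in enumerate(costs):
            if c < pbest_cost[i]:
                pbest[i], pbest_cost[i] = pos[i][:], c
                if c < gbest_cost:
                    gbest, gbest_cost = pos[i][:], c
        history.append(gbest_cost)
    return gbest, gbest_cost, history
```

Because the global best is only replaced by a strictly better candidate, the recorded `history` never increases from one iteration to the next.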
Funding: Financially supported by the National Natural Science Foundation of China (No. 41174085).
Abstract: Organic reefs, the targets of deep-water petroleum exploration, are widely developed in the Xisha area. However, there are concealed igneous rocks under the sea, to which organic rocks have nearly equal wave impedance. Because their seismic reflection characteristics are similar, the igneous rocks interfere with future exploration. Yet the density and magnetism of organic reefs are very different from those of igneous rocks, so gravity and magnetic data have obvious advantages for distinguishing organic reefs from igneous rocks. First, frequency decomposition was applied to the free-air gravity anomaly in the Xisha area to obtain the 2D subdivision of the gravity anomaly and magnetic anomaly in the vertical direction. The distribution of igneous rocks in the horizontal direction can thus be obtained from the high-frequency field, the low-frequency field, and their physical properties. Then, 3D forward modeling of the gravitational field was carried out to establish the density model of this area, with reference to the physical properties of rocks reported in previous research. Furthermore, 3D inversion of the gravity anomaly by a genetic algorithm with graphics processing unit (GPU) parallel processing was applied in the Xisha target area, and the 3D density structure of this area was obtained. In this way, the igneous rocks can be confined to a certain depth according to their density. The frequency decomposition and the 3D inversion of the gravity anomaly by the GPU-parallel genetic algorithm proved to be a useful method for locating igneous rocks at their 3D geological position. Organic reefs and igneous rocks can therefore be identified, which provides prescient information for further exploration.
Funding: This work was supported by the Basic Science Research Program through the National Research Foundation of Korea (NRF-2019R1F1A1062752) funded by the Ministry of Education, was funded by BK21 FOUR (Fostering Outstanding Universities for Research) (No.: 5199990914048), and was also supported by the Soonchunhyang University Research Fund.
Abstract: This paper presents a parallel method for simulating 3D deformable objects in real time using a volume-preserving mass-spring system on tetrahedral meshes. In general, the conventional mass-spring system is treated as a force-driven method because it is fast, simple to implement, and has controllable parameters. However, the springs in a traditional mass-spring system can be excessively elongated, which causes severe stability and robustness issues that lead to shape restoring, simulation blow-up, and huge volume loss of the deformable object. In addition, the traditional method, which uses a serial process on the central processing unit (CPU) to solve the system in every frame, cannot handle the complex structure of a deformable object in real time. Therefore, first-order implicit constraint enforcement for a mass-spring model is used to achieve accurate visual realism of deformable objects with tough constraint error. In this paper, we apply the distance constraint and volume conservation constraints to each tetrahedral element to improve the stability of deformable object simulation with the mass-spring system, so that the object behaves like its real-world counterpart. To reduce the computational complexity while ensuring a stable simulation, we apply a method that uses the OpenGL compute shader, a part of the OpenGL Shading Language (GLSL) that executes on the graphics processing unit (GPU), to solve the numerical problems effectively. We applied the proposed methods to experimental volumetric models and compared the volume percentages of all objects. The average volume percentages of all models during the simulation using the mass-spring system, the distance constraint, and the volume constraint method were 68.21%, 89.64%, and 98.70%, respectively. The proposed approaches successfully improve the stability of the mass-spring system, and the performance comparison from our experimental tests also shows that the GPU-based method is faster than the CPU-based implementation in all cases.
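The volume percentages quoted in this abstract compare the deformed volume of a tetrahedral mesh with its rest volume. A minimal sketch of that measurement is given below, assuming the percentage is defined as total current volume divided by total rest volume; the abstract does not spell out the definition, and the function names and mesh layout are hypothetical.

```python
def tetra_volume(p0, p1, p2, p3):
    """Volume of a tetrahedron from its four vertices: |a . (b x c)| / 6."""
    a = [p1[k] - p0[k] for k in range(3)]
    b = [p2[k] - p0[k] for k in range(3)]
    c = [p3[k] - p0[k] for k in range(3)]
    cross = (b[1] * c[2] - b[2] * c[1],
             b[2] * c[0] - b[0] * c[2],
             b[0] * c[1] - b[1] * c[0])
    triple = a[0] * cross[0] + a[1] * cross[1] + a[2] * cross[2]
    return abs(triple) / 6.0


def volume_percentage(rest_vertices, current_vertices, tets):
    """Current mesh volume as a percentage of the rest volume.

    tets is a list of 4-tuples of vertex indices (assumed mesh layout).
    """
    rest = sum(tetra_volume(*(rest_vertices[i] for i in t)) for t in tets)
    cur = sum(tetra_volume(*(current_vertices[i] for i in t)) for t in tets)
    return 100.0 * cur / rest
```

A volume constraint of the kind described above would aim to keep this percentage close to 100 throughout the simulation.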
Funding: Supported by the National Natural Science Foundation of China (Grant Nos. 11872136, 11802146, 11772085) and the Fundamental Research Funds for the Central Universities (Grant Nos. DUT19GJ206, DUT19ZD207).
Abstract: Considering the interaction between the sleeper, the ballast layer, and the substructure, a three-dimensional coupled discrete-finite element method for a ballasted railway track is proposed in this study. Ballast granules with irregular shapes are constructed with a clump model in the discrete element method. Meanwhile, concrete sleepers, embankments, and foundations are modelled with 20-node hexahedral solid elements in the finite element method. To improve computational efficiency, a GPU-based (Graphics Processing Unit) parallel framework is applied in the discrete element simulation. Additionally, an algorithm covering contact search and parameter transfer at the contact interface between discrete particles and finite elements is developed in the GPU parallel environment. A benchmark case is selected to verify the accuracy of the coupling algorithm. The dynamic response of the ballasted rail track is analysed under different train speeds and loads. Meanwhile, the dynamic stress on the substructure surface obtained by the established DEM-FEM model is compared with in situ experimental results. Finally, stress and displacement contours in the cross-section of the model are constructed to further visualise the response of the ballasted railway. The proposed coupling model can provide important insights into high-performance coupling algorithms and the dynamic characteristics of full-scale ballasted rail tracks.
Funding: Supported by the National Magnetic Confinement Fusion Research Program of China (Grant No. 2014GB103000), the National Natural Science Foundation of China (Grant No. 11575245), and the National Natural Science Foundation of China for Youth (Grant No. 11205191).
Abstract: To achieve real-time control of tokamak plasmas, the equilibrium reconstruction has to be completed sufficiently quickly. For an EAST tokamak experiment, real-time equilibrium reconstruction is generally required to provide results within 1 ms. A graphics processing unit (GPU) parallel Grad-Shafranov (G-S) solver is developed in the P-EFIT code, which is built on the CUDA™ architecture to take advantage of massively parallel GPU cores and significantly accelerate the computation. The optimization and implementation of numerical algorithms for a block tri-diagonal linear system are presented. The solver can complete a calculation within 16 μs with a 65×65 grid and within 27 μs with a 129×129 grid, which allows P-EFIT to meet the time requirement for real-time plasma control with both grid sizes.
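For readers unfamiliar with tri-diagonal solves, the sketch below shows the classical serial Thomas algorithm for a scalar tri-diagonal system. It is offered only as background: the solver in the abstract works on a block tri-diagonal system on the GPU, where, roughly speaking, the scalar divisions below become small matrix solves and the algorithm is reorganized for parallel execution. The function name and argument layout are assumptions.

```python
def solve_tridiagonal(lower, diag, upper, rhs):
    """Solve a tri-diagonal system with the Thomas algorithm.

    Row i reads: lower[i]*x[i-1] + diag[i]*x[i] + upper[i]*x[i+1] = rhs[i].
    lower[0] and upper[-1] are ignored. All four lists have length n.
    No pivoting is done, so the matrix should be diagonally dominant
    (or otherwise safe to eliminate without row swaps).
    """
    n = len(diag)
    c = [0.0] * n  # modified super-diagonal
    d = [0.0] * n  # modified right-hand side
    c[0] = upper[0] / diag[0]
    d[0] = rhs[0] / diag[0]
    # Forward elimination.
    for i in range(1, n):
        denom = diag[i] - lower[i] * c[i - 1]
        c[i] = upper[i] / denom if i < n - 1 else 0.0
        d[i] = (rhs[i] - lower[i] * d[i - 1]) / denom
    # Back substitution.
    x = [0.0] * n
    x[-1] = d[-1]
    for i in range(n - 2, -1, -1):
        x[i] = d[i] - c[i] * x[i + 1]
    return x
```

The forward sweep is inherently sequential, which is why GPU implementations typically rely on reformulations rather than this loop as written.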
Funding: Supported by the Basic Science Research Program through the National Research Foundation of Korea (NRF-2019R1F1A1062752) funded by the Ministry of Education, funded by BK21 FOUR (Fostering Outstanding Universities for Research) (No.: 5199990914048), and supported by the Soonchunhyang University Research Fund.
Abstract: The primary goal of cloth simulation is to express object behavior realistically and achieve real-time performance while following the fundamental concepts of physics. In general, the mass-spring system is applied to real-time cloth simulation with three types of springs. However, stiff-spring cloth simulation using the mass-spring system requires a small integration time step in order to use a large stiffness coefficient. Furthermore, to obtain stable behavior, constraint enforcement is used instead of maintaining the force of each spring. Constraint force computation involves solving a large sparse linear system. Because of this large computational cost, we implement a cloth simulation using adaptive constraint activation and deactivation techniques, which combine the mass-spring system and the constraint enforcement method to prevent excessive elongation of the cloth. When a spring is stretched or compressed beyond a defined threshold, the adaptive constraint activation and deactivation method deactivates the spring and generates the implicit constraint. The traditional method, which uses a serial process on the Central Processing Unit (CPU) to solve the system in every frame, cannot handle the complex structure of a cloth model in real time. Our simulation uses Graphics Processing Unit (GPU) parallel processing with a compute shader in the OpenGL Shading Language (GLSL) to solve the system effectively. In this paper, we design and implement a parallel method for cloth simulation, and we compare the performance and behavior of the mass-spring system, constraint enforcement, and the adaptive constraint activation and deactivation techniques using the GPU-based parallel method.
Funding: Supported by the National Natural Science Foundation of China (No. 11172134) and the Funding of Jiangsu Innovation Program for Graduate Education (No. CXLX13_132).
Abstract: A personal desktop platform with teraflops peak performance and thousands of cores can be realized at the price of a conventional workstation by using programmable graphics processing units (GPUs). A GPU-based parallel Euler/Navier-Stokes solver is developed for 2-D compressible flows using NVIDIA's Compute Unified Device Architecture (CUDA) programming model in the CUDA Fortran programming language. Techniques for implementing CUDA kernels, the double-layered thread hierarchy, and the varied memory hierarchy are presented to form the GPU-based algorithm for the Euler/Navier-Stokes equations. The resulting parallel solver is validated with a set of typical test flow cases. The numerical results show that a speedup of dozens of times relative to a serial CPU implementation can be achieved on a single-GPU desktop platform, which demonstrates that a GPU desktop can serve as a cost-effective parallel computing platform that substantially accelerates computational fluid dynamics (CFD) simulations.
Funding: Supported by the National Key R&D Program of China (2019YFA0308700 and 2017YFA0303700), the National Natural Science Foundation of China (61734005, 11761141014, 11690033), the Science and Technology Commission of Shanghai Municipality (STCSM) (17JC1400403), the Shanghai Municipal Education Commission (SMEC) (2019SHZDZX01, 2017-01-07-0002-E00049), the National Natural Science Foundation of China (11904229), and the China Postdoctoral Science Foundation (2019T120334).
Abstract: Google PageRank is a prevalent algorithm for ranking the significance of nodes or websites in a network, and a quantum counterpart of the PageRank algorithm has recently been proposed that suggests a higher ranking accuracy compared with Google PageRank. The quantum PageRank algorithm is essentially based on quantum stochastic walks and can be expressed using the Lindblad master equation, which, however, requires solving Kronecker products of O(N^4) dimension and demands prohibitively large memory and time once the number of nodes N in a network exceeds 150. Here, we present an efficient solver for quantum PageRank that uses the Runge-Kutta method to reduce the matrix dimension to O(N^2) and employs TensorFlow to conduct GPU parallel computing. We demonstrate its performance in solving quantum stochastic walks on Erdős-Rényi graphs using an RTX 2060 GPU. The test on a graph of 6000 nodes requires 5.5 GB of memory and 223 s, and that on a graph of 1000 nodes requires 226 MB and 3.6 s. Compared with QSWalk, a currently prevalent Mathematica solver, our solver reduces the required memory and time for the same 1000-node graph to only 0.2% and 0.05%, respectively. We apply the solver to quantum PageRank for the USA major airline network with up to 922 nodes, and to a quantum stochastic walk on a glued tree of 2186 nodes. This efficient solver for large-scale quantum PageRank and quantum stochastic walks would greatly facilitate studies of quantum information in real-life applications.
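One way to read the dimension reduction described in this abstract is that the master equation is integrated directly on the N×N density matrix with a Runge-Kutta scheme, instead of building an N²×N² superoperator from Kronecker products. The sketch below illustrates that reading with a fourth-order Runge-Kutta step in plain Python. The interpolated form of the quantum stochastic walk equation with a mixing parameter `omega`, the helper names, and the use of nested lists instead of TensorFlow tensors are all assumptions made for illustration; this is not the authors' solver.

```python
def matmul(a, b):
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def dagger(a):
    n = len(a)
    return [[a[j][i].conjugate() for j in range(n)] for i in range(n)]


def add(a, b, scale=1.0):
    """Return a + scale * b, element by element."""
    return [[x + scale * y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]


def lindblad_rhs(rho, h, jumps, omega):
    """Assumed form: -i(1-omega)[H, rho] + omega * sum_k D_k(rho),
    with D_k(rho) = L rho L^dagger - (1/2){L^dagger L, rho}."""
    hr, rh = matmul(h, rho), matmul(rho, h)
    out = [[-1j * (1 - omega) * (x - y) for x, y in zip(r1, r2)]
           for r1, r2 in zip(hr, rh)]
    for l in jumps:
        ld = dagger(l)
        ldl = matmul(ld, l)
        term = matmul(matmul(l, rho), ld)
        anti = add(matmul(ldl, rho), matmul(rho, ldl))
        out = add(out, add(term, anti, -0.5), omega)
    return out


def rk4_step(rho, dt, h, jumps, omega):
    """One classical fourth-order Runge-Kutta step on the N x N matrix."""
    f = lambda r: lindblad_rhs(r, h, jumps, omega)
    k1 = f(rho)
    k2 = f(add(rho, k1, dt / 2))
    k3 = f(add(rho, k2, dt / 2))
    k4 = f(add(rho, k3, dt))
    incr = add(add(k1, k4), add(k2, k3), 2.0)
    return add(rho, incr, dt / 6)
```

Every intermediate quantity here is an N×N matrix, so storage grows as O(N^2). Because the right-hand side is traceless, the trace of the density matrix is preserved up to rounding error, which gives a simple sanity check on an implementation.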