10 articles found
A Hybrid Parallel Strategy for Isogeometric Topology Optimization via CPU/GPU Heterogeneous Computing
1
Authors: Zhaohui Xia, Baichuan Gao, Chen Yu, Haotian Han, Haobo Zhang, Shuting Wang. Computer Modeling in Engineering & Sciences (SCIE, EI), 2024, No. 2, pp. 1103-1137, 35 pages.
This paper aims to solve large-scale and complex isogeometric topology optimization problems that consume significant computational resources. A novel isogeometric topology optimization method with a hybrid parallel strategy of CPU/GPU is proposed, while the hybrid parallel strategies for stiffness matrix assembly, equation solving, sensitivity analysis, and design variable update are discussed in detail. To ensure the high efficiency of CPU/GPU computing, a workload balancing strategy is presented for optimally distributing the workload between CPU and GPU. To illustrate the advantages of the proposed method, three benchmark examples are tested to verify the hybrid parallel strategy in this paper. The results show that the hybrid method is more efficient than serial CPU and parallel GPU, while the speedups can be up to two orders of magnitude.
Keywords: topology optimization; high-efficiency isogeometric analysis; CPU/GPU parallel computing; hybrid OpenMP-CUDA
PHUI-GA: GPU-based efficiency evolutionary algorithm for mining high utility itemsets
2
Authors: JIANG Haipeng, WU Guoqing, SUN Mengdan, LI Feng, SUN Yunfei, FANG Wei. Journal of Systems Engineering and Electronics (SCIE, CSCD), 2024, No. 4, pp. 965-975, 11 pages.
Evolutionary algorithms (EAs) have been used in high utility itemset mining (HUIM) to address the problem of discovering high utility itemsets (HUIs) in the exponential search space. EAs have good running and mining performance, but they still require huge computational resources and may miss many HUIs. Due to the good combination of EA and graphics processing unit (GPU), we propose a parallel genetic algorithm (GA) based on the platform of GPU for HUIM (PHUI-GA). The evolution steps with improvements are performed on the central processing unit (CPU), and the CPU-intensive steps are sent to the GPU for evaluation with multi-threaded processors. Experiments show that the mining performance of PHUI-GA outperforms the existing EAs. When mining 90% of HUIs, PHUI-GA is up to 188 times better than the existing EAs and up to 36 times better than the CPU parallel approach.
Keywords: high utility itemset mining (HUIM); graphics processing unit (GPU) parallel; genetic algorithm (GA); mining performance
A Rayleigh Wave Globally Optimal Full Waveform Inversion Framework Based on GPU Parallel Computing
3
Authors: Zhao Le, Wei Zhang, Xin Rong, Yiming Wang, Wentao Jin, Zhengxuan Cao. Journal of Geoscience and Environment Protection, 2023, No. 3, pp. 327-338, 12 pages.
Conventional gradient-based full waveform inversion (FWI) is a local optimization, which is highly dependent on the initial model and prone to trapping in local minima. Globally optimal FWI that can overcome this limitation is particularly attractive, but is currently limited by the huge amount of calculation. In this paper, we propose a globally optimal FWI framework based on GPU parallel computing, which greatly improves the efficiency and is expected to make globally optimal FWI more widely used. In this framework, we simplify and recombine the model parameters, and optimize the model iteratively. Each iteration contains hundreds of individuals; each individual is independent of the others, and each contains forward modeling and cost function calculation. The framework is suitable for a variety of globally optimal algorithms, and we test it with the particle swarm optimization algorithm as an example. Both the synthetic and field examples achieve good results, indicating the effectiveness of the framework.
Keywords: Full Waveform Inversion; Finite-Difference Method; Globally Optimal Framework; GPU Parallel Computing; Particle Swarm Optimization
The inversion of density structure by graphic processing unit (GPU) and identification of igneous rocks in Xisha area (Cited by: 1)
4
Authors: Lei Yu, Jian Zhang, Wei Lin, Rongqiang Wei, Shiguo Wu. Earthquake Science, 2014, No. 1, pp. 117-125, 9 pages.
Organic reefs, the targets of deep-water petroleum exploration, developed widely in Xisha area. However, there are concealed igneous rocks undersea, to which organic rocks have nearly equal wave impedance. So the igneous rocks have become interference for future exploration by having similar seismic reflection characteristics. Yet, the density and magnetism of organic reefs are very different from igneous rocks. It has obvious advantages to identify organic reefs and igneous rocks by gravity and magnetic data. First, frequency decomposition was applied to the free-air gravity anomaly in Xisha area to obtain the 2D subdivision of the gravity anomaly and magnetic anomaly in the vertical direction. Thus, the distribution of igneous rocks in the horizontal direction can be acquired according to high-frequency field, low-frequency field, and its physical properties. Then, 3D forward modeling of gravitational field was carried out to establish the density model of this area by reference to physical properties of rocks based on previous research. Furthermore, 3D inversion of gravity anomaly by genetic algorithm method of the graphic processing unit (GPU) parallel processing in Xisha target area was applied, and 3D density structure of this area was obtained. In this way, we can confine the igneous rocks to a certain depth according to the density of the igneous rocks. The frequency decomposition and 3D inversion of gravity anomaly by genetic algorithm method of the GPU parallel processing proved to be a useful method for recognizing igneous rocks and their 3D geological position. So organic reefs and igneous rocks can be identified, which provides prescient information for further exploration.
Keywords: Xisha area; organic reefs and igneous rocks; frequency decomposition of potential field; 3D inversion of the graphic processing unit (GPU) parallel processing
Real-time Volume Preserving Constraints for Volumetric Model on GPU
5
Authors: Hongly Va, Min-Hyung Choi, Min Hong. Computers, Materials & Continua (SCIE, EI), 2022, No. 10, pp. 831-848, 18 pages.
This paper presents a parallel method for simulating real-time 3D deformable objects using the volume preservation mass-spring system method on tetrahedron meshes. In general, the conventional mass-spring system is manipulated as a force-driven method because it is fast, simple to implement, and the parameters can be controlled. However, the springs in a traditional mass-spring system can be excessively elongated, which causes severe stability and robustness issues that lead to shape restoring, simulation blow-up, and huge volume loss of the deformable object. In addition, the traditional method, which uses a serial process of the central processing unit (CPU) to solve the system in every frame, cannot handle the complex structure of a deformable object in real time. Therefore, the first order implicit constraint enforcement for a mass-spring model is utilized to achieve accurate visual realism of deformable objects with tough constraint error. In this paper, we applied the distance constraint and volume conservation constraints for each tetrahedron element to improve the stability of deformable object simulation using the mass-spring system and make it behave the same as its real-world counterparts. To reduce the computational complexity while ensuring stable simulation, we applied a method that utilizes the OpenGL compute shader, a part of OpenGL Shading Language (GLSL) that executes on the graphic processing unit (GPU), to solve the numerical problems effectively. We applied the proposed methods to experimental volumetric models, and volume percentages of all objects are compared. The average volume percentages of all models during the simulation using the mass-spring system, distance constraint, and the volume constraint method were 68.21%, 89.64%, and 98.70%, respectively. The proposed approaches are successfully applied to improve the stability of the mass-spring system, and the performance comparison from our experimental tests also shows that the GPU-based method is faster than the CPU-based implementation for all cases.
Keywords: deformable object simulation; mass-spring system; implicit constraint enforcement; volume conservation constraint; GPU parallel computing
GPU-Based Simulation of Dynamic Characteristics of Ballasted Railway Track with Coupled Discrete-Finite Element Method
6
Authors: Xu Li, Ying Yan, Shuai Shao, Shunying Ji. Computer Modeling in Engineering & Sciences (SCIE, EI), 2021, No. 2, pp. 645-671, 27 pages.
Considering the interaction between a sleeper, ballast layer, and substructure, a three-dimensional coupled discrete-finite element method for a ballasted railway track is proposed in this study. Ballast granules with irregular shapes are constructed using a clump model using the discrete element method. Meanwhile, concrete sleepers, embankments, and foundations are modelled using 20-node hexahedron solid elements using the finite element method. To improve computational efficiency, a GPU-based (graphics processing unit) parallel framework is applied in the discrete element simulation. Additionally, an algorithm containing contact search and transfer parameters at the contact interface of discrete particles and finite elements is developed in the GPU parallel environment accordingly. A benchmark case is selected to verify the accuracy of the coupling algorithm. The dynamic response of the ballasted rail track is analysed under different train speeds and loads. Meanwhile, the dynamic stress on the substructure surface obtained by the established DEM-FEM model is compared with the in situ experimental results. Finally, stress and displacement contours in the cross-section of the model are constructed to further visualise the response of the ballasted railway. This proposed coupling model can provide important insights into high-performance coupling algorithms and the dynamic characteristics of full-scale ballasted rail tracks.
Keywords: ballasted track; coupled discrete element-finite element method; GPU parallel algorithm; dynamic characteristics
Fast parallel Grad–Shafranov solver for real-time equilibrium reconstruction in EAST tokamak using graphic processing unit (Cited by: 1)
7
Authors: 黄耀, 肖炳甲, 罗正平. Chinese Physics B (SCIE, EI, CAS, CSCD), 2017, No. 8, pp. 276-283, 8 pages.
To achieve real-time control of tokamak plasmas, the equilibrium reconstruction has to be completed sufficiently quickly. For the case of an EAST tokamak experiment, real-time equilibrium reconstruction is generally required to provide results within 1 ms. A graphic processing unit (GPU) parallel Grad–Shafranov (G-S) solver is developed in the P-EFIT code, which is built with the CUDA™ architecture to take advantage of massively parallel GPU cores and significantly accelerate the computation. Optimization and implementation of numerical algorithms for a block tri-diagonal linear system are presented. The solver can complete a calculation within 16 μs with a 65×65 grid size and 27 μs with a 129×129 grid size, which allows P-EFIT to meet the timing requirements of real-time plasma control with both grid sizes.
Keywords: tokamak; Grad-Shafranov equation; equilibrium reconstruction; GPU parallel computation
Parallel Cloth Simulation Using OpenGL Shading Language (Cited by: 1)
8
Authors: Hongly Va, Min-Hyung Choi, Min Hong. Computer Systems Science & Engineering (SCIE, EI), 2022, No. 5, pp. 427-443, 17 pages.
The primary goal of cloth simulation is to express object behavior in a realistic manner and achieve real-time performance by following the fundamental concepts of physics. In general, the mass–spring system is applied to real-time cloth simulation with three types of springs. However, hard spring cloth simulation using the mass–spring system requires a small integration time-step in order to use a large stiffness coefficient. Furthermore, to obtain stable behavior, constraint enforcement is used instead of maintenance of the force of each spring. Constraint force computation involves a large sparse linear solving operation. Due to the large computation, we implement a cloth simulation using adaptive constraint activation and deactivation techniques that involve the mass–spring system and constraint enforcement method to prevent excessive elongation of cloth. At the same time, when the length of the spring is stretched or compressed over a defined threshold, the adaptive constraint activation and deactivation method deactivates the spring and generates the implicit constraint. The traditional method, which uses a serial process of the Central Processing Unit (CPU) to solve the system in every frame, cannot handle the complex structure of a cloth model in real time. Our simulation utilizes Graphic Processing Unit (GPU) parallel processing with a compute shader in OpenGL Shading Language (GLSL) to solve the system effectively. In this paper, we design and implement a parallel method for cloth simulation, and experiment on the performance and behavior comparison of the mass–spring system, constraint enforcement, and adaptive constraint activation and deactivation techniques using the GPU-based parallel method.
Keywords: adaptive constraint; cloth simulation; constraint enforcement; GLSL compute shader; mass–spring system; parallel GPU
Compute Unified Device Architecture Implementation of Euler/Navier-Stokes Solver on Graphics Processing Unit Desktop Platform for 2-D Compressible Flows
9
Authors: Zhang Jiale, Chen Hongquan. Transactions of Nanjing University of Aeronautics and Astronautics (EI, CSCD), 2016, No. 5, pp. 536-545, 10 pages.
Personal desktop platform with teraflops peak performance of thousands of cores is realized at the price of conventional workstations using the programmable graphics processing units (GPUs). A GPU-based parallel Euler/Navier-Stokes solver is developed for 2-D compressible flows by using NVIDIA's Compute Unified Device Architecture (CUDA) programming model in the CUDA Fortran programming language. The techniques of implementation of CUDA kernels, double-layered thread hierarchy, and varied memory hierarchy are presented to form the GPU-based algorithm of Euler/Navier-Stokes equations. The resulting parallel solver is validated by a set of typical test flow cases. The numerical results show that dozens of times speedup relative to a serial CPU implementation can be achieved using a single GPU desktop platform, which demonstrates that a GPU desktop can serve as a cost-effective parallel computing platform to accelerate computational fluid dynamics (CFD) simulations substantially.
Keywords: graphics processing unit (GPU); GPU parallel computing; compute unified device architecture (CUDA) Fortran; finite volume method (FVM); acceleration
TensorFlow solver for quantum PageRank in large-scale networks (Cited by: 1)
10
Authors: Hao Tang, Ruoxi Shi, Tian-Shen He, Yan-Yan Zhu, Tian-Yu Wang, Marcus Lee, Xian-Min Jin. Science Bulletin (SCIE, EI, CSCD), 2021, No. 2, pp. 120-126, M0003, 8 pages.
Google PageRank is a prevalent algorithm for ranking the significance of nodes or websites in a network, and a recent quantum counterpart for the PageRank algorithm has been proposed to suggest a higher accuracy of ranking compared to Google PageRank. The quantum PageRank algorithm is essentially based on quantum stochastic walks and can be expressed using the Lindblad master equation, which, however, needs to solve the Kronecker products of an O(N^4) dimension and requires severely large memory and time when the number of nodes N in a network increases above 150. Here, we present an efficient solver for quantum PageRank by using the Runge-Kutta method to reduce the matrix dimension to O(N^2) and employing TensorFlow to conduct GPU parallel computing. We demonstrate its performance in solving quantum stochastic walks on Erdős-Rényi graphs using an RTX 2060 GPU. The test on the graph of 6000 nodes requires a memory of 5.5 GB and time of 223 s, and that on the graph of 1000 nodes requires 226 MB and 3.6 s. Compared with QSWalk, a currently prevalent Mathematica solver, our solver for the same graph of 1000 nodes reduces the required memory and time to only 0.2% and 0.05%. We apply the solver to quantum PageRank for the USA major airline network with up to 922 nodes, and to quantum stochastic walk on a glued tree of 2186 nodes. This efficient solver for large-scale quantum PageRank and quantum stochastic walks would greatly facilitate studies of quantum information in real-life applications.
Keywords: quantum stochastic walk; quantum PageRank; Lindblad master equation; TensorFlow; GPU parallel computing; Runge-Kutta method