14 articles found
A typhoon-induced storm surge numerical model with GPU acceleration based on an unstructured spherical centroidal Voronoi tessellation grid
1
Authors: Yuanyong Gao, Fujiang Yu, Cifu Fu, Jianxi Dong, Qiuxing Liu. Acta Oceanologica Sinica (SCIE, CAS, CSCD), 2024, No. 3, pp. 40-47 (8 pages)
Storm surge is often the marine disaster that poses the greatest threat to life and property in coastal areas. Accurate and timely issuance of storm surge warnings to take appropriate countermeasures is an important means to reduce storm surge-related losses. Storm surge numerical models are important for storm surge forecasting. To further improve the performance of storm surge forecast models, we developed a numerical storm surge forecast model based on an unstructured spherical centroidal Voronoi tessellation (SCVT) grid. The model is based on the shallow water equations in vector-invariant form and is discretized on an Arakawa C grid. The SCVT grid not only describes the coastline better but also avoids rigid transitions, and it has better global consistency by generating high-resolution grids in key areas through transition refinement. In addition, the model is accelerated with OpenACC-based GPU acceleration to meet the timeliness requirements of operational ensemble forecasting. It takes only 37 s to simulate a day in the coastal waters of China. The newly developed storm surge model was applied to simulate typhoon-induced storm surges in the coastal waters of China. Hindcast experiments on selected representative typhoon-induced storm surge processes indicate that the model can reasonably simulate the distribution characteristics of storm surges. The simulated maximum storm surges and their occurrence times are consistent with the observed data at the representative tide gauge stations, with mean absolute errors of 3.5 cm and 0.6 h, respectively, showing high accuracy and application prospects.
Keywords: typhoon-induced storm surge numerical model GPU acceleration unstructured grid spherical centroidal Voronoi tessellation (SCVT)
GPU-based cross-platform Monte Carlo proton dose calculation engine in the framework of Taichi (Cited by: 1)
2
Authors: Wei-Guang Li, Cheng Chang, Yao Qin, Zi-Lu Wang, Kai-Wen Li, Li-Sheng Geng, Hao Wu. Nuclear Science and Techniques (SCIE, EI, CAS, CSCD), 2023, No. 5, pp. 152-162 (11 pages)
In recent years, graphics processing units (GPUs) have been applied to accelerate Monte Carlo (MC) simulations for proton dose calculation in radiotherapy. Nonetheless, current GPU platforms, such as Compute Unified Device Architecture (CUDA) and Open Computing Language (OpenCL), suffer from cross-platform limitations or a relatively high programming barrier. The Taichi toolkit, which was developed to overcome these difficulties, has been successfully applied to high-performance numerical computations. Based on the class II condensed history simulation scheme with various proton-nucleus interactions, we developed a GPU-accelerated MC engine for proton transport using the Taichi toolkit. Dose distributions in homogeneous and heterogeneous geometries were calculated for 110, 160, and 200 MeV protons and were compared with those obtained by full MC simulations using TOPAS. The gamma passing rates were greater than 0.99 and 0.95 with criteria of 2 mm/2% and 1 mm/1%, respectively, in all the benchmark tests. Moreover, the calculation speed was at least 5800 times faster than that of TOPAS, and the number of lines of code was approximately 10 times fewer than with CUDA or OpenCL. Our study provides a highly accurate, efficient, and easy-to-use proton dose calculation engine for fast prototyping, beamlet calculation, and education purposes.
Keywords: Proton therapy Monte Carlo dose calculation GPU acceleration Taichi
An incompressible flow solver on a GPU/CPU heterogeneous architecture parallel computing platform
3
Authors: Qianqian Li, Rong Li, Zixuan Yang. Theoretical & Applied Mechanics Letters (CSCD), 2023, No. 5, pp. 387-393 (7 pages)
A computational fluid dynamics (CFD) solver for a GPU/CPU heterogeneous architecture parallel computing platform is developed to simulate incompressible flows on billion-level grid points. To solve the Poisson equation, the conjugate gradient method is used as the basic solver, and a Chebyshev method in combination with a Jacobi sub-preconditioner is used as the preconditioner. The developed CFD solver shows good parallel efficiency, which exceeds 90% in the weak-scalability test when the number of grid points allocated to each GPU card is greater than 208³. In the acceleration test, running a simulation with 1040³ grid points on 125 GPU cards is found to be 203.6x faster than running it on the same number of CPU cores. The developed solver is then tested on a two-dimensional lid-driven cavity flow and a three-dimensional Taylor-Green vortex flow. The results are consistent with previous results in the literature.
Keywords: GPU acceleration Parallel computing Poisson equation Preconditioner
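The preconditioned conjugate gradient idea mentioned in the abstract above can be illustrated with a minimal CPU sketch. This is not the authors' solver: it assumes a small dense 1D Poisson matrix and uses a plain Jacobi (diagonal) preconditioner on its own, without the Chebyshev stage or any GPU code. The names `poisson_1d` and `jacobi_pcg` are hypothetical.

```python
import numpy as np

def poisson_1d(n):
    """Dense 1D Poisson matrix (2 on the diagonal, -1 on the off-diagonals)."""
    return 2.0 * np.eye(n) - np.eye(n, k=1) - np.eye(n, k=-1)

def jacobi_pcg(A, b, tol=1e-10, max_iter=1000):
    """Conjugate gradient for a symmetric positive definite A,
    preconditioned with the inverse of diag(A) (Jacobi)."""
    inv_diag = 1.0 / np.diag(A)      # applying M^-1 is an elementwise product
    x = np.zeros_like(b)
    r = b - A @ x                    # initial residual
    z = inv_diag * r                 # preconditioned residual
    p = z.copy()                     # first search direction
    rz = r @ z
    for k in range(max_iter):
        if np.linalg.norm(r) <= tol * np.linalg.norm(b):
            return x, k              # converged after k iterations
        Ap = A @ p
        alpha = rz / (p @ Ap)        # step length along p
        x += alpha * p
        r -= alpha * Ap
        z = inv_diag * r
        rz_new = r @ z
        p = z + (rz_new / rz) * p    # next conjugate direction
        rz = rz_new
    return x, max_iter

A = poisson_1d(50)
b = np.ones(50)
x, iters = jacobi_pcg(A, b)
print(iters, np.linalg.norm(A @ x - b))
```

On a GPU the matrix-vector product and the dot products are the parts that would be parallelized; the control flow stays the same.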
GAM:A GPU-Accelerated Algorithm for MaxRS Queries in Road Networks
4
Authors: 陈剑, 张开旗, 任甜, 武震卿, 高宏. Journal of Computer Science & Technology (SCIE, EI, CSCD), 2022, No. 5, pp. 1005-1025 (21 pages)
In smart phones, vehicles and wearable devices, GPS sensors are ubiquitous and collect a lot of valuable spatial data from the real world. Given a set of weighted points and a rectangle r in the space, a maximizing range sum (MaxRS) query finds the position of r that maximizes the total weight of the points covered by r (i.e., the range sum). It has a wide spectrum of applications in spatial crowdsourcing, facility location and traffic monitoring. Most of the existing research focuses on the Euclidean space; however, in real life, the user’s moving route is constrained by the road network, and the existing MaxRS query algorithms in the road network are inefficient. In this paper, we propose a novel GPU-accelerated algorithm, namely GAM, to tackle MaxRS queries in road networks efficiently in two phases. In phase 1, we partition the entire road network into many small cells by a grid and theoretically prove the correctness of parallel query results by grid shifting, and then we propose an effective multi-grained pruning technique, by which the majority of cells can be pruned without further checking. In phase 2, we design a GPU-friendly storage structure, the cell-based road network (CRN), and a two-level parallel framework to compute the final result in the remaining cells. Finally, we conduct extensive experiments on two real-world road networks, and the experimental results demonstrate that GAM is on average one order of magnitude faster than state-of-the-art competitors, and the maximum speedup reaches about 55 times.
Keywords: road network maximizing range sum GPU acceleration pruning strategy
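The MaxRS query defined in the abstract above can be made concrete with a brute-force sketch. It is not the GAM algorithm: it assumes plain Euclidean space rather than a road network, an axis-aligned closed rectangle, and non-negative weights, and it runs serially. Under those assumptions some optimal placement has its left and bottom edges passing through point coordinates, so only those corners need checking. The function name `max_range_sum` is hypothetical.

```python
def max_range_sum(points, width, height):
    """Brute-force MaxRS in the Euclidean plane.

    points: iterable of (x, y, weight) with non-negative weights.
    Returns (best_sum, lower_left_corner) for a width x height rectangle.
    """
    points = list(points)
    best_sum, best_corner = 0.0, None
    # Candidate lower-left corners: every (x of some point, y of some point).
    xs = sorted({x for x, _, _ in points})
    ys = sorted({y for _, y, _ in points})
    for x0 in xs:
        for y0 in ys:
            total = sum(w for x, y, w in points
                        if x0 <= x <= x0 + width and y0 <= y <= y0 + height)
            if total > best_sum:
                best_sum, best_corner = total, (x0, y0)
    return best_sum, best_corner

pts = [(0, 0, 1), (1, 1, 2), (5, 5, 4), (5.5, 5.5, 1), (1.5, 0.5, 3)]
print(max_range_sum(pts, 2, 2))  # prints (6, (0, 0))
```

The cubic cost of this loop is what motivates pruning and parallel evaluation of candidate cells in more serious approaches.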
An Improved Graphics Processing Unit Acceleration Approach for Three-Dimensional Structural Topology Optimization Using the Element-Free Galerkin Method
5
Authors: Haishan Lu, Shuguang Gong, Jianping Zhang, Guilan Xie, Shuohui Yin. Computer Modeling in Engineering & Sciences (SCIE, EI), 2021, No. 9, pp. 1151-1178 (28 pages)
We proposed an improved graphics processing unit (GPU) acceleration approach for three-dimensional structural topology optimization using the element-free Galerkin (EFG) method. This method can effectively eliminate the race condition under parallelization. We established a structural topology optimization model by combining the EFG method and the solid isotropic microstructures with penalization model. We explored in detail the GPU parallel algorithms for assembling the stiffness matrix, solving the discrete equations, analyzing sensitivity, and updating the design variables. We also proposed a node pair-wise method for assembling the stiffness matrix and a node-wise method for sensitivity analysis to eliminate race conditions during parallelization. Furthermore, we investigated the effects of the thread block size, the number of degrees of freedom, and the convergence error of the preconditioned conjugate gradient (PCG) method on GPU computing performance. Finally, the results of three numerical examples demonstrated the validity of the proposed approach and showed significant acceleration of structural topology optimization. To save the cost of the optimization calculation, we proposed an appropriate thread block size and convergence error for the PCG method.
Keywords: Topology optimization EFG method GPU acceleration race condition preconditioned conjugate gradient
Real-time accurate Free-Form Deformation in terms of triangular Bézier surfaces
6
Authors: CUI Yuan-min, FENG Jie-qing. Applied Mathematics (A Journal of Chinese Universities) (SCIE, CSCD), 2014, No. 4, pp. 455-467 (13 pages)
We implemented accurate FFD in terms of triangular Bézier surfaces as matrix multiplications in CUDA and rendered them via OpenGL. Experimental results show that the proposed algorithm is more efficient than the previous GPU acceleration algorithm and tessellation shader algorithms.
Keywords: accurate Free-Form Deformation GPU acceleration CUDA triangular Bézier surface
Parallelization and Acceleration of Dynamic Option Pricing Models on GPU-CPU Heterogeneous Systems
7
Authors: Brian Wesley MUGANDA, Bernard Shibwabo KASAMANI. Journal of Systems Science and Information (CSCD), 2023, No. 5, pp. 622-635 (14 pages)
In this paper, stochastic global optimization algorithms, specifically the genetic algorithm and simulated annealing, are used to calibrate the dynamic option pricing model under stochastic volatility to market prices by adopting a hybrid programming approach. The performance of this dynamic option pricing model under the obtained optimal parameters is also discussed. To enhance the model throughput and reduce latency, a heterogeneous hybrid programming approach on the GPU was adopted, which emphasized a data-parallel implementation of the dynamic option pricing model on a GPU-based system. The compute-intensive segments of the pricing algorithms were offloaded to the GPU as kernels written in OpenCL. The GPU approach was found to reduce latency significantly, running up to 541 times faster than a parallel implementation on the CPU and reducing the computation time from 46.24 minutes to 5.12 seconds.
Keywords: Parallelization GPU computing option pricing GPU acceleration stochastic volatility hybrid programming
A non-uniform grid approach for high-resolution flood inundation simulation based on GPUs (Cited by: 1)
8
Authors: Jun-hui Wang, Jing-ming Hou, Jia-hui Gong, Bing-yao Li, Bao-shan Shi, Min-peng Guo, Jian Shen, Peng Lu. Journal of Hydrodynamics (SCIE, EI, CSCD), 2021, No. 4, pp. 844-860 (17 pages)
In view of the frequent occurrence of floods due to climate change, and the fact that flood simulations require a large calculation domain with complex land types, this paper proposes an optimized non-uniform grid model combined with a high-resolution model based on graphics processing unit (GPU) acceleration to simulate the surface water flow process. For the grid division, the topographic gradient change is taken as the control variable, and different optimization criteria are designed according to different land types. In the numerical model, the Godunov-type method is adopted for the spatial discretization, the TVD-MUSCL and Runge-Kutta methods are used to improve the model’s spatial and temporal calculation accuracies, and the simulation time is reduced by leveraging GPU acceleration. The model is applied to ideal and actual case studies. The results show that the numerical model based on a non-uniform grid has good stability. In the simulation of urban inundation, approximately 40%–50% of the urban average topographic gradient change to be covered is taken as the threshold for the non-uniform grid division, so that the calculation efficiency and accuracy can be optimized. In this case, the calculation efficiency of the non-uniform grid based on the optimized parameters is 2–3 times that of the uniform grid, and the approach can be adopted for actual flood simulation in large-scale areas.
Keywords: Non-uniform grid high-resolution model Godunov-type flood simulation graphics processing unit (GPU) acceleration
Harnessing the Power of GPUs to Speed Up Feature Selection for Outlier Detection
9
Authors: Fatemeh Azmandian (Member, IEEE), Ayse Yilmazer (Student Member, IEEE), Jennifer G. Dy (Member, IEEE), Javed A. Aslam, David R. Kaeli (Fellow, IEEE; Member, ACM). Journal of Computer Science & Technology (SCIE, EI, CSCD), 2014, No. 3, pp. 408-422 (15 pages)
Acquiring a set of features that emphasize the differences between normal data points and outliers can drastically facilitate the task of identifying outliers. In our work, we present a novel non-parametric evaluation criterion for filter-based feature selection which has an eye towards the final goal of outlier detection. The proposed method seeks the subset of features that represent the inherent characteristics of the normal dataset while forcing outliers to stand out, making them more easily distinguished by outlier detection algorithms. Experimental results on real datasets show the advantage of our feature selection algorithm compared with popular and state-of-the-art methods. We also show that the proposed algorithm is able to overcome the small sample space problem and perform well on highly imbalanced datasets. Furthermore, due to the highly parallelizable nature of the feature selection, we implement the algorithm on a graphics processing unit (GPU) to gain significant speedup over the serial version. The benefits of the GPU implementation are two-fold, as its performance scales very well in terms of the number of features, as well as the number of data points.
Keywords: feature selection outlier detection imbalanced data GPU acceleration
Accelerating the cryo-EM structure determination in RELION on GPU cluster
10
Authors: Xin YOU, Hailong YANG, Zhongzhi LUAN, Depei QIAN. Frontiers of Computer Science (SCIE, EI, CSCD), 2022, No. 3, pp. 21-39 (19 pages)
Cryo-electron microscopy (cryo-EM) is one of the most powerful technologies available today for structural biology. RELION (Regularized Likelihood Optimization) implements a Bayesian algorithm for cryo-EM structure determination and is one of the most widely used software packages in this field. Many researchers have devoted effort to improving the performance of RELION to keep up with the ever-increasing volume of datasets. In this paper, we focus on performance analysis of the most time-consuming computation steps in RELION and identify their performance bottlenecks for specific optimizations. We propose several performance optimization strategies to improve the overall performance of RELION, including optimization of the expectation step, parallelization of the maximization step, acceleration of the computation of symmetries, and memory affinity optimization. The experimental results show that our proposed optimizations achieve significant speedups of RELION across representative datasets. In addition, we perform roofline model analysis to understand the effectiveness of our optimizations.
Keywords: cryo-EM structure determination performance optimization GPU acceleration RELION
GPU based real-time simulation of massive falling leaves
11
Authors: Chengyang Li, Jingye Qian, Ruofeng Tong, Jian Chang, Jianjun Zhang. Computational Visual Media, 2015, No. 4, pp. 351-358 (8 pages)
As an important autumn feature, scenes with large numbers of falling leaves are common in movies and games. However, it is a challenge for computer graphics to simulate such scenes in an authentic and efficient manner. This paper proposes a GPU-based approach for simulating the falling motion of many leaves in real time. Firstly, we use a motion-synthesis-based method to analyze the falling motion of the leaves, which enables us to describe complex falling trajectories using low-dimensional features. Secondly, we transmit a primitive-motion trajectory dataset together with the low-dimensional features of the falling leaves to video memory, allowing us to execute the appropriate calculations on the GPU.
Keywords: real-time simulation falling leaves GPU acceleration
Harmonic coordinates for real-time image cloning (Cited by: 1)
12
Authors: Rui WANG, Wei-feng CHEN, Ming-hao PAN, Hu-jun BAO. Journal of Zhejiang University-Science C (Computers and Electronics) (SCIE, EI), 2010, No. 9, pp. 690-698 (9 pages)
Traditional gradient domain seamless image cloning is a time-consuming task, requiring the solving of Poisson's equations whenever the shape or position of the cloned region changes. Recently, a more efficient alternative, the mean-value coordinates (MVCs) based approach, was proposed to interpolate interior pixels by a weighted combination of values along the boundary. However, this approach cannot faithfully preserve the gradient in the cloning region. In this paper, we introduce harmonic cloning, which uses harmonic coordinates (HCs) instead of MVCs in image cloning. Benefiting from the non-negativity and interior locality of HCs, our interpolation generates a more accurate harmonic field across the cloned region and yields results of as high a quality as Poisson cloning. Furthermore, with optimizations and implementation on a graphics processing unit (GPU), we demonstrate that, compared with the method using MVCs, our harmonic cloning achieves better quality while retaining real-time performance.
Keywords: Seamless cloning Poisson’s equation Harmonic coordinates (HCs) Mean-value coordinates (MVCs) GPU acceleration
A graphics processing unit-based robust numerical model for solute transport driven by torrential flow condition (Cited by: 1)
13
Authors: Jing-ming HOU, Bao-shan SHI, Qiu-hua LIANG, Yu TONG, Yong-de KANG, Zhao-an ZHANG, Gang-gang BAI, Xu-jun GAO, Xiao YANG. Journal of Zhejiang University-Science A (Applied Physics & Engineering) (SCIE, EI, CAS, CSCD), 2021, No. 10, pp. 835-850 (16 pages)
Solute transport simulations are important in water pollution events. This paper introduces a finite volume Godunov-type model for solving a 4×4 matrix form of the hyperbolic conservation laws consisting of the 2D shallow water equations and transport equations. The model adopts the Harten-Lax-van Leer-contact (HLLC) approximate Riemann solution to calculate the cell interface fluxes. It deals well with changes in the dry and wet interfaces in actual complex terrain, and it has a strong shock-wave capturing ability. Using monotonic upstream-centred scheme for conservation laws (MUSCL) linear reconstruction with finite slope and the Runge-Kutta time integration method, it achieves second-order accuracy. At the same time, the introduction of graphics processing unit (GPU)-accelerated computing technology greatly increases the computing speed. The model is validated against multiple benchmarks, and the results are in good agreement with analytical solutions and other published numerical predictions. In the third test case, the GPU and central processing unit (CPU) calculation models take 3.865 s and 13.865 s, respectively, indicating that the GPU calculation model increases the calculation speed by 3.6 times. In the fourth test case, which compares the numerical model calculated by GPU with the traditional numerical model calculated by CPU, the calculation efficiencies of the GPU model under different resolution grids are 9.8–44.6 times higher than those of the CPU model. Therefore, it has better potential than previous models for large-scale simulation of solute transport in water pollution incidents. It can provide a reliable theoretical basis and strong data support in the rapid assessment and early warning of water pollution accidents.
Keywords: Solute transport Shallow water equations Godunov-type scheme Harten-Lax-van Leer-contact (HLLC) Riemann solver Graphics processing unit (GPU) acceleration technology Torrential flow
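The speedup quoted for the third test case in the abstract above follows directly from the two reported run times. The short check below only reproduces that arithmetic; the helper name `speedup` is hypothetical and the timings are the ones stated in the abstract.

```python
def speedup(baseline_seconds, accelerated_seconds):
    """Ratio of baseline run time to accelerated run time."""
    return baseline_seconds / accelerated_seconds

cpu_s, gpu_s = 13.865, 3.865   # run times reported for the third test case
ratio = speedup(cpu_s, gpu_s)
print(f"{ratio:.1f}x")         # prints 3.6x
```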
A TensorFlow-based new high-performance computational framework for CFD
14
Authors: Xi-zeng Zhao, Tian-yu Xu, Zhou-teng Ye, Wei-jie Liu. Journal of Hydrodynamics (SCIE, EI, CSCD), 2020, No. 4, pp. 735-746 (12 pages)
In this study, a computational framework from the field of artificial intelligence was applied to computational fluid dynamics (CFD). This framework, which was initially proposed by the Google AI department, is called "TensorFlow". An improved CFD model based on this framework was developed with a high-order difference method, a constrained interpolation profile (CIP) scheme, as the base flow solver for the advection term in the Navier-Stokes equations, and the preconditioned conjugate gradient (PCG) method was implemented in the model to solve the Poisson equation. Some new features, including convolution, vectorization, and graphics processing unit (GPU) acceleration, were implemented to raise the computational efficiency. The model was tested with several benchmark cases and shows good performance. Compared with our former CIP-based model, the present TensorFlow-based model also shows significantly higher computational efficiency in large-scale computation. The results indicate that TensorFlow could be a promising framework for CFD models due to its computational acceleration and convenience for programming.
Keywords: TensorFlow Vectorization Navier-Stokes equations graphics processing unit (GPU) acceleration constrained interpolation profile (CIP) method preconditioned conjugate gradient (PCG) method