19 articles found
A Hybrid Parallel Strategy for Isogeometric Topology Optimization via CPU/GPU Heterogeneous Computing
1
Authors: Zhaohui Xia, Baichuan Gao, Chen Yu, Haotian Han, Haobo Zhang, Shuting Wang. Computer Modeling in Engineering & Sciences (SCIE, EI), 2024, No. 2, pp. 1103-1137 (35 pages)
This paper aims to solve large-scale and complex isogeometric topology optimization problems that consume significant computational resources. A novel isogeometric topology optimization method with a hybrid CPU/GPU parallel strategy is proposed, and the hybrid parallel strategies for stiffness matrix assembly, equation solving, sensitivity analysis, and design variable update are discussed in detail. To ensure the high efficiency of CPU/GPU computing, a workload balancing strategy is presented for optimally distributing the workload between CPU and GPU. To illustrate the advantages of the proposed method, three benchmark examples are tested to verify the hybrid parallel strategy. The results show that the hybrid method is faster than both the serial CPU and the parallel GPU implementations, with speedups of up to two orders of magnitude.
Keywords: Topology optimization; high-efficiency; isogeometric analysis; CPU/GPU parallel computing; hybrid OpenMP-CUDA
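The workload balancing idea in the abstract above can be illustrated with a minimal Python sketch. It assumes a static split proportional to measured device throughput; the function names and the throughput figures are hypothetical and are not taken from the paper, whose actual balancing strategy may differ.

```python
def split_workload(n_items, cpu_rate, gpu_rate):
    """Split n_items between CPU and GPU in proportion to throughput.

    cpu_rate and gpu_rate are hypothetical items-per-second figures,
    e.g. obtained from a short calibration run on each device.
    """
    if cpu_rate <= 0 or gpu_rate <= 0:
        raise ValueError("rates must be positive")
    gpu_share = gpu_rate / (cpu_rate + gpu_rate)
    n_gpu = round(n_items * gpu_share)
    n_cpu = n_items - n_gpu
    return n_cpu, n_gpu


def estimated_time(n_cpu, n_gpu, cpu_rate, gpu_rate):
    """Devices run concurrently, so the slower one sets the wall time."""
    return max(n_cpu / cpu_rate, n_gpu / gpu_rate)


# Example: a GPU assumed to be 9x faster than the CPU on this task.
n_cpu, n_gpu = split_workload(1000, cpu_rate=1.0, gpu_rate=9.0)
```

With a proportional split both devices are expected to finish at about the same time, which is the usual goal of balancing in a heterogeneous run.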
A Rayleigh Wave Globally Optimal Full Waveform Inversion Framework Based on GPU Parallel Computing
2
Authors: Zhao Le, Wei Zhang, Xin Rong, Yiming Wang, Wentao Jin, Zhengxuan Cao. Journal of Geoscience and Environment Protection, 2023, No. 3, pp. 327-338 (12 pages)
Conventional gradient-based full waveform inversion (FWI) is a local optimization method that depends heavily on the initial model and is prone to being trapped in local minima. Globally optimal FWI, which can overcome this limitation, is particularly attractive but is currently limited by its huge computational cost. In this paper, we propose a globally optimal FWI framework based on GPU parallel computing, which greatly improves efficiency and is expected to make globally optimal FWI more widely used. In this framework, we simplify and recombine the model parameters and optimize the model iteratively. Each iteration contains hundreds of individuals; each individual is independent of the others and involves forward modeling and a cost function calculation. The framework is suitable for a variety of globally optimal algorithms, and we test it with the particle swarm optimization algorithm as an example. Both the synthetic and field examples achieve good results, indicating the effectiveness of the framework.
Keywords: Full Waveform Inversion; Finite-Difference Method; Globally Optimal Framework; GPU Parallel Computing; Particle Swarm Optimization
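The structure described in the abstract above (many independent individuals per iteration, each needing one cost evaluation) can be sketched with a small serial particle swarm optimizer in Python. This is a generic textbook PSO, not the authors' framework; the inertia and acceleration coefficients and the `cost` callable are illustrative assumptions, and the cost evaluations marked below are the step that a GPU implementation would run in parallel.

```python
import random


def pso_minimize(cost, dim, n_particles=30, n_iters=50,
                 bounds=(-5.0, 5.0), seed=0):
    """Minimal particle swarm optimization (serial reference sketch)."""
    rng = random.Random(seed)
    lo, hi = bounds
    pos = [[rng.uniform(lo, hi) for _ in range(dim)]
           for _ in range(n_particles)]
    vel = [[0.0] * dim for _ in range(n_particles)]
    best_pos = [p[:] for p in pos]
    # Independent evaluations: one "forward model" per individual.
    best_cost = [cost(p) for p in pos]
    g = min(range(n_particles), key=best_cost.__getitem__)
    g_pos, g_cost = best_pos[g][:], best_cost[g]
    history = [g_cost]
    w, c1, c2 = 0.7, 1.5, 1.5  # assumed, commonly used coefficients
    for _ in range(n_iters):
        for i in range(n_particles):
            for d in range(dim):
                r1, r2 = rng.random(), rng.random()
                vel[i][d] = (w * vel[i][d]
                             + c1 * r1 * (best_pos[i][d] - pos[i][d])
                             + c2 * r2 * (g_pos[d] - pos[i][d]))
                # Clamp the new position to the search bounds.
                pos[i][d] = min(hi, max(lo, pos[i][d] + vel[i][d]))
        # Independent evaluations again; no individual needs another's result.
        costs = [cost(p) for p in pos]
        for i, c in enumerate(costs):
            if c < best_cost[i]:
                best_cost[i], best_pos[i] = c, pos[i][:]
                if c < g_cost:
                    g_cost, g_pos = c, pos[i][:]
        history.append(g_cost)
    return g_pos, g_cost, history
```

Because the swarm only exchanges the global best between iterations, the per-individual evaluations map naturally onto one GPU thread or block per individual.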
Fast weighting method for plasma PIC simulation on GPU-accelerated heterogeneous systems (cited by: 2)
3
Authors: 杨灿群, 吴强, 胡慧俐, 石志才, 陈娟, 唐滔. Journal of Central South University (SCIE, EI, CAS), 2013, No. 6, pp. 1527-1535 (9 pages)
The particle-in-cell (PIC) method has benefited greatly from GPU-accelerated heterogeneous systems. However, the performance of PIC is constrained by the interpolation operations in the weighting process on the GPU (graphics processing unit). To address this problem, a fast weighting method for PIC simulation on GPU-accelerated systems is proposed that avoids atomic memory operations during the weighting process. The method is implemented by taking advantage of the GPU's thread synchronization mechanism and dividing the problem space properly. Moreover, software-managed shared memory on the GPU is employed to buffer the intermediate data. The experimental results show that the method achieves speedups of up to 3.5 times compared with previous work, and runs 20.08 times faster on one NVIDIA Tesla M2090 GPU than on a single core of an Intel Xeon X5670 CPU.
Keywords: GPU computing; heterogeneous computing; plasma physics simulations; particle-in-cell (PIC)
Real-time Volume Preserving Constraints for Volumetric Model on GPU
4
Authors: Hongly Va, Min-Hyung Choi, Min Hong. Computers, Materials & Continua (SCIE, EI), 2022, No. 10, pp. 831-848 (18 pages)
This paper presents a parallel method for simulating real-time 3D deformable objects using the volume preservation mass-spring system method on tetrahedron meshes. In general, the conventional mass-spring system is manipulated as a force-driven method because it is fast, simple to implement, and has controllable parameters. However, the springs in a traditional mass-spring system can be excessively elongated, which causes severe stability and robustness issues that lead to shape-restoring problems, simulation blow-up, and huge volume loss of the deformable object. In addition, the traditional method, which uses a serial process on the central processing unit (CPU) to solve the system in every frame, cannot handle the complex structure of a deformable object in real time. Therefore, first-order implicit constraint enforcement for a mass-spring model is utilized to achieve accurate visual realism of deformable objects with tough constraint error. In this paper, we applied the distance constraint and volume conservation constraints to each tetrahedron element to improve the stability of deformable object simulation using the mass-spring system and to make it behave like its real-world counterparts. To reduce the computational complexity while ensuring stable simulation, we applied a method that utilizes the OpenGL compute shader, a part of the OpenGL Shading Language (GLSL) that executes on the graphics processing unit (GPU), to solve the numerical problems effectively. We applied the proposed methods to experimental volumetric models and compared the volume percentages of all objects. The average volume percentages of all models during the simulation using the mass-spring system, the distance constraint, and the volume constraint method were 68.21%, 89.64%, and 98.70%, respectively. The proposed approaches are successfully applied to improve the stability of the mass-spring system, and the performance comparison from our experimental tests also shows that the GPU-based method is faster than the CPU-based implementation in all cases.
Keywords: Deformable object simulation; mass-spring system; implicit constraint enforcement; volume conservation constraint; GPU parallel computing
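The volume percentages quoted in the abstract above compare the volume of a deformed tetrahedral mesh with a reference volume. A minimal Python sketch of such a measurement follows; it assumes the percentage is defined as current total volume over rest total volume, which is an assumption, since the abstract does not state the definition. The function names and the mesh layout (vertex list plus index 4-tuples) are illustrative.

```python
def tet_volume(a, b, c, d):
    """Volume of a tetrahedron: |(b - a) . ((c - a) x (d - a))| / 6."""
    u = [b[i] - a[i] for i in range(3)]
    v = [c[i] - a[i] for i in range(3)]
    w = [d[i] - a[i] for i in range(3)]
    cross = (v[1] * w[2] - v[2] * w[1],
             v[2] * w[0] - v[0] * w[2],
             v[0] * w[1] - v[1] * w[0])
    return abs(sum(u[i] * cross[i] for i in range(3))) / 6.0


def mesh_volume(vertices, tets):
    """Sum of element volumes; tets holds 4-tuples of vertex indices."""
    return sum(tet_volume(*(vertices[i] for i in t)) for t in tets)


def volume_percentage(rest_vertices, current_vertices, tets):
    """Current volume as a percentage of the rest volume (assumed metric)."""
    return 100.0 * mesh_volume(current_vertices, tets) / mesh_volume(rest_vertices, tets)
```

For example, squashing a single unit tetrahedron to half its height would give a volume percentage of 50 under this definition.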
Programming for scientific computing on peta-scale heterogeneous parallel systems (cited by: 1)
5
Authors: 杨灿群, 吴强, 唐滔, 王锋, 薛京灵. Journal of Central South University (SCIE, EI, CAS), 2013, No. 5, pp. 1189-1203 (15 pages)
Peta-scale high-performance computing systems are increasingly built with heterogeneous CPU and GPU nodes to achieve higher power efficiency and computation throughput. While providing unprecedented capabilities to conduct computational experiments of historic significance, these systems are presently difficult to program. The users, who are domain experts rather than computer experts, prefer to use programming models closer to their domains (e.g., physics and biology) rather than MPI and OpenMP. This has led to the development of domain-specific programming that provides domain-specific programming interfaces but abstracts away some performance-critical architecture details. Based on experience in designing large-scale computing systems, a hybrid programming framework for scientific computing on heterogeneous architectures is proposed in this work. Its design philosophy is to provide a collaborative mechanism for domain experts and computer experts so that both domain-specific knowledge and performance-critical architecture details can be adequately exploited. Two real-world scientific applications have been evaluated on TH-1A, a peta-scale CPU-GPU heterogeneous system that is currently the 5th fastest supercomputer in the world. The experimental results show that the proposed framework is well suited for developing large-scale scientific computing applications on peta-scale heterogeneous CPU/GPU systems.
Keywords: heterogeneous parallel system; programming framework; scientific computing; GPU computing; molecular dynamic
Fast parallel Grad–Shafranov solver for real-time equilibrium reconstruction in EAST tokamak using graphic processing unit (cited by: 1)
6
Authors: 黄耀, 肖炳甲, 罗正平. Chinese Physics B (SCIE, EI, CAS, CSCD), 2017, No. 8, pp. 276-283 (8 pages)
To achieve real-time control of tokamak plasmas, the equilibrium reconstruction has to be completed sufficiently quickly. For an EAST tokamak experiment, real-time equilibrium reconstruction is generally required to provide results within 1 ms. A graphics processing unit (GPU) parallel Grad–Shafranov (G-S) solver is developed in the P-EFIT code, which is built with the CUDA architecture to take advantage of massively parallel GPU cores and significantly accelerate the computation. Optimization and implementation of numerical algorithms for a block tri-diagonal linear system are presented. The solver can complete a calculation within 16 μs with a 65×65 grid and 27 μs with a 129×129 grid, which allows P-EFIT to meet the timing requirement for real-time plasma control with both grid sizes.
Keywords: tokamak; Grad-Shafranov equation; equilibrium reconstruction; GPU parallel computation
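The abstract above refers to numerical algorithms for a block tri-diagonal linear system. As a point of reference only, the sketch below shows the scalar, serial Thomas algorithm in Python, the simplest relative of such solvers. It is not the P-EFIT GPU solver; the function name and argument layout are illustrative assumptions, and no pivoting is performed, so it is suited to diagonally dominant systems.

```python
def solve_tridiagonal(lower, diag, upper, rhs):
    """Solve a tridiagonal system with the Thomas algorithm.

    lower: sub-diagonal (length n - 1)
    diag:  main diagonal (length n)
    upper: super-diagonal (length n - 1)
    rhs:   right-hand side (length n)
    """
    n = len(diag)
    if n == 0 or len(lower) != n - 1 or len(upper) != n - 1 or len(rhs) != n:
        raise ValueError("inconsistent system dimensions")
    c = [0.0] * max(n - 1, 0)  # modified super-diagonal
    d = [0.0] * n              # modified right-hand side
    # Forward elimination.
    denom = diag[0]
    if n > 1:
        c[0] = upper[0] / denom
    d[0] = rhs[0] / denom
    for i in range(1, n):
        denom = diag[i] - lower[i - 1] * c[i - 1]
        if i < n - 1:
            c[i] = upper[i] / denom
        d[i] = (rhs[i] - lower[i - 1] * d[i - 1]) / denom
    # Back substitution.
    x = [0.0] * n
    x[-1] = d[-1]
    for i in range(n - 2, -1, -1):
        x[i] = d[i] - c[i] * x[i + 1]
    return x
```

The forward and backward sweeps are inherently sequential, which is one reason GPU solvers typically rely on reformulated or block-level algorithms rather than this direct form.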
Numerical simulation of stirred tanks using a hybrid immersed-boundary method (cited by: 1)
7
Authors: Shengbin Di, Ji Xu, Qi Chang, Wei Ge. Chinese Journal of Chemical Engineering (SCIE, EI, CAS, CSCD), 2016, No. 9, pp. 1122-1134 (13 pages)
Conventionally, the multiple reference frame (MRF) method and the sliding mesh (SM) method are used in the simulation of stirred tanks; however, both methods have limitations. In this study, a hybrid immersed-boundary (IB) technique is developed in a finite difference context for the numerical simulation of stirred tanks. IBs based on Lagrangian markers and solid volume fractions are used for moving and stationary boundaries, respectively, to achieve optimal efficiency and accuracy. To cope with the high computational cost of simulating stirred tanks, the technique is implemented on computers with a hybrid architecture where central processing units (CPUs) and graphics processing units (GPUs) are used together. The accuracy and efficiency of the present technique are first demonstrated in a relatively simple case, and then the technique is applied to the simulation of turbulent flow in a Rushton stirred tank with large eddy simulation (LES). Finally, the proposed methodology is coupled with the discrete element method (DEM) to accomplish particle-resolved simulation of solid suspensions in small stirred tanks. The results demonstrate that the proposed methodology is a promising tool for simulating turbulent flow in stirred tanks with complex geometries.
Keywords: Immersed-boundary method; CPU–GPU hybrid computing; Stirred tank; Large eddy simulation
Compute Unified Device Architecture Implementation of Euler/Navier-Stokes Solver on Graphics Processing Unit Desktop Platform for 2-D Compressible Flows
8
Authors: Zhang Jiale, Chen Hongquan. Transactions of Nanjing University of Aeronautics and Astronautics (EI, CSCD), 2016, No. 5, pp. 536-545 (10 pages)
A personal desktop platform with teraflops peak performance and thousands of cores can be realized at the price of a conventional workstation using programmable graphics processing units (GPUs). A GPU-based parallel Euler/Navier-Stokes solver is developed for 2-D compressible flows using NVIDIA's Compute Unified Device Architecture (CUDA) programming model in the CUDA Fortran programming language. The techniques for implementing CUDA kernels, the double-layered thread hierarchy, and the various memory hierarchies are presented to form the GPU-based algorithm for the Euler/Navier-Stokes equations. The resulting parallel solver is validated by a set of typical test flow cases. The numerical results show that a speedup of dozens of times relative to a serial CPU implementation can be achieved using a single GPU desktop platform, which demonstrates that a GPU desktop can serve as a cost-effective parallel computing platform to accelerate computational fluid dynamics (CFD) simulations substantially.
Keywords: graphics processing unit (GPU); GPU parallel computing; compute unified device architecture (CUDA) Fortran; finite volume method (FVM); acceleration
Adopting GPU computing to support DL-based Earth science applications
9
Authors: Zifu Wang, Yun Li, Kevin Wang, Jacob Cain, Mary Salami, Daniel Q. Duffy, Michael M. Little, Chaowei Yang. International Journal of Digital Earth (SCIE, EI), 2023, No. 1, pp. 2660-2680 (21 pages)
With the advancement of Artificial Intelligence (AI) technologies and the accumulation of big Earth data, Deep Learning (DL) has become an important method to discover patterns and understand Earth science processes in the past several years. While successful in many Earth science areas, AI/DL applications are often challenging for computing devices. In recent years, Graphics Processing Unit (GPU) devices have been leveraged to speed up AI/DL applications, yet computational performance still poses a major barrier for DL-based Earth science applications. To address these computational challenges, we selected five existing sample Earth science AI applications, revised the DL-based models/algorithms, and tested the performance of multiple GPU computing platforms to support the applications. Application software packages, performance comparisons across different platforms, and other results are summarized. This article can help readers understand how various AI/ML Earth science applications can be supported by GPU computing, help researchers in the Earth science domain better adopt GPU computing (such as supermicro, GPU clusters, and cloud computing-based platforms) for their AI/ML applications, and help them optimize their science applications to better leverage the computing device.
Keywords: GPU computing; GeoAI; open science; Earth science; artificial intelligence
Parallelization and Acceleration of Dynamic Option Pricing Models on GPU-CPU Heterogeneous Systems
10
Authors: Brian Wesley MUGANDA, Bernard Shibwabo KASAMANI. Journal of Systems Science and Information (CSCD), 2023, No. 5, pp. 622-635 (14 pages)
In this paper, stochastic global optimization algorithms, specifically the genetic algorithm and simulated annealing, are used to calibrate the dynamic option pricing model under stochastic volatility to market prices by adopting a hybrid programming approach. The performance of this dynamic option pricing model under the obtained optimal parameters is also discussed. To enhance the model throughput and reduce latency, a heterogeneous hybrid programming approach on the GPU was adopted, which emphasized a data-parallel implementation of the dynamic option pricing model on a GPU-based system. The compute-intensive segments of the pricing algorithms were offloaded to the GPU as kernels written in OpenCL. The GPU approach was found to reduce latency significantly, running up to 541 times faster than a parallel implementation on the CPU and reducing the computation time from 46.24 minutes to 5.12 seconds.
Keywords: Parallelization; GPU computing; option pricing; GPU acceleration; stochastic volatility; hybrid programming
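The speedup figure in the abstract above follows directly from the two reported run times. A short Python check of the arithmetic, using only the numbers quoted in the abstract (the function name is illustrative):

```python
def speedup(baseline_seconds, accelerated_seconds):
    """Ratio of baseline run time to accelerated run time."""
    if accelerated_seconds <= 0:
        raise ValueError("accelerated time must be positive")
    return baseline_seconds / accelerated_seconds


cpu_seconds = 46.24 * 60.0  # 46.24 minutes = 2774.4 s
gpu_seconds = 5.12
ratio = speedup(cpu_seconds, gpu_seconds)  # about 541.9
```

The ratio of 2774.4 s to 5.12 s is about 541.9, consistent with the "541 times" quoted once the value is truncated to an integer.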
DEM analysis of the influence of stirrer design on die filling with forced powder feeding
11
Authors: Chao Zheng, Edward Yost, Ariel R. Muliadi, Nicolin Govender, Ling Zhang, Chuan-Yu Wu. Particuology (SCIE, EI, CAS, CSCD), 2024, No. 5, pp. 107-115 (9 pages)
Die filling is a critical stage during powder compaction, which can significantly affect product quality and efficiency. In this paper, a forced feeder is introduced in an attempt to improve the filling performance of a lab-scale die filling system. The die filling process is analysed with a graphics processing unit (GPU) enhanced discrete element method (DEM). Various stirrer designs are assessed for a wide range of process settings (i.e., stirrer speed, filling speed) to explore their influence on the die filling performance of free-flowing powder. Numerical results show that die filling with the novel helical-ribbon (i.e., type D) stirrer design exhibits the highest filling ratio, implying that it is the most robust stirrer design for the feeder configuration considered. Furthermore, die filling performance with the type D stirrer design is a function of the stirrer speed and the filling speed. A positive variation of filling ratio (ηf > 0%) can be ensured over the whole range of filling speeds by adjusting the stirrer speed (i.e., increasing the stirrer speed). The approach used in this study can not only help in understanding how the stirrer design affects the die filling performance but also guide the optimization of the feeder system and process settings.
Keywords: Discrete element method; Die filling; Forced feeding; Stirrer design; GPU computing
GPU-accelerated computing of three-dimensional solar wind background (cited by: 8)
12
Authors: FENG XueShang, ZHONG DingKun, XIANG ChangQing, ZHANG Yao. Science China Earth Sciences (SCIE, EI, CAS), 2013, No. 11, pp. 1864-1880 (17 pages)
High-performance computational models are required to make real-time or faster-than-real-time numerical predictions of adverse space weather events and their influence on the geospace environment. The main objective of this article is to explore the application of programmable graphics processing units (GPUs) to numerical space weather modeling for the study of the solar wind background, which is a crucial part of numerical space weather modeling. GPU programming is realized for our Solar-Interplanetary-CESE MHD model (SIP-CESE MHD model) by numerically studying the solar corona/interplanetary solar wind. The global solar wind structures are obtained by the established GPU model with the magnetic field synoptic data as input. Meanwhile, the time-dependent solar surface boundary conditions derived from the method of characteristics and the mass flux limit are incorporated to couple the observation and the three-dimensional (3D) MHD model. The simulated evolution of the global structures for two Carrington rotations, 2058 and 2062, is compared with solar observations and solar wind measurements from spacecraft near the Earth. The MHD model is also validated by comparison with the standard potential field source surface (PFSS) model. Comparisons show that the MHD results are in good overall agreement with coronal and interplanetary structures, including the size and distribution of coronal holes, the position and shape of the streamer belts, and the transition of the solar wind speeds and magnetic field polarities.
Keywords: space weather modeling; SIP-CESE MHD model; GPU computing
A Survey on Parallel Computing and its Applications in Data-Parallel Problems Using GPU Architectures (cited by: 5)
13
Authors: Cristobal A. Navarro, Nancy Hitschfeld-Kahler, Luis Mateu. Communications in Computational Physics (SCIE), 2014, No. 2, pp. 285-329 (45 pages)
Parallel computing has become an important subject in the field of computer science and has proven to be critical when researching high-performance solutions. The evolution of computer architectures (multi-core and many-core) towards a higher number of cores can only confirm that parallelism is the method of choice for speeding up an algorithm. In the last decade, the graphics processing unit, or GPU, has gained an important place in the field of high-performance computing (HPC) because of its low cost and massive parallel processing power. Supercomputing has become, for the first time, available to anyone at the price of a desktop computer. In this paper, we survey the concept of parallel computing and especially GPU computing. Achieving efficient parallel algorithms for the GPU is not a trivial task; there are several technical restrictions that must be satisfied in order to achieve the expected performance. Some of these limitations are consequences of the underlying architecture of the GPU and the theoretical models behind it. Our goal is to present a set of theoretical and technical concepts that are often required to understand the GPU and its massive parallelism model. In particular, we show how this new technology can help the field of computational physics, especially when the problem is data-parallel. We present four examples of computational physics problems: n-body, collision detection, Potts model, and cellular automata simulations. These examples represent well the kind of problems that are suitable for GPU computing. By understanding the GPU architecture and its massive parallelism programming model, one can overcome many of the technical limitations found along the way, design better GPU-based algorithms for computational physics problems, and achieve speedups that can reach up to two orders of magnitude when compared to sequential implementations.
Keywords: GPU computing; parallel computing; computing models; algorithms; data parallel; massive parallelism; Potts model; Ising model; collision detection; n-body; cellular automata
An MPI+OpenACC-Based PRM Scalar Advection Scheme in the GRAPES Model over a Cluster with Multiple CPUs and GPUs (cited by: 2)
14
Authors: Huadong Xiao, Yang Lu, Jianqiang Huang, Wei Xue. Tsinghua Science and Technology (SCIE, EI, CAS, CSCD), 2022, No. 1, pp. 164-173 (10 pages)
A moisture advection scheme is an essential module of a numerical weather/climate model representing the horizontal transport of water vapor. The Piecewise Rational Method (PRM) scalar advection scheme in the Global/Regional Assimilation and Prediction System (GRAPES) solves the moisture flux advection equation based on PRM. Computation of the scalar advection involves boundary exchange, and computation of higher bandwidth requirements is complicated and time-consuming in GRAPES. Recently, Graphics Processing Units (GPUs) have been widely used to solve scientific and engineering computing problems owing to advancements in GPU hardware and related programming models such as CUDA/OpenCL and Open Accelerator (OpenACC). Herein, we present an accelerated PRM scalar advection scheme with Message Passing Interface (MPI) and OpenACC to fully exploit GPUs' power over a cluster with multiple Central Processing Units (CPUs) and GPUs, together with optimizations such as minimizing data transfer, memory coalescing, exposing more parallelism, and overlapping computation with data transfers. Results show that about 3.5 times speedup is obtained for the entire model running at medium resolution with double precision when comparing the scheme's elapsed time on a node with two GPUs (NVIDIA P100) and two 16-core CPUs (Intel Gold 6142). Further, results obtained from experiments with a higher-resolution model on multiple GPUs show excellent scalability.
Keywords: Graphics Processing Unit (GPU) computing; Open Accelerator (OpenACC); Message Passing Interface (MPI); Global/Regional Assimilation and Prediction System (GRAPES); Piecewise Rational Method (PRM) scalar advection scheme
Implementation of Multi-GPU Based Lattice Boltzmann Method for Flow Through Porous Media (cited by: 1)
15
Authors: Changsheng Huang, Baochang Shi, Nanzhong He, Zhenhua Chai. Advances in Applied Mathematics and Mechanics (SCIE), 2015, No. 1, pp. 1-12 (12 pages)
The lattice Boltzmann method (LBM) can gain a great performance benefit by taking advantage of graphics processing unit (GPU) computing, and thus the GPU or multi-GPU based LBM can be considered a promising and competent candidate in the study of large-scale fluid flows. However, the multi-GPU based lattice Boltzmann algorithm has not been studied extensively, especially for simulations of flow in complex geometries. In this paper, through coupling with the message passing interface (MPI) technique, we present an implementation of multi-GPU based LBM for fluid flow through porous media, as well as some optimization strategies based on the data structure and layout, which can clearly reduce memory access and completely hide the communication time. The performance of the algorithm is then tested on a one-node cluster equipped with four Tesla C1060 GPU cards, where up to 1732 MFLUPS is achieved for the Poiseuille flow and a nearly linear speedup with the number of GPUs is also observed.
Keywords: Lattice Boltzmann method; GPU computing; CUDA; porous media; MPI
Implementation of the moving particle semi-implicit method on GPU (cited by: 2)
16
Authors: ZHU XiaoSong, CHENG Liang, LU Lin, TENG Bin. Science China (Physics, Mechanics & Astronomy) (SCIE, EI, CAS), 2011, No. 3, pp. 523-532 (10 pages)
The Moving Particle Semi-implicit (MPS) method performs well in simulating violent free surface flow and has hence become popular in the area of fluid flow simulation. However, searching for neighbouring particles and solving the large sparse matrix equations (Poisson-type equation) are very time-consuming. In order to utilize the tremendous parallel computing power of Graphics Processing Units (GPUs), this study has developed a GPU-based MPS model employing the Compute Unified Device Architecture (CUDA) on an NVIDIA GTX 280. Efficient neighbouring particle searching is done through an indirect method, and the Poisson-type pressure equation is solved by the Bi-Conjugate Gradient (BiCG) method. Four different optimization levels for the present general parallel GPU-based MPS model are demonstrated. In addition, the elaborate optimization of the GPU code is also discussed. A benchmark problem of dam-breaking flow is simulated using both the present GPU-based MPS code and the original CPU-based MPS code. The comparisons between them show that the GPU-based MPS model is 26 times faster than the traditional CPU model.
Keywords: moving particle semi-implicit method (MPS); graphics processing units (GPU); compute unified device architecture (CUDA); neighbouring particle searching; free surface flow
Numerical Study of Geometric Multigrid Methods on CPU–GPU Heterogeneous Computers
17
Authors: Chunsheng Feng, Shi Shu, Jinchao Xu, Chen-Song Zhang. Advances in Applied Mathematics and Mechanics (SCIE), 2014, No. 1, pp. 1-23 (23 pages)
The geometric multigrid method (GMG) is one of the most efficient solving techniques for discrete algebraic systems arising from elliptic partial differential equations. GMG utilizes a hierarchy of grids or discretizations and reduces the error at a number of frequencies simultaneously. Graphics processing units (GPUs) have recently burst onto the scientific computing scene as a technology that has yielded substantial performance and energy-efficiency improvements. A central challenge in implementing GMG on GPUs, though, is that computational work on coarse levels cannot fully utilize the capacity of a GPU. In this work, we perform numerical studies of GMG on CPU–GPU heterogeneous computers. Furthermore, we compare our implementation with an efficient CPU implementation of GMG and with the most popular fast Poisson solver, the Fast Fourier Transform, in the cuFFT library developed by NVIDIA.
Keywords: High-performance computing; CPU–GPU heterogeneous computers; multigrid method; fast Fourier transform; partial differential equations
TensorFlow solver for quantum PageRank in large-scale networks (cited by: 1)
18
Authors: Hao Tang, Ruoxi Shi, Tian-Shen He, Yan-Yan Zhu, Tian-Yu Wang, Marcus Lee, Xian-Min Jin. Science Bulletin (SCIE, EI, CSCD), 2021, No. 2, pp. 120-126, M0003 (8 pages)
Google PageRank is a prevalent algorithm for ranking the significance of nodes or websites in a network, and a quantum counterpart of the PageRank algorithm has recently been proposed that suggests a higher ranking accuracy than Google PageRank. The quantum PageRank algorithm is essentially based on quantum stochastic walks and can be expressed using the Lindblad master equation, which, however, requires solving Kronecker products of O(N^4) dimension and demands very large memory and time when the number of nodes N in a network increases above 150. Here, we present an efficient solver for quantum PageRank that uses the Runge-Kutta method to reduce the matrix dimension to O(N^2) and employs TensorFlow to conduct GPU parallel computing. We demonstrate its performance in solving quantum stochastic walks on Erdős-Rényi graphs using an RTX 2060 GPU. The test on a graph of 6000 nodes requires 5.5 GB of memory and 223 s, and that on a graph of 1000 nodes requires 226 MB and 3.6 s. Compared with QSWalk, a currently prevalent Mathematica solver, our solver reduces the required memory and time for the same graph of 1000 nodes to only 0.2% and 0.05%, respectively. We apply the solver to quantum PageRank for the USA major airline network with up to 922 nodes, and to a quantum stochastic walk on a glued tree of 2186 nodes. This efficient solver for large-scale quantum PageRank and quantum stochastic walks would greatly facilitate studies of quantum information in real-life applications.
Keywords: Quantum stochastic walk; Quantum PageRank; Lindblad master equation; TensorFlow; GPU parallel computing; Runge-Kutta method
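The scaling argument in the abstract above (O(N^4) versus O(N^2) storage) can be made concrete with a rough memory estimate in Python. The sketch assumes dense storage with 16-byte complex entries, which is an assumption for illustration and not a statement about how QSWalk or the authors' solver actually stores its operators; the function name is hypothetical.

```python
def dense_complex_bytes(rows, cols, bytes_per_entry=16):
    """Bytes for a dense matrix of complex128 entries (assumed storage)."""
    return rows * cols * bytes_per_entry


n = 150  # the node count the abstract names as the practical limit

# A superoperator acting on vectorized N x N density matrices is N^2 x N^2.
superoperator_bytes = dense_complex_bytes(n * n, n * n)  # 8.1e9 bytes, about 8.1 GB

# Working directly on the N x N density matrix needs only N^2 entries.
density_matrix_bytes = dense_complex_bytes(n, n)  # 360,000 bytes
```

Under this assumption the dense superoperator at N = 150 already needs about 8.1 GB, which is in line with the abstract's remark that memory becomes severe above that size.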
Fast Parallel Cutoff Pair Interactions for Molecular Dynamics on Heterogeneous Systems
19
Authors: Qiang Wu, Canqun Yang, Tao Tang, Kai Lu. Tsinghua Science and Technology (EI, CAS), 2012, No. 3, pp. 265-277 (13 pages)
Heterogeneous systems with both Central Processing Units (CPUs) and Graphics Processing Units (GPUs) are frequently used to accelerate short-ranged Molecular Dynamics (MD) simulations. The most time-consuming task in short-ranged MD simulations is the computation of particle-to-particle interactions. Beyond a certain distance, these interactions decrease to zero. To minimize the number of distance checks, previous works have tiled interactions by employing the spatial attribute, which increases memory access and GPU computations, hence decreasing performance. Other studies ignore the spatial attribute and construct an all-versus-all interaction matrix, which has poor scalability. This paper presents an improved algorithm. The algorithm first bins particles into voxels according to the spatial attributes, and then tiles the all-versus-all matrix into voxel-versus-voxel sub-matrices. Only the sub-matrices between neighboring voxels are computed on the GPU. Therefore, the algorithm reduces the distance-checking operations and limits additional memory access and GPU computations. This paper also adopts a multi-level programming model to implement the algorithm on multiple nodes of Tianhe-1A. By employing (1) a patch design to exploit parallelism across the simulation domain, (2) a communication overlapping method to overlap the communications between CPUs and GPUs, and (3) a dynamic workload balancing method to adjust the workloads among compute nodes, the implementation achieves a speedup of 4.16x on one NVIDIA Tesla M2050 GPU compared to a 2.93 GHz six-core Intel Xeon X5670 CPU. In addition, it runs 2.41x faster on 256 compute nodes of Tianhe-1A (with two CPUs and one GPU inside a node) than on 256 GPU-excluded nodes.
Keywords: cutoff pair interactions; molecular dynamics; heterogeneous computing; GPU computing
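The voxel binning step described in the abstract above can be sketched serially in Python and checked against the all-versus-all approach it is contrasted with. This is a generic cell-list sketch, not the paper's GPU implementation; it assumes cubic voxels whose edge equals the cutoff, and the function names are illustrative.

```python
import math
from collections import defaultdict
from itertools import product


def pairs_brute_force(points, cutoff):
    """All-versus-all reference: test every pair of particles."""
    n = len(points)
    return {(i, j) for i in range(n) for j in range(i + 1, n)
            if math.dist(points[i], points[j]) <= cutoff}


def pairs_cell_list(points, cutoff):
    """Bin particles into voxels, then test only neighboring voxels."""
    cells = defaultdict(list)
    for idx, p in enumerate(points):
        # Voxel edge equals the cutoff, so interacting particles are
        # always in the same voxel or in one of its 26 neighbors.
        cells[tuple(math.floor(x / cutoff) for x in p)].append(idx)
    pairs = set()
    for key, members in cells.items():
        for offset in product((-1, 0, 1), repeat=3):
            neighbor = tuple(k + o for k, o in zip(key, offset))
            for i in members:
                for j in cells.get(neighbor, ()):
                    if i < j and math.dist(points[i], points[j]) <= cutoff:
                        pairs.add((i, j))
    return pairs
```

Each voxel-versus-neighbor block in the inner loops corresponds to one of the sub-matrices mentioned in the abstract, which is what makes the work easy to tile across GPU thread blocks.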