18 articles found
Fast Sparse MRI Data Reconstruction Using GPGPU
1
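Authors: 王聪, 冯衍秋 《计算机工程与应用》 CSCD, PKU Core, 2011, No. 17, pp. 203-206, 209 (5 pages)
Exploiting the parallel processing power of the GPGPU (General Purpose GPU), this work parallelizes an existing sparse magnetic resonance imaging (Sparse MRI) reconstruction algorithm on the NVIDIA CUDA framework so that it can meet the demands of practical use. Sparse MRI reconstruction involves a large number of floating-point operations and is too slow for practical use unless it is accelerated and optimized. Experimental results show that an NVIDIA GTX275 GPU cuts the running time from more than 4 minutes to about 3.4 seconds, a 76-fold speedup over an Intel Q8200 CPU.
Keywords: general-purpose computing on graphics processing units (GPGPU); Compute Unified Device Architecture (CUDA); parallel computing; compressed sensing; sparse MRI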
Single-particle 3D reconstruction on specialized stream architecture and comparison with GPGPUs
2
Authors: 段勃, Wang Wendi, Tan Guangming, Meng Dan 《High Technology Letters》 EI, CAS, 2014, No. 4, pp. 333-345 (13 pages)
The wide acceptance of, and data deluge in, medical image processing require faster and more efficient systems to be built. Owing to recent advances in heterogeneous architectures, there has been a resurgence of research aimed at FPGA-based as well as GPGPU-based accelerator design. This paper quantitatively analyzes the workload, computational intensity and memory performance of a single-particle 3D reconstruction application called EMAN. It parallelizes the application on CUDA GPGPU architectures, decoupling the memory operations from the computing flow and orchestrating the thread-data mapping to reduce the overhead of off-chip memory operations. It then exploits the trend towards FPGA-based accelerator design by offloading computing-intensive kernels to dedicated hardware modules. Furthermore, a customized memory subsystem is designed to facilitate the decoupling and optimization of computing-dominated data access patterns. The proposed accelerator design strategies are evaluated against a parallelized program on a 4-core CPU. The CUDA version on a GTX480 shows a speedup of about 6 times. The stream architecture implemented on a Xilinx Virtex LX330 FPGA shows a reported speedup of 2.54 times. Meanwhile, measured in terms of power efficiency, the FPGA-based accelerator outperforms the 4-core CPU and the GTX480 by 7.3 times and 3.4 times, respectively.
Keywords: stream architecture; general purpose graphic processing unit (GPGPU); field programmable gate array (FPGA); cryo-EM
GPU-Accelerated Two-Dimensional Radiation Computation with the Time-Domain Finite Element Method (Cited by: 5)
3
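Authors: 刘昆, 王晓斌, 廖成 《电波科学学报》 EI, CSCD, PKU Core, 2008, No. 1, pp. 111-114 (4 pages)
The time-domain finite element method (TD-FEM) is one of the most widely used methods in electromagnetics and microwave engineering. However, TD-FEM runs quite slowly even on mainframe computers, and research on hardware acceleration of TD-FEM computation is already under way. Compared with a CPU of the same technology generation, a current consumer graphics card (GPU) can run TD-FEM nearly 4 times faster. Using OpenGL as the application programming interface (API), a standard commercial graphics card is programmed to solve a two-dimensional TD-FEM radiation problem.
Keywords: graphics accelerator card (GPU); time-domain finite element method (TD-FEM); general-purpose computing on graphics processing units (GPGPU)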
A Multi-Replica GPU Fault-Tolerance Technique Based on Redundant Threads (Cited by: 8)
4
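Authors: 贾佳, 杨学军, 李志凌 《计算机研究与发展》 EI, CSCD, PKU Core, 2013, No. 7, pp. 1551-1562 (12 pages)
As the performance of general-purpose GPUs (general purpose computation on graphic processing units, GPGPU) keeps improving, heterogeneous systems built from CPUs and GPUs have become a research focus in high-performance computing. However, as parallel computing systems keep growing, system reliability declines, and this has become a constraint on scaling parallel computing that cannot be ignored. Because commercial GPGPUs have weak fault tolerance, reliability is an even sharper problem for large-scale heterogeneous parallel systems built from CPUs and GPUs, and practical fault-tolerance methods are still lacking. To address this problem, the paper proposes RB-TMR (Rollback TMR), a multi-replica GPU fault-tolerance technique based on redundant threads. Based on the programming model and program characteristics of heterogeneous systems, it analyzes and describes the design and implementation of this fault-tolerance mechanism and its compilation framework. Finally, the technique is implemented on 10 case studies and its performance is evaluated. The technique offers a new approach to fault-tolerance research for heterogeneous systems and is of great significance.
Keywords: general-purpose GPU; heterogeneous system; redundant threads; fault-tolerance technique; multiple replicas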
Research on General-Purpose Computing Applications of the GPU (Cited by: 24)
5
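Authors: 张浩, 李利军, 林岚 《计算机与数字工程》 2005, No. 12, pp. 60-62, 98 (4 pages)
With the rapid development of graphics processing units (GPUs) in recent years, researchers in China and abroad have taken GPU-based general-purpose computing as a new research field. Drawing on the latest international literature, this paper analyzes the characteristics of the GPU itself, explains the structure of GPU-based applications, examines how programming a GPU differs from programming an ordinary CPU, and uses Gaussian filtering as an example to describe the method and process of GPU programming in detail.
Keywords: GPU; GPGPU; general-purpose computing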
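Note on the entry above: the abstract names Gaussian filtering as its worked example but gives no details. The following is a minimal CPU-side sketch of a 1-D Gaussian filter, not the paper's GPU code. The function names, the default radius and sigma, and the clamp-to-edge border handling are all assumptions made for illustration. The point it shows is that every output sample is computed independently, which is the property that lets such a filter map onto one GPU thread (or fragment) per sample.

```python
import math


def gaussian_kernel(radius, sigma):
    """Return normalized Gaussian weights for offsets -radius..radius."""
    weights = [math.exp(-(i * i) / (2.0 * sigma * sigma))
               for i in range(-radius, radius + 1)]
    total = sum(weights)
    return [w / total for w in weights]


def gaussian_filter_1d(signal, radius=2, sigma=1.0):
    """Smooth a 1-D signal; borders are handled by clamping indices (assumed policy)."""
    kernel = gaussian_kernel(radius, sigma)
    n = len(signal)
    out = []
    # Each iteration reads only the input and writes one output sample,
    # so the iterations are independent and could run in parallel.
    for i in range(n):
        acc = 0.0
        for offset, weight in zip(range(-radius, radius + 1), kernel):
            j = min(max(i + offset, 0), n - 1)  # clamp to the valid index range
            acc += weight * signal[j]
        out.append(acc)
    return out
```

A 2-D image filter can be built from the same routine by filtering rows and then columns, since the Gaussian kernel is separable.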
SOLVERS FOR SYSTEMS OF LARGE SPARSE LINEAR AND NONLINEAR EQUATIONS BASED ON MULTI-GPUS (Cited by: 3)
6
Authors: 刘沙, 钟诚文, 陈效鹏 《Transactions of Nanjing University of Aeronautics and Astronautics》 EI, 2011, No. 3, pp. 300-308 (9 pages)
Numerical treatment of engineering problems often ends in solving systems of linear or nonlinear equations. On digital computers, the solution process usually takes a tremendous amount of time because of the extremely large problem sizes encountered in most real-world engineering applications. Practical solvers for systems of linear and nonlinear equations based on multiple graphic process units (GPUs) are therefore proposed in order to accelerate the solving process. In the linear and nonlinear solvers, the preconditioned bi-conjugate gradient stable (PBi-CGstab) method and the Inexact Newton method are used to achieve fast and stable convergence. Multiple GPUs are used to provide the additional data storage that large problems need.
Keywords: general purpose graphic process unit (GPGPU); compute unified device architecture (CUDA); system of linear equations; system of nonlinear equations; Inexact Newton method; bi-conjugate gradient stable (Bi-CGstab) method
GPU-Accelerated Segmented Top-k Query Algorithm (Cited by: 1)
7
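Authors: 黄玉龙, 邹循进, 刘奎, 苏本跃 《计算机应用》 CSCD, PKU Core, 2014, No. 11, pp. 3112-3116 (5 pages)
Existing optimized Top-k query algorithms cannot fully exploit the high parallel throughput of the graphics processing unit (GPU) to return query results promptly. To address this, a large-scale segmented query algorithm based on the Compute Unified Device Architecture (CUDA) model is proposed. By partitioning the query process and applying a segment-wise parallel processing strategy, the algorithm maximizes the efficiency of the computation and comparison steps in a query. Experimental results show that the proposed algorithm has a clear performance advantage over a 4-thread multi-core optimized algorithm. Performance peaks when the number of sorted lists is 6 and the traversal step is 120, at which point it is 40 times faster than the multi-core algorithm.
Keywords: Top-k query; general-purpose computing on graphics processing units; segmented processing; parallel optimization; no random access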
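Note on the entry above: the abstract describes partitioning the query and processing segments in parallel, but does not give the algorithm itself, which operates on multiple sorted lists under a no-random-access constraint. The sketch below is therefore not the paper's method. It only illustrates the generic segment-then-merge idea on a single flat list of scores, and `segmented_top_k`, `segment_size` and the input layout are hypothetical names chosen for this example.

```python
import heapq


def segmented_top_k(scores, k, segment_size):
    """Return the k largest scores by taking a local top-k per segment, then merging."""
    if k <= 0 or segment_size <= 0:
        raise ValueError("k and segment_size must be positive")
    candidates = []
    # Each segment is processed independently of the others; in a GPU
    # version this loop body is what a thread block would execute.
    for start in range(0, len(scores), segment_size):
        segment = scores[start:start + segment_size]
        candidates.extend(heapq.nlargest(k, segment))
    # The global top-k must be among the per-segment top-k candidates.
    return heapq.nlargest(k, candidates)
```

The merge step is correct because any element in the global top-k is also in the top-k of its own segment, so no true result is discarded during the per-segment pass.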
Exploiting Parallelism in the Simulation of General Purpose Graphics Processing Unit Program
8
Authors: 赵夏, 马胜, 陈微, 王志英 《Journal of Shanghai Jiaotong university(Science)》 EI, 2016, No. 3, pp. 280-288 (9 pages)
Simulation is an important means of evaluating the performance of computer architectures. Currently, the serial simulation of general purpose graphics processing unit (GPGPU) architecture is the main bottleneck for simulation speed. To address this issue, we propose intra-kernel parallelization on a multicore processor and inter-kernel parallelization on a multiple-machine platform, and apply both methods to the GPGPU-sim simulator. The intra-kernel parallelization method first parallelizes the serial simulation of multiple compute units in one cycle. It then parallelizes the timing and functional simulation to reduce the performance loss caused by synchronization between different compute units. The inter-kernel parallelization method divides the kernels of a CUDA program into several groups and distributes these groups across multiple simulation hosts. Experimental results show that the intra-kernel parallelization method achieves a speed-up of up to 12 with a maximum error rate of 0.0094% on a 32-core machine, and the inter-kernel parallelization method accelerates the simulation by a factor of up to 3.9 with a maximum error rate of 0.11% on four simulation hosts. Because the two methods are orthogonal, they can be combined on multiple multi-core hosts for further performance improvements.
Keywords: general purpose graphics processing unit (GPGPU); multicore; intra-kernel; inter-kernel; parallel
A GPU-Accelerated Implementation Method for Seismic Pre-Stack Time Migration (Cited by: 74)
9
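Authors: 李博, 刘国峰, 刘洪 《地球物理学报》 SCIE, EI, CAS, CSCD, PKU Core, 2009, No. 1, pp. 245-252 (8 pages)
The recently developed general-purpose computing technology for graphics processing units (GPUs) is becoming practical and mature, and has drawn wide attention across many application fields. In oil and gas exploration data processing, the large gap in computing performance between GPUs and central processing units (CPUs) is driving active research on applying GPU general-purpose computing in the petroleum industry. Using pre-stack time migration, which is widely used in oil and gas exploration, this paper briefly demonstrates the effectiveness of GPU-based processing. It also proposes a software construction method for implementing seismic pre-stack time migration on the GPU, and provides a concrete implementation architecture for application software extended from asymmetric-traveltime pre-stack time migration. Compared with earlier pre-stack time migration on a personal computer (PC) or a PC cluster, the method greatly improves computational efficiency and thus significantly reduces computing and maintenance costs in petroleum geophysical data processing. A real-data example also shows that GPU-based high-performance parallel computing is an important way to meet the large-scale computing demands of today's petroleum industry.
Keywords: asymmetric-traveltime pre-stack time migration; graphics processing unit; GPU general-purpose computing; Compute Unified Device Architecture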
Acceleration Performance Analysis of Parallel Cavity Flow Based on OpenCL (Cited by: 8)
10
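Authors: 李森, 李新亮, 王龙, 陆忠华, 迟学斌 《计算机应用研究》 CSCD, PKU Core, 2011, No. 4, pp. 1401-1403, 1421 (4 pages)
A method is proposed for accelerating the computation of the cavity flow problem with OpenCL. The problem is formulated as the Navier-Stokes (N-S) equations and solved with finite differences in space and Runge-Kutta differencing in time, and GPU optimizations such as local caching are applied. The algorithm is evaluated on NVIDIA and ATI platforms. The results show that the OpenCL version is about 30 times faster than its serial counterpart.
Keywords: general-purpose computing on graphics cards; computational fluid dynamics; cavity flow; finite-difference computation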
Applications of Graphics Processing Units in Database Technology (Cited by: 4)
11
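Authors: 杨珂, 罗琼, 石教英 《浙江大学学报(工学版)》 EI, CAS, CSCD, PKU Core, 2009, No. 8, pp. 1349-1360 (12 pages)
This paper surveys general-purpose computing on graphics processing units (GPGPU) and work on using graphics processing units (GPUs) for database processing. The development of GPU technology is divided into three eras (fixed-function architecture, separate shader architecture and unified shader architecture), and the difficulties and current state of GPGPU are summarized. For the GPUs of each era, the opportunities and limitations arising from the architecture are discussed and a corresponding general-purpose computing model is presented. Research on GPUs in the database field is surveyed, covering predicates, Boolean combinations and aggregation, sorting, joins and multidimensional indexes. Based on the driving forces behind GPU technology, the trends of GPGPU are projected, and three levels at which GPU technology can be exploited are identified: the graphics pipeline and general-purpose parallel computing, interactive multimedia, and computer graphics theory and methods. Taking database technology as an example, the trends of general-purpose computing at each level are outlined.
Keywords: graphics processing unit; general-purpose computing; database technology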
Research on a CUDA-Based Fine-Grained Parallel Computing Model (Cited by: 1)
12
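Authors: 肖汉, 肖波, 冯娜, 杨锦锦 《计算机与数字工程》 2013, No. 5, pp. 801-804 (4 pages)
As the bridge between application software models and computer hardware, the programming model is of obvious importance in computing. Yet as graphics processing units (GPUs) capable of fine-grained parallel computing entered the mainstream market, the development of suitable programming models lagged behind. The Compute Unified Device Architecture (CUDA) technology that Nvidia introduced with the GeForce 8 series graphics cards frees general-purpose computing on graphics processing units (GPGPU) from the graphics hardware pipeline and high-level shading languages, so that developers can carry out high-performance parallel computing in single-instruction, multiple-data (SIMD) mode without learning graphics programming. This paper studies the CUDA parallel computing model in terms of its features, components and parallel architecture, and shows that GPU-based high-performance parallel computing is an important way to meet today's large-scale computing demands.
Keywords: graphics processing unit; general-purpose computing on graphics processing units; Compute Unified Device Architecture; fine-grained parallel computing; single instruction, multiple data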
Optimization of a precise integration method for seismic modeling based on graphic processing unit (Cited by: 2)
13
Authors: Jingyu Li, Genyang Tang, Tianyue Hu 《Earthquake Science》 CSCD, 2010, No. 4, pp. 387-393 (7 pages)
General purpose graphic processing unit (GPU) calculation technology is increasingly widely used in various fields. Its single instruction, multiple threads mode is well suited to seismic numerical simulation, which involves a huge quantity of data and calculation steps. In this study, we introduce a GPU-based parallel implementation of the precise integration method (PIM) for seismic forward modeling. Compared with single-core CPU calculation, the GPU parallel calculation fully preserves the features of PIM, namely small bandwidth, high accuracy and the ability to model complex substructures, while delivering high computational efficiency. High-performance GPU parallel calculation can therefore bring seismic forward modeling closer to real seismic records.
Keywords: precise integration method; seismic modeling; general purpose GPU; graphic processing unit
GPGPU Accelerated Fast Convolution Back-Projection for Radar Image Reconstruction
14
Authors: 周斌, 彭应宁, 叶春茂, 汤俊 《Tsinghua Science and Technology》 SCIE, EI, CAS, 2011, No. 3, pp. 256-263 (8 pages)
This paper describes a parallel fast convolution back-projection algorithm design for radar image reconstruction. State-of-the-art general purpose graphic processing units (GPGPU) were utilized to accelerate the processing. The implementation achieves much better performance than conventional processing systems, with a speedup of more than 890 times on NVIDIA Tesla C1060 supercomputing cards compared to an Intel P4 2.4 GHz CPU. Images of 256×256 pixels could be reconstructed within 6.3 s, which makes real-time imaging possible. Six platforms were tested and compared. The results show that the GPGPU supercomputing system has great potential for radar image processing.
Keywords: convolution back projection (CBP); synthetic aperture radar (SAR); inverse synthetic aperture radar (ISAR); general purpose graphic processing units (GPGPU)
An Efficient Acceleration of Solving Heat and Mass Transfer Equations with the First Kind Boundary Conditions in Capillary Porous Radially Composite Cylinder Using Programmable Graphics Hardware
15
Authors: Hira Narang, Fan Wu, Abdul Rafae Mohammed 《Journal of Computer and Communications》 2019, No. 7, pp. 267-281 (15 pages)
With the latest advances in computing technology, a great deal of effort has gone into the simulation of a range of scientific phenomena in engineering fields. One such case is the simulation of heat and mass transfer in capillary porous media, which is becoming more and more necessary for analyzing a number of scenarios in science and engineering applications. However, the numerical solution of heat and mass transfer equations for capillary porous media is very time consuming. This paper therefore aims to use one of the acceleration methods developed in the graphics community, which exploits a graphical processing unit (GPU), and applies it to the numerical solution of such heat and mass transfer equations. The nVidia Compute Unified Device Architecture (CUDA) programming model offers a sound way of applying parallel computing to applications on the graphical processing unit. This paper reports a real improvement in performance when solving the heat and mass transfer equations for a capillary porous radially composite cylinder with the first kind of boundary conditions. The simulation is carried out on the CUDA platform on an nVidia Quadro FX 4800 graphics card. The experimental results show a drastic overall performance improvement when the GPU is used for the heat and mass transfer simulation, with a maximum observed speedup of more than 5-fold. The GPU is therefore a good way to accelerate heat and mass transfer simulation in porous media.
Keywords: Numerical Solution; Heat and Mass Transfer; General Purpose Graphics Processing Unit (GPGPU); CUDA
An Efficient Acceleration of Solving Heat and Mass Transfer Equations with the Second Kind Boundary Conditions in Capillary Porous Composite Cylinder Using Programmable Graphics Hardware
16
Authors: Hira Narang, Fan Wu, Abdul Rafae Mohammed 《Journal of Computer and Communications》 2018, No. 9, pp. 24-38 (15 pages)
With recent developments in computing technology, increased effort has gone into the simulation of various scientific methods and phenomena in engineering fields. One such case is the simulation of heat and mass transfer in capillary porous media, which is becoming more and more important for analysing various scenarios in engineering applications. Analysing such heat and mass transfer phenomena in a given environment requires simulating them, which entails simulating the coupled heat and mass transfer equations. However, the numerical solution of heat and mass transfer equations is very time consuming. This paper therefore aims to use one of the acceleration techniques developed in the graphics community, which exploits a graphics processing unit (GPU), and applies it to the numerical solution of heat and mass transfer equations. The nVidia Compute Unified Device Architecture (CUDA) programming model provides a good method of applying parallel computing to program the graphical processing unit. This paper shows a good improvement in performance when numerically solving, on the GPU, the heat and mass transfer equations for a capillary porous composite cylinder with the second kind of boundary conditions. The simulation is implemented on the CUDA platform on an nVidia Quadro FX 4800 graphics card. The experimental results show a drastic performance improvement when the GPU is used for the heat and mass transfer simulation, with a maximum observed speedup of more than 7-fold. The GPU is therefore a good approach to accelerating heat and mass transfer simulation.
Keywords: Numerical Solution; Heat and Mass Transfer; General Purpose Graphics Processing Unit (GPGPU); CUDA
Accelerating geospatial analysis on GPUs using CUDA (Cited by: 1)
17
Authors: Ying-jie XIA, Li KUANG, Xiu-mei LI 《Journal of Zhejiang University-Science C(Computers and Electronics)》 SCIE, EI, 2011, No. 12, pp. 990-999 (10 pages)
Inverse distance weighting (IDW) interpolation and viewshed are two popular algorithms for geospatial analysis. IDW interpolation assigns geographical values to unknown spatial points using values from a usually scattered set of known points, and viewshed identifies the cells in a spatial raster that can be seen by observers. Although implementations of both algorithms are available for different scales of input data, computation over a large-scale domain requires a massive number of cycles, which limits their usage. Given the growing popularity of the graphics processing unit (GPU) for general purpose applications, we aim to accelerate geospatial analysis via a GPU-based parallel computing approach. In this paper, we propose a generic methodological framework for geospatial analysis based on the GPU and its programming model, Compute Unified Device Architecture (CUDA), and explore how to map the inherent parallelism of IDW interpolation and viewshed to the framework, which gives rise to a high computational throughput. The CUDA-based implementations of IDW interpolation and viewshed indicate that the GPU architecture is suitable for parallelizing geospatial analysis algorithms. Experimental results show that the CUDA-based implementations running on the GPU can lead to dataset-dependent speedups in the range of 13-33-fold for IDW interpolation and 28-925-fold for viewshed analysis. Their computation time can be reduced by an order of magnitude compared to classical sequential versions, without losing the accuracy of interpolation and visibility judgment.
Keywords: general purpose GPU; CUDA; geospatial analysis; parallelization
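Note on the entry above: the abstract describes IDW interpolation as assigning values to unknown points from a scattered set of known points. The following is a minimal sequential sketch of the standard IDW formula, not the paper's CUDA implementation. The function names, the `(x, y, value)` sample layout and the default exponent `power=2.0` are assumptions made for illustration.

```python
import math


def idw_interpolate(known_points, query, power=2.0):
    """Estimate the value at `query` from scattered (x, y, value) samples."""
    qx, qy = query
    weighted_sum = 0.0
    weight_total = 0.0
    for x, y, value in known_points:
        dist = math.hypot(x - qx, y - qy)
        if dist == 0.0:
            return value  # the query coincides with a known sample
        weight = 1.0 / dist ** power  # closer samples get larger weights
        weighted_sum += weight * value
        weight_total += weight
    return weighted_sum / weight_total


def idw_grid(known_points, xs, ys, power=2.0):
    """Interpolate every cell of a raster given by column xs and row ys coordinates."""
    # Each cell depends only on the shared sample list, never on other cells,
    # which is why a GPU version can assign one thread per cell.
    return [[idw_interpolate(known_points, (x, y), power) for x in xs]
            for y in ys]
```

For example, with samples `[(0, 0, 10), (2, 0, 20)]`, the midpoint `(1, 0)` is equidistant from both, so both weights are equal and the estimate is their plain average.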
A multi-scale architecture for multi-scale simulation and its application to gas-solid flows (Cited by: 1)
18
Authors: Bo Li, Guofeng Zhou, Wei Ge, Limin Wang, Xiaowei Wang, Li Guo, Jinghai Li 《Particuology》 SCIE, EI, CAS, CSCD, 2014, No. 4, pp. 160-169 (10 pages)
A multi-scale hardware and software architecture implementing the EMMS (energy-minimization multi-scale) paradigm is proven to be effective in the simulation of a two-dimensional gas-solid suspension. General purpose CPUs are employed for macro-scale control and optimization, and many integrated cores (MICs) operating in multiple-instruction multiple-data mode are used for a molecular dynamics simulation of the solid particles at the meso-scale. Many cores operating in single-instruction multiple-data mode, such as general purpose graphics processing units (GPGPUs), are employed for direct numerical simulation of the fluid flow at the micro-scale using the lattice Boltzmann method. This architecture is also expected to be efficient for the multi-scale simulation of other complex systems.
Keywords: general purpose graphics processing unit (GPGPU); many integrated core (MIC); meso-science; multiple-instruction multiple-data; single-instruction multiple-data; virtual process engineering