期刊文献+
共找到377篇文章
< 1 2 19 >
每页显示 20 50 100
Compute Unified Device Architecture Implementation of Euler/Navier-Stokes Solver on Graphics Processing Unit Desktop Platform for 2-D Compressible Flows
1
作者 Zhang Jiale Chen Hongquan 《Transactions of Nanjing University of Aeronautics and Astronautics》 EI CSCD 2016年第5期536-545,共10页
Personal desktop platform with teraflops peak performance of thousands of cores is realized at the price of conventional workstations using the programmable graphics processing units(GPUs).A GPU-based parallel Euler/N... Personal desktop platform with teraflops peak performance of thousands of cores is realized at the price of conventional workstations using the programmable graphics processing units(GPUs).A GPU-based parallel Euler/Navier-Stokes solver is developed for 2-D compressible flows by using NVIDIA′s Compute Unified Device Architecture(CUDA)programming model in CUDA Fortran programming language.The techniques of implementation of CUDA kernels,double-layered thread hierarchy and variety memory hierarchy are presented to form the GPU-based algorithm of Euler/Navier-Stokes equations.The resulting parallel solver is validated by a set of typical test flow cases.The numerical results show that dozens of times speedup relative to a serial CPU implementation can be achieved using a single GPU desktop platform,which demonstrates that a GPU desktop can serve as a costeffective parallel computing platform to accelerate computational fluid dynamics(CFD)simulations substantially. 展开更多
关键词 graphics processing unit(GPU) GPU parallel computing compute unified device architecture(CUDA)Fortran finite volume method(FVM) acceleration
下载PDF
Multi-relaxation-time lattice Boltzmann simulations of lid driven flows using graphics processing unit
2
作者 Chenggong LI J.P.Y.MAA 《Applied Mathematics and Mechanics(English Edition)》 SCIE EI CSCD 2017年第5期707-722,共16页
Large eddy simulation (LES) using the Smagorinsky eddy viscosity model is added to the two-dimensional nine velocity components (D2Q9) lattice Boltzmann equation (LBE) with multi-relaxation-time (MRT) to simul... Large eddy simulation (LES) using the Smagorinsky eddy viscosity model is added to the two-dimensional nine velocity components (D2Q9) lattice Boltzmann equation (LBE) with multi-relaxation-time (MRT) to simulate incompressible turbulent cavity flows with the Reynolds numbers up to 1 × 10^7. To improve the computation efficiency of LBM on the numerical simulations of turbulent flows, the massively parallel computing power from a graphic processing unit (GPU) with a computing unified device architecture (CUDA) is introduced into the MRT-LBE-LES model. The model performs well, compared with the results from others, with an increase of 76 times in computation efficiency. It appears that the higher the Reynolds numbers is, the smaller the Smagorinsky constant should be, if the lattice number is fixed. Also, for a selected high Reynolds number and a selected proper Smagorinsky constant, there is a minimum requirement for the lattice number so that the Smagorinsky eddy viscosity will not be excessively large. 展开更多
关键词 large eddy simulation (LES) multi-relaxation-time (MRT) lattice Boltzmann equation (LBE) two-dimensional nine velocity components (D2Q9) Smagorinskymodel graphic processing unit (GPU) computing unified device architecture (CUDA)
下载PDF
Parallel Image Processing: Taking Grayscale Conversion Using OpenMP as an Example
3
作者 Bayan AlHumaidan Shahad Alghofaily +2 位作者 Maitha Al Qhahtani Sara Oudah Naya Nagy 《Journal of Computer and Communications》 2024年第2期1-10,共10页
In recent years, the widespread adoption of parallel computing, especially in multi-core processors and high-performance computing environments, ushered in a new era of efficiency and speed. This trend was particularl... In recent years, the widespread adoption of parallel computing, especially in multi-core processors and high-performance computing environments, ushered in a new era of efficiency and speed. This trend was particularly noteworthy in the field of image processing, which witnessed significant advancements. This parallel computing project explored the field of parallel image processing, with a focus on the grayscale conversion of colorful images. Our approach involved integrating OpenMP into our framework for parallelization to execute a critical image processing task: grayscale conversion. By using OpenMP, we strategically enhanced the overall performance of the conversion process by distributing the workload across multiple threads. The primary objectives of our project revolved around optimizing computation time and improving overall efficiency, particularly in the task of grayscale conversion of colorful images. Utilizing OpenMP for concurrent processing across multiple cores significantly reduced execution times through the effective distribution of tasks among these cores. The speedup values for various image sizes highlighted the efficacy of parallel processing, especially for large images. However, a detailed examination revealed a potential decline in parallelization efficiency with an increasing number of cores. This underscored the importance of a carefully optimized parallelization strategy, considering factors like load balancing and minimizing communication overhead. Despite challenges, the overall scalability and efficiency achieved with parallel image processing underscored OpenMP’s effectiveness in accelerating image manipulation tasks. 展开更多
关键词 Parallel computing Image processing OPENMP Parallel Programming High Performance computing GPU (graphic processing unit)
下载PDF
Graphic Processing Unit-Accelerated Neural Network Model for Biological Species Recognition
4
作者 温程璐 潘伟 +1 位作者 陈晓熹 祝青园 《Journal of Donghua University(English Edition)》 EI CAS 2012年第1期5-8,共4页
A graphic processing unit (GPU)-accelerated biological species recognition method using partially connected neural evolutionary network model is introduced in this paper. The partial connected neural evolutionary netw... A graphic processing unit (GPU)-accelerated biological species recognition method using partially connected neural evolutionary network model is introduced in this paper. The partial connected neural evolutionary network adopted in the paper can overcome the disadvantage of traditional neural network with small inputs. The whole image is considered as the input of the neural network, so the maximal features can be kept for recognition. To speed up the recognition process of the neural network, a fast implementation of the partially connected neural network was conducted on NVIDIA Tesla C1060 using the NVIDIA compute unified device architecture (CUDA) framework. Image sets of eight biological species were obtained to test the GPU implementation and counterpart serial CPU implementation, and experiment results showed GPU implementation works effectively on both recognition rate and speed, and gained 343 speedup over its counterpart CPU implementation. Comparing to feature-based recognition method on the same recognition task, the method also achieved an acceptable correct rate of 84.6% when testing on eight biological species. 展开更多
关键词 graphic processing unit(GPU) compute unified device architecture (CUDA) neural network species recognition
下载PDF
Developing Extensible Lattice-Boltzmann Simulators for General-Purpose Graphics-Processing Units
5
作者 Stuart D.C.Walsh Martin O.Saar 《Communications in Computational Physics》 SCIE 2013年第3期867-879,共13页
Lattice-Boltzmann methods are versatile numerical modeling techniques capable of reproducing a wide variety of fluid-mechanical behavior.These methods are well suited to parallel implementation,particularly on the sin... Lattice-Boltzmann methods are versatile numerical modeling techniques capable of reproducing a wide variety of fluid-mechanical behavior.These methods are well suited to parallel implementation,particularly on the single-instruction multiple data(SIMD)parallel processing environments found in computer graphics processing units(GPUs).Although recent programming tools dramatically improve the ease with which GPUbased applications can be written,the programming environment still lacks the flexibility available to more traditional CPU programs.In particular,it may be difficult to develop modular and extensible programs that require variable on-device functionality with current GPU architectures.This paper describes a process of automatic code generation that overcomes these difficulties for lattice-Boltzmann simulations.It details the development of GPU-based modules for an extensible lattice-Boltzmann simulation package-LBHydra.The performance of the automatically generated code is compared to equivalent purpose written codes for both single-phase,multiphase,and multicomponent flows.The flexibility of the new method is demonstrated by simulating a rising,dissolving droplet moving through a porous medium with user generated lattice-Boltzmann models and subroutines. 展开更多
关键词 Lattice-Boltzmann methods graphics processing units computational fluid dynamics
原文传递
Parallelizing maximum likelihood classification on computer cluster and graphics processing unit for supervised image classification
6
作者 Xuan Shi Bowei Xue 《International Journal of Digital Earth》 SCIE EI 2017年第7期737-748,共12页
Supervised image classification has been widely utilized in a variety of remote sensing applications.When large volume of satellite imagery data and aerial photos are increasingly available,high-performance image proc... Supervised image classification has been widely utilized in a variety of remote sensing applications.When large volume of satellite imagery data and aerial photos are increasingly available,high-performance image processing solutions are required to handle large scale of data.This paper introduces how maximum likelihood classification approach is parallelized for implementation on a computer cluster and a graphics processing unit to achieve high performance when processing big imagery data.The solution is scalable and satisfies the need of change detection,object identification,and exploratory analysis on large-scale high-resolution imagery data in remote sensing applications. 展开更多
关键词 Maximum likelihood classification supervised classification parallel computing graphics processing unit
原文传递
Efficient parallel implementation of the lattice Boltzmann method on large clusters of graphic processing units 被引量:6
7
作者 XIONG QinGang LI Bo +5 位作者 XU Ji FANG XiaoJian WANG XiaoWei WANG LiMin HE XianFeng GE Wei 《Chinese Science Bulletin》 SCIE EI CAS 2012年第7期707-715,共9页
Many-core processors, such as graphic processing units (GPUs), are promising platforms for intrinsic parallel algorithms such as the lattice Boltzmann method (LBM). Although tremendous speedup has been obtained on a s... Many-core processors, such as graphic processing units (GPUs), are promising platforms for intrinsic parallel algorithms such as the lattice Boltzmann method (LBM). Although tremendous speedup has been obtained on a single GPU compared with mainstream CPUs, the performance of the LBM for multiple GPUs has not been studied extensively and systematically. In this article, we carry out LBM simulation on a GPU cluster with many nodes, each having multiple Fermi GPUs. Asynchronous execution with CUDA stream functions, OpenMP and non-blocking MPI communication are incorporated to improve efficiency. The algorithm is tested for two-dimensional Couette flow and the results are in good agreement with the analytical solution. For both the oneand two-dimensional decomposition of space, the algorithm performs well as most of the communication time is hidden. Direct numerical simulation of a two-dimensional gas-solid suspension containing more than one million solid particles and one billion gas lattice cells demonstrates the potential of this algorithm in large-scale engineering applications. The algorithm can be directly extended to the three-dimensional decomposition of space and other modeling methods including explicit grid-based methods. 展开更多
关键词 格子BOLTZMANN方法 图形处理单元 并行算法 集群 COUETTE流 LBM模拟 OPENMP 直接数值模拟
原文传递
Efficient parallel implementation of a density peaks clustering algorithm on graphics processing unit 被引量:2
8
作者 Ke-shi GE Hua-you SU +1 位作者 Dong-sheng LI Xi-cheng LU 《Frontiers of Information Technology & Electronic Engineering》 SCIE EI CSCD 2017年第7期915-927,共13页
基于密度峰值的聚类方法 DP(density peak)由于其新颖有效的特点而广泛应用于科学研究。然而,当确定集群中心时,DP会对每对数据点操作多次,从而导致较高的计算复杂度。在本文中,我们提出了一种基于GPU(graphics processing unit)的高效... 基于密度峰值的聚类方法 DP(density peak)由于其新颖有效的特点而广泛应用于科学研究。然而,当确定集群中心时,DP会对每对数据点操作多次,从而导致较高的计算复杂度。在本文中,我们提出了一种基于GPU(graphics processing unit)的高效并行密度峰值算法。我们分析密度峰值聚类算法的原理来研究其计算瓶颈,并评估其并行的潜力。根据分析,我们提出了CUDA-DP(compute unified device architecture-DP),一种针对GPU架构的高效并行密度峰值聚类算法,并用CUDA实现了这种并行方法。具体来说,我们使用共享内存减少了全局内存访问量。更进一步,为了利用GPU的合并访问机制,我们将CUDA-DP程序的数据结构从AOS(array of structures)重构为SOA(structure of arrays)。另外,我们分别引入二进制搜索方法和采样方法,以避免对距离矩阵进行排序造成的计算开销。实验结果表明,与基于CPU的密度峰值实现相比,CUDA-DP可以实现超过45倍的加速。 展开更多
关键词 GPU 密度峰值 聚类 并行计算
原文传递
隐私计算环境下深度学习的GPU加速技术综述
9
作者 秦智翔 杨洪伟 +2 位作者 郝萌 何慧 张伟哲 《信息安全研究》 CSCD 北大核心 2024年第7期586-593,共8页
随着深度学习技术的不断发展,神经网络模型的训练时间越来越长,使用GPU计算对神经网络训练进行加速便成为一项关键技术.此外,数据隐私的重要性也推动了隐私计算技术的发展.首先介绍了深度学习、GPU计算的概念以及安全多方计算、同态加密... 随着深度学习技术的不断发展,神经网络模型的训练时间越来越长,使用GPU计算对神经网络训练进行加速便成为一项关键技术.此外,数据隐私的重要性也推动了隐私计算技术的发展.首先介绍了深度学习、GPU计算的概念以及安全多方计算、同态加密2种隐私计算技术,而后探讨了明文环境与隐私计算环境下深度学习的GPU加速技术.在明文环境下,介绍了数据并行和模型并行2种基本的深度学习并行训练模式,分析了重计算和显存交换2种不同的内存优化技术,并介绍了分布式神经网络训练过程中的梯度压缩技术.介绍了在隐私计算环境下安全多方计算和同态加密2种不同隐私计算场景下的深度学习GPU加速技术.简要分析了2种环境下GPU加速深度学习方法的异同. 展开更多
关键词 深度学习 GPU计算 隐私计算 安全多方计算 同态加密
下载PDF
面向GPU并行编程的线程同步综述
10
作者 高岚 赵雨晨 +2 位作者 张伟功 王晶 钱德沛 《软件学报》 EI CSCD 北大核心 2024年第2期1028-1047,共20页
并行计算已成为主流趋势.在并行计算系统中,同步是关键设计之一,对硬件性能的充分利用至关重要.近年来,GPU(graphic processing unit,图形处理器)作为应用最为广加速器得到了快速发展,众多应用也对GPU线程同步提出更高要求.然而,现有GP... 并行计算已成为主流趋势.在并行计算系统中,同步是关键设计之一,对硬件性能的充分利用至关重要.近年来,GPU(graphic processing unit,图形处理器)作为应用最为广加速器得到了快速发展,众多应用也对GPU线程同步提出更高要求.然而,现有GPU系统却难以高效地支持真实应用中复杂的线程同步.研究者虽然提出了很多支持GPU线程同步的方法并取得了较大进展,但GPU独特的体系结构及并行模式导致GPU线程同步的研究仍然面临很多挑战.根据不同的线程同步目的和粒度对GPU并行编程中的线程同步进行分类.在此基础上,围绕GPU线程同步的表达和执行,首先分析总结GPU线程同步存在的难以高效表达、错误频发、执行效率低的关键问题及挑战;而后依据不同的GPU线程同步粒度,从线程同步表达方法和性能优化方法两个方面入手,介绍近年来学术界和产业界对GPU线程竞争同步及合作同步的研究,对现有研究方法进行分析与总结.最后,指出GPU线程同步未来的研究趋势和发展前景,并给出可能的研究思路,从而为该领域的研究人员提供参考. 展开更多
关键词 通用图形处理器(GPGPU) 并行编程 线程同步 性能优化
下载PDF
gEdge:基于容器技术的云边协同的异构计算框架
11
作者 汪沄 汤冬劼 +2 位作者 郭开诚 戚正伟 管海兵 《计算机学报》 EI CAS CSCD 北大核心 2024年第8期1883-1900,共18页
由于按需灵活配置、高可用性、高资源利用率等优点,云计算技术成为过去十年的主流计算范式.随着万物互联时代的到来,单独依赖云计算技术已经无法满足数以亿计的IoT设备及其数据流量的需求.边缘计算可以被看作是云计算的进化,它因5G网络... 由于按需灵活配置、高可用性、高资源利用率等优点,云计算技术成为过去十年的主流计算范式.随着万物互联时代的到来,单独依赖云计算技术已经无法满足数以亿计的IoT设备及其数据流量的需求.边缘计算可以被看作是云计算的进化,它因5G网络和物联网的崛起而诞生.随着云游戏、VR技术以及人工智能技术在日常生活中的广泛运用,对计算资源的需求也在日渐增长.然而,受体积与功耗限制,处于边缘的节点设备算力较弱.本文提出了gEdge:一种基于容器技术的云边协同的异构计算框架.该框架通过GPU虚拟化技术,将云端的物理GPU资源分为多块虚拟GPU资源,按需为边缘节点提供GPU算力资源,并且对用户容器无感知.实验表明,使用gEdge框架使边缘节点使用的容器镜像体积降低了48.8%,容器启动时间降低了35.5%,平均相对运行速度提高了213%. 展开更多
关键词 图形处理器 虚拟化技术 容器技术 边缘计算 云边协同
下载PDF
基于异构平台的图像中值滤波的OpenCL加速算法 被引量:1
12
作者 肖诗洋 王镭 +1 位作者 杜莹 肖汉 《河北大学学报(自然科学版)》 CAS 北大核心 2024年第1期92-103,共12页
图像噪声降低了图像信噪比和质量,去噪是图像处理工作的重要环节之一.本文提出了一种基于开放式计算语言(OpenCL)架构的图像中值滤波快速降噪并行算法.介绍了OpenCL体系结构特点和中值滤波处理流程.根据图形处理器(GPU)的并发结构特点,... 图像噪声降低了图像信噪比和质量,去噪是图像处理工作的重要环节之一.本文提出了一种基于开放式计算语言(OpenCL)架构的图像中值滤波快速降噪并行算法.介绍了OpenCL体系结构特点和中值滤波处理流程.根据图形处理器(GPU)的并发结构特点,对图像中值滤波功能模块进行了并行优化,降低了算法复杂度.通过充分激活NDRange索引空间中的工作组和工作项来提高数据访问效率,优化内核工作组配置参数,实现了中值滤波器的并行处理.实验结果表明,在图像质量保持不变的情况下,与基于CPU的串行算法、基于开放多处理(OpenMP)并行算法和基于统一计算设备架构(CUDA)并行算法性能相比,图像中值滤波并行算法在OpenCL架构下NVIDIA GPU计算平台上分别获得了29.74、17.29、1.15倍的加速比.验证了算法的有效性和平台的可移植性,基本满足应用的实时性处理要求. 展开更多
关键词 中值滤波 椒盐噪声 图形处理器 开放式计算语言 并行算法
下载PDF
基于GPU的LBM迁移模块算法优化
13
作者 黄斌 柳安军 +3 位作者 潘景山 田敏 张煜 朱光慧 《计算机工程》 CAS CSCD 北大核心 2024年第2期232-238,共7页
格子玻尔兹曼方法(LBM)是一种基于介观模拟尺度的计算流体力学方法,其在计算时设置大量的离散格点,具有适合并行的特性。图形处理器(GPU)中有大量的算术逻辑单元,适合大规模的并行计算。基于GPU设计LBM的并行算法,能够提高计算效率。但... 格子玻尔兹曼方法(LBM)是一种基于介观模拟尺度的计算流体力学方法,其在计算时设置大量的离散格点,具有适合并行的特性。图形处理器(GPU)中有大量的算术逻辑单元,适合大规模的并行计算。基于GPU设计LBM的并行算法,能够提高计算效率。但是LBM算法迁移模块中每个格点的计算都需要与其他格点进行通信,存在较强的数据依赖。提出一种基于GPU的LBM迁移模块算法优化策略。首先分析迁移部分的实现逻辑,通过模型降维,将三维模型按照速度分量离散为多个二维模型,降低模型的复杂度;然后分析迁移模块计算前后格点中的数据差异,通过数据定位找到迁移模块的通信规律,并对格点之间的数据交换方式进行分类;最后使用分类的交换方式对离散的二维模型进行区域划分,设计新的数据通信方式,由此消除数据依赖的影响,将迁移模块完全并行化。对并行算法进行测试,结果显示:该算法在1.3×10^(8)规模网格下能达到1.92的加速比,表明算法具有良好的并行效果;同时对比未将迁移模块并行化的算法,所提优化策略能提升算法30%的并行计算效率。 展开更多
关键词 高性能计算 格子玻尔兹曼方法 图形处理器 并行优化 数据重排
下载PDF
基于GPU的实景三维模型裁剪算法研究
14
作者 马东岭 李铭通 朱悦凯 《山东建筑大学学报》 2024年第1期108-116,共9页
图形处理器(Graphic Processing Unit,GPU)作为主流高性能计算的加速设备,已越来越多地应用于诸多领域的并行计算中,利用GPU的并行计算能力,可以极大地提高传统算法的计算效率。文章主要研究GPU多线程计算方法与统一计算架构(Compute Un... 图形处理器(Graphic Processing Unit,GPU)作为主流高性能计算的加速设备,已越来越多地应用于诸多领域的并行计算中,利用GPU的并行计算能力,可以极大地提高传统算法的计算效率。文章主要研究GPU多线程计算方法与统一计算架构(Compute Unified Device Architecture,CUDA)技术在实景三维模型裁剪中的应用,提出了一种基于GPU的实景三维模型裁剪算法,包括设计了基于面拓扑的多级索引结构,以实现线程内重复交点快速查找;提出了一种轻量多边形三角化方法,优化算法流程;使用多种优化策略,在不影响裁剪网格质量的情况下进一步提高算法的性能。结果表明:根据模型大小与裁剪次数的不同,相较于传统算法,所提方法在单次裁剪的情况下加速比可达13.93,在多次裁剪的情况下加速比可达35.85,显著地提高了模型的裁剪效率。 展开更多
关键词 图形处理器 实景三维模型 三角网裁剪 并行计算
下载PDF
人工智能芯片技术演进与发展策略研究
15
作者 宋艳飞 孙佳琪 王睿哲 《信息技术与标准化》 2024年第6期94-98,共5页
为加快抢占人工智能技术高地,围绕人工智能芯片产业需求,阐述了人工智能芯片技术特点、技术路线和发展趋势,并分析了人工智能芯片与先进制造、软件和整机应用等产业生态深度耦合态势。根据人工智能芯片技术与产业发展特点,提出了强化统... 为加快抢占人工智能技术高地,围绕人工智能芯片产业需求,阐述了人工智能芯片技术特点、技术路线和发展趋势,并分析了人工智能芯片与先进制造、软件和整机应用等产业生态深度耦合态势。根据人工智能芯片技术与产业发展特点,提出了强化统筹协调、释放市场优势、深化国际合作策略建议,以期推动我国人工智能芯片高质量发展。 展开更多
关键词 人工智能芯片 图形处理器 统一计算设备架构
下载PDF
Graphic Processing Unit Based Phase Retrieval and CT Reconstruction for Differential X-Ray Phase Contrast Imaging
16
作者 陈晓庆 王宇杰 孙建奇 《Journal of Shanghai Jiaotong university(Science)》 EI 2014年第5期550-554,共5页
Compared with the conventional X-ray absorption imaging, the X-ray phase-contrast imaging shows higher contrast on samples with low attenuation coefficient like blood vessels and soft tissues. Among the modalities of ... Compared with the conventional X-ray absorption imaging, the X-ray phase-contrast imaging shows higher contrast on samples with low attenuation coefficient like blood vessels and soft tissues. Among the modalities of phase-contrast imaging, the grating-based phase contrast imaging has been widely accepted owing to the advantage of wide range of sample selections and exemption of coherent source. However, the downside is the substantially larger amount of data generated from the phase-stepping method which slows down the reconstruction process. Graphic processing unit(GPU) has the advantage of allowing parallel computing which is very useful for large quantity data processing. In this paper, a compute unified device architecture(CUDA) C program based on GPU is introduced to accelerate the phase retrieval and filtered back projection(FBP) algorithm for grating-based tomography. Depending on the size of the data, the CUDA C program shows different amount of speed-up over the standard C program on the same Visual Studio 2010 platform. Meanwhile, the speed-up ratio increases as the size of data increases. 展开更多
关键词 grating-based phase contrast imaging parallel computing graphic processing unit(GPU) compute unified device architecture(CUDA) filtered back projection(FBP)
原文传递
Optimizing photoacoustic image reconstruction using cross-platform parallel computation
17
作者 Tri Vu Yuehang Wang Jun Xia 《Visual Computing for Industry,Biomedicine,and Art》 2018年第1期12-17,共6页
Three-dimensional(3D)image reconstruction involves the computations of an extensive amount of data that leads to tremendous processing time.Therefore,optimization is crucially needed to improve the performance and eff... Three-dimensional(3D)image reconstruction involves the computations of an extensive amount of data that leads to tremendous processing time.Therefore,optimization is crucially needed to improve the performance and efficiency.With the widespread use of graphics processing units(GPU),parallel computing is transforming this arduous reconstruction process for numerous imaging modalities,and photoacoustic computed tomography(PACT)is not an exception.Existing works have investigated GPU-based optimization on photoacoustic microscopy(PAM)and PACT reconstruction using compute unified device architecture(CUDA)on either C++or MATLAB only.However,our study is the first that uses cross-platform GPU computation.It maintains the simplicity of MATLAB,while improves the speed through CUDA/C++−based MATLAB converted functions called MEXCUDA.Compared to a purely MATLAB with GPU approach,our cross-platform method improves the speed five times.Because MATLAB is widely used in PAM and PACT,this study will open up new avenues for photoacoustic image reconstruction and relevant real-time imaging applications. 展开更多
关键词 Photoacoustic computed tomography graphics processing units Parallel computation Focal-line backprojection algorithm MATLAB Optical imaging
下载PDF
Fast modeling of gravity gradients from topographic surface data using GPU parallel algorithm
18
作者 Xuli Tan Qingbin Wang +2 位作者 Jinkai Feng Yan Huang Ziyan Huang 《Geodesy and Geodynamics》 CSCD 2021年第4期288-297,共10页
The gravity gradient is a secondary derivative of gravity potential,containing more high-frequency information of Earth’s gravity field.Gravity gradient observation data require deducting its prior and intrinsic part... The gravity gradient is a secondary derivative of gravity potential,containing more high-frequency information of Earth’s gravity field.Gravity gradient observation data require deducting its prior and intrinsic parts to obtain more variational information.A model generated from a topographic surface database is more appropriate to represent gradiometric effects derived from near-surface mass,as other kinds of data can hardly reach the spatial resolution requirement.The rectangle prism method,namely an analytic integration of Newtonian potential integrals,is a reliable and commonly used approach to modeling gravity gradient,whereas its computing efficiency is extremely low.A modified rectangle prism method and a graphical processing unit(GPU)parallel algorithm were proposed to speed up the modeling process.The modified method avoided massive redundant computations by deforming formulas according to the symmetries of prisms’integral regions,and the proposed algorithm parallelized this method’s computing process.The parallel algorithm was compared with a conventional serial algorithm using 100 elevation data in two topographic areas(rough and moderate terrain).Modeling differences between the two algorithms were less than 0.1 E,which is attributed to precision differences between single-precision and double-precision float numbers.The parallel algorithm showed computational efficiency approximately 200 times higher than the serial algorithm in experiments,demonstrating its effective speeding up in the modeling process.Further analysis indicates that both the modified method and computational parallelism through GPU contributed to the proposed algorithm’s performances in experiments. 展开更多
关键词 Gravity gradient Topographic surface data Rectangle prism method Parallel computation graphical processing unit(GPU)
下载PDF
A Computational Comparison of Basis Updating Schemes for the Simplex Algorithm on a CPU-GPU System
19
作者 Nikolaos Ploskas Nikolaos Samaras 《American Journal of Operations Research》 2013年第6期497-505,共9页
The computation of the basis inverse is the most time-consuming step in simplex type algorithms. This inverse does not have to be computed from scratch at any iteration, but updating schemes can be applied to accelera... The computation of the basis inverse is the most time-consuming step in simplex type algorithms. This inverse does not have to be computed from scratch at any iteration, but updating schemes can be applied to accelerate this calculation. In this paper, we perform a computational comparison in which the basis inverse is computed with five different updating schemes. Then, we propose a parallel implementation of two updating schemes on a CPU-GPU System using MATLAB and CUDA environment. Finally, a computational study on randomly generated full dense linear programs is preented to establish the practical value of GPU-based implementation. 展开更多
关键词 SIMPLEX Algorithm BASIS INVERSE graphics processing unit MATLAB compute UNIFIED Device Architecture
下载PDF
GPU based numerical simulation of core shooting process
20
作者 Yi-zhong Zhang Gao-chun Lu +3 位作者 Chang-jiang Ni Tao Jing Lin-long Yang Qin-fang Wu 《China Foundry》 SCIE 2017年第5期392-397,共6页
Core shooting process is the most widely used technique to make sand cores and it plays an important role in the quality of sand cores. Although numerical simulation can hopefully optimize the core shooting process, r... Core shooting process is the most widely used technique to make sand cores and it plays an important role in the quality of sand cores. Although numerical simulation can hopefully optimize the core shooting process, research on numerical simulation of the core shooting process is very limited. Based on a two-fluid model(TFM) and a kinetic-friction constitutive correlation, a program for 3D numerical simulation of the core shooting process has been developed and achieved good agreements with in-situ experiments. To match the needs of engineering applications, a graphics processing unit(GPU) has also been used to improve the calculation efficiency. The parallel algorithm based on the Compute Unified Device Architecture(CUDA) platform can significantly decrease computing time by multi-threaded GPU. In this work, the program accelerated by CUDA parallelization method was developed and the accuracy of the calculations was ensured by comparing with in-situ experimental results photographed by a high-speed camera. The design and optimization of the parallel algorithm were discussed. The simulation result of a sand core test-piece indicated the improvement of the calculation efficiency by GPU. The developed program has also been validated by in-situ experiments with a transparent core-box, a high-speed camera, and a pressure measuring system. The computing time of the parallel program was reduced by nearly 95% while the simulation result was still quite consistent with experimental data. The GPU parallelization method can successfully solve the problem of low computational efficiency of the 3D sand shooting simulation program, and thus the developed GPU program is appropriate for engineering applications. 展开更多
关键词 图形处理单位(GPU ) 计算统一设备建筑学(CUDA ) 并行化 核心射击过程 TP391.99
下载PDF
上一页 1 2 19 下一页 到第
使用帮助 返回顶部