Journal Articles
289 articles found
1. Compute Unified Device Architecture Implementation of Euler/Navier-Stokes Solver on Graphics Processing Unit Desktop Platform for 2-D Compressible Flows
Authors: Zhang Jiale, Chen Hongquan 《Transactions of Nanjing University of Aeronautics and Astronautics》 EI CSCD, 2016, No. 5, pp. 536-545
Personal desktop platforms with teraflops peak performance across thousands of cores are realized at the price of conventional workstations using programmable graphics processing units (GPUs). A GPU-based parallel Euler/Navier-Stokes solver is developed for 2-D compressible flows using NVIDIA's Compute Unified Device Architecture (CUDA) programming model in the CUDA Fortran programming language. Techniques for implementing CUDA kernels, a double-layered thread hierarchy, and the varied memory hierarchy are presented to form the GPU-based algorithm for the Euler/Navier-Stokes equations. The resulting parallel solver is validated on a set of typical test flow cases. The numerical results show that a speedup of dozens of times over a serial CPU implementation can be achieved on a single-GPU desktop platform, demonstrating that a GPU desktop can serve as a cost-effective parallel computing platform to substantially accelerate computational fluid dynamics (CFD) simulations. (A minimal sketch of the two-level thread layout follows this entry.)
Keywords: graphics processing unit (GPU); GPU parallel computing; compute unified device architecture (CUDA) Fortran; finite volume method (FVM); acceleration
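The "double-layered thread hierarchy" of the abstract maps naturally onto CUDA's grid-of-blocks model. Below is a minimal, hypothetical CUDA C++ sketch (the paper itself uses CUDA Fortran) with one thread per finite-volume cell under a 2-D block tiling; a first-order scalar advection update stands in for the actual Euler/Navier-Stokes fluxes.

```cuda
// Hypothetical sketch: one thread per finite-volume cell, 2-D block tiling.
#include <cuda_runtime.h>
#include <cstdio>

__global__ void fvmUpdate(const float* u, float* uNew,
                          int nx, int ny, float dtOverDx)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;   // cell column
    int j = blockIdx.y * blockDim.y + threadIdx.y;   // cell row
    if (i <= 0 || j <= 0 || i >= nx - 1 || j >= ny - 1) return;

    int id = j * nx + i;
    // First-order upwind fluxes in x and y (unit advection speed).
    float fx = u[id] - u[id - 1];
    float fy = u[id] - u[id - nx];
    uNew[id] = u[id] - dtOverDx * (fx + fy);
}

int main()
{
    const int nx = 512, ny = 512;
    float *u, *uNew;
    cudaMallocManaged(&u, nx * ny * sizeof(float));
    cudaMallocManaged(&uNew, nx * ny * sizeof(float));
    for (int k = 0; k < nx * ny; ++k) u[k] = (k % nx < nx / 2) ? 1.f : 0.f;

    dim3 block(16, 16);                        // threads within a block
    dim3 grid((nx + 15) / 16, (ny + 15) / 16); // blocks within the grid
    for (int step = 0; step < 100; ++step) {
        fvmUpdate<<<grid, block>>>(u, uNew, nx, ny, 0.2f);
        cudaDeviceSynchronize();
        float* t = u; u = uNew; uNew = t;      // ping-pong buffers
    }
    printf("u[center] = %f\n", u[(ny / 2) * nx + nx / 2]);
    cudaFree(u); cudaFree(uNew);
    return 0;
}
```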
2. Multi-Relaxation-Time Lattice Boltzmann Simulations of Lid-Driven Flows Using a Graphics Processing Unit
Authors: Chenggong Li, J. P. Y. Maa 《Applied Mathematics and Mechanics (English Edition)》 SCIE EI CSCD, 2017, No. 5, pp. 707-722
Large eddy simulation (LES) using the Smagorinsky eddy viscosity model is added to the two-dimensional nine-velocity-component (D2Q9) lattice Boltzmann equation (LBE) with multiple relaxation times (MRT) to simulate incompressible turbulent cavity flows at Reynolds numbers up to 1 × 10^7. To improve the computational efficiency of the LBM for turbulent flows, the massively parallel computing power of a graphics processing unit (GPU) with the Compute Unified Device Architecture (CUDA) is introduced into the MRT-LBE-LES model. The model performs well compared with results from others, with a 76-fold increase in computational efficiency. It appears that the higher the Reynolds number, the smaller the Smagorinsky constant should be if the lattice number is fixed. Also, for a selected high Reynolds number and a properly selected Smagorinsky constant, there is a minimum requirement on the lattice number so that the Smagorinsky eddy viscosity does not become excessively large. (A hedged sketch of the Smagorinsky closure inside a collision kernel follows this entry.)
Keywords: large eddy simulation (LES); multi-relaxation-time (MRT); lattice Boltzmann equation (LBE); two-dimensional nine velocity components (D2Q9); Smagorinsky model; graphics processing unit (GPU); compute unified device architecture (CUDA)
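As a hedged illustration of the closure described above, here is a D2Q9 collision kernel with a per-node Smagorinsky eddy viscosity in CUDA C++. It simplifies in two labeled ways: it uses a BGK (single-relaxation-time) collision rather than the paper's MRT operator, and it evaluates the strain rate with the common lagged approximation S ≈ −3/(2ρτ₀)·Π^neq in lattice units rather than solving the implicit quadratic exactly. Kernel only; host setup, streaming, and boundaries are omitted.

```cuda
#include <cuda_runtime.h>

__constant__ int   ex[9] = { 0, 1, 0,-1, 0, 1,-1,-1, 1 };
__constant__ int   ey[9] = { 0, 0, 1, 0,-1, 1, 1,-1,-1 };
__constant__ float w[9]  = { 4.f/9, 1.f/9, 1.f/9, 1.f/9, 1.f/9,
                             1.f/36, 1.f/36, 1.f/36, 1.f/36 };

__global__ void collideSmagorinsky(float* f, int nx, int ny,
                                   float tau0,  // molecular relaxation time
                                   float cs2)   // Cs^2, filter width = 1
{
    int x = blockIdx.x * blockDim.x + threadIdx.x;
    int y = blockIdx.y * blockDim.y + threadIdx.y;
    if (x >= nx || y >= ny) return;
    int n = nx * ny, id = y * nx + x;

    // Macroscopic density and velocity.
    float rho = 0.f, ux = 0.f, uy = 0.f, fi[9];
    for (int i = 0; i < 9; ++i) {
        fi[i] = f[i * n + id];
        rho += fi[i]; ux += ex[i] * fi[i]; uy += ey[i] * fi[i];
    }
    ux /= rho; uy /= rho;

    // Equilibrium and non-equilibrium momentum flux Pi_neq.
    float feq[9], pxx = 0.f, pyy = 0.f, pxy = 0.f;
    float usq = ux * ux + uy * uy;
    for (int i = 0; i < 9; ++i) {
        float eu = ex[i] * ux + ey[i] * uy;
        feq[i] = w[i] * rho * (1.f + 3.f * eu + 4.5f * eu * eu - 1.5f * usq);
        float fneq = fi[i] - feq[i];
        pxx += ex[i] * ex[i] * fneq;
        pyy += ey[i] * ey[i] * fneq;
        pxy += ex[i] * ey[i] * fneq;
    }

    // Lagged closure: |S| = sqrt(2 S:S) from Pi_neq with tau0, then
    // nu_t = Cs^2*|S| and tau_eff = 3*(nu0 + nu_t) + 1/2.
    // (Production codes usually solve the implicit quadratic instead.)
    float pmag = sqrtf(2.f * (pxx * pxx + pyy * pyy + 2.f * pxy * pxy));
    float S = 1.5f * pmag / (rho * tau0);
    float nu0 = (tau0 - 0.5f) / 3.f;
    float tauEff = 3.f * (nu0 + cs2 * S) + 0.5f;

    for (int i = 0; i < 9; ++i)            // BGK relaxation with tau_eff
        f[i * n + id] = fi[i] - (fi[i] - feq[i]) / tauEff;
}
```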
3. Parallel Image Processing: Taking Grayscale Conversion Using OpenMP as an Example
Authors: Bayan AlHumaidan, Shahad Alghofaily, Maitha Al Qhahtani, Sara Oudah, Naya Nagy 《Journal of Computer and Communications》 2024, No. 2, pp. 1-10
In recent years, the widespread adoption of parallel computing, especially in multi-core processors and high-performance computing environments, ushered in a new era of efficiency and speed. This trend was particularly noteworthy in image processing, which witnessed significant advancements. This parallel computing project explored parallel image processing, with a focus on the grayscale conversion of color images. Our approach integrated OpenMP into our framework to parallelize a critical image processing task: grayscale conversion. By using OpenMP, we enhanced the overall performance of the conversion by distributing the workload across multiple threads. The primary objectives were to optimize computation time and improve overall efficiency. Utilizing OpenMP for concurrent processing across multiple cores significantly reduced execution times through effective task distribution. The speedup values for various image sizes highlighted the efficacy of parallel processing, especially for large images. However, a detailed examination revealed a potential decline in parallelization efficiency as the number of cores grows, underscoring the importance of a carefully optimized parallelization strategy that considers load balancing and minimizes communication overhead. Despite these challenges, the overall scalability and efficiency achieved underscored OpenMP's effectiveness in accelerating image manipulation tasks. (A GPU counterpart of the per-pixel conversion is sketched after this entry.)
Keywords: parallel computing; image processing; OpenMP; parallel programming; high-performance computing; GPU (graphics processing unit)
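The per-pixel work here is trivially data-parallel. The paper parallelizes it on the CPU with OpenMP; to keep a single language across the sketches in this listing, below is a hypothetical CUDA version of the same ITU-R BT.601 weighted conversion, one GPU thread per pixel instead of one OpenMP thread per chunk of rows.

```cuda
#include <cuda_runtime.h>
#include <cstdio>

// One thread per pixel: luma = 0.299 R + 0.587 G + 0.114 B (BT.601).
__global__ void rgbToGray(const unsigned char* rgb, unsigned char* gray, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i >= n) return;
    float r = rgb[3 * i], g = rgb[3 * i + 1], b = rgb[3 * i + 2];
    gray[i] = (unsigned char)(0.299f * r + 0.587f * g + 0.114f * b);
}

int main()
{
    const int n = 1920 * 1080;
    unsigned char *rgb, *gray;
    cudaMallocManaged(&rgb, 3 * n);
    cudaMallocManaged(&gray, n);
    for (int i = 0; i < 3 * n; ++i) rgb[i] = (unsigned char)(i & 0xFF);

    rgbToGray<<<(n + 255) / 256, 256>>>(rgb, gray, n);
    cudaDeviceSynchronize();
    printf("gray[0] = %d\n", gray[0]);
    cudaFree(rgb); cudaFree(gray);
    return 0;
}
```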
4. Optimization of a Precise Integration Method for Seismic Modeling Based on the Graphics Processing Unit (Cited: 2)
Authors: Jingyu Li, Genyang Tang, Tianyue Hu 《Earthquake Science》 CSCD, 2010, No. 4, pp. 387-393
General-purpose graphics processing unit (GPU) computing technology is gradually coming into wide use across many fields. Its single-instruction, multiple-thread mode is well suited to seismic numerical simulation, which involves huge quantities of data and calculation steps. In this study, we introduce a GPU-based parallel implementation of a precise integration method (PIM) for seismic forward modeling. Compared with single-core CPU calculation, the GPU parallel calculation fully preserves the features of PIM — small bandwidth, high accuracy, and the capability of modeling complex substructures — while bringing high computational efficiency, so that high-performance GPU parallel calculation can bring seismic forward modeling closer to real seismic records.
Keywords: precise integration method; seismic modeling; general-purpose GPU; graphics processing unit
5. Graphic Processing Unit-Accelerated Neural Network Model for Biological Species Recognition
Authors: 温程璐, 潘伟, 陈晓熹, 祝青园 《Journal of Donghua University (English Edition)》 EI CAS, 2012, No. 1, pp. 5-8
A graphics processing unit (GPU)-accelerated biological species recognition method using a partially connected neural evolutionary network model is introduced in this paper. The partially connected neural evolutionary network overcomes the small-input limitation of traditional neural networks: the whole image is taken as the network input, so maximal features are kept for recognition. To speed up recognition, a fast implementation of the partially connected network was built on an NVIDIA Tesla C1060 using the NVIDIA Compute Unified Device Architecture (CUDA) framework. Image sets of eight biological species were used to test the GPU implementation and its serial CPU counterpart; experiments showed that the GPU implementation works effectively on both recognition rate and speed, gaining a speedup of 343 over the CPU implementation. Compared with a feature-based recognition method on the same task, the method also achieved an acceptable correct rate of 84.6% on the eight species. (A hypothetical sketch of a partially connected layer's forward pass follows this entry.)
Keywords: graphics processing unit (GPU); compute unified device architecture (CUDA); neural network; species recognition
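A partially connected layer replaces the dense weight matrix with local receptive fields, which is what lets the whole image serve as input. The sketch below is hypothetical — window size, tiling, and the sigmoid activation are assumptions, not details from the paper — and shows one CUDA thread per hidden neuron summing over its private window. Host setup is omitted.

```cuda
#include <cuda_runtime.h>

__global__ void partialForward(const float* img,   // input image, w*h
                               const float* wgt,   // per-neuron local weights
                               float* out,         // one value per neuron
                               int w, int h, int win)
{
    // Neurons tile the image on a (w/win) x (h/win) grid; one thread each.
    int nx = w / win, ny = h / win;
    int t = blockIdx.x * blockDim.x + threadIdx.x;
    if (t >= nx * ny) return;

    int x0 = (t % nx) * win, y0 = (t / nx) * win;
    float s = 0.f;
    for (int dy = 0; dy < win; ++dy)       // sum over this neuron's window
        for (int dx = 0; dx < win; ++dx)
            s += img[(y0 + dy) * w + (x0 + dx)]
               * wgt[t * win * win + dy * win + dx];
    out[t] = 1.f / (1.f + expf(-s));       // assumed sigmoid activation
}
```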
6. A Survey of GPU Acceleration Techniques for Deep Learning in Privacy-Preserving Computing Environments
Authors: 秦智翔, 杨洪伟, 郝萌, 何慧, 张伟哲 《信息安全研究》 CSCD, Peking University Core, 2024, No. 7, pp. 586-593
As deep learning continues to develop, the training time of neural network models keeps growing, and accelerating neural network training with GPU computing has become a key technique. At the same time, the importance of data privacy has driven the development of privacy-preserving computation. This survey first introduces the concepts of deep learning and GPU computing, together with two privacy-preserving techniques: secure multi-party computation and homomorphic encryption. It then examines GPU acceleration of deep learning in plaintext and privacy-preserving environments. For the plaintext environment, it describes the two basic parallel training modes, data parallelism and model parallelism; analyzes two memory-optimization techniques, recomputation and GPU-memory swapping; and introduces gradient compression in distributed neural network training. For the privacy-preserving environment, it presents GPU acceleration techniques for deep learning under secure multi-party computation and under homomorphic encryption. Finally, it briefly compares GPU-accelerated deep learning methods across the two environments.
Keywords: deep learning; GPU computing; privacy-preserving computation; secure multi-party computation; homomorphic encryption
7. Multi-View Projection Holograms Based on the GPU and Angular Orthographic Projection Views
Authors: 曹雪梅, 张春晓, 管明祥, 夏林中, 郭丽丽, 苗玉虎, 曹士平 《深圳大学学报(理工版)》 CAS CSCD, Peking University Core, 2024, No. 5, pp. 536-541
To address the slow generation of multi-view projection holograms, a GPU (graphics processing unit)-based synthesis method for multi-view projection computer-generated holograms is proposed. Multiple angular orthographic projection views are acquired, and the GPU's massive parallel computing power is exploited to evaluate the contributions of many projection views to the hologram simultaneously: during the computation, a series of angular orthographic projection views, shifted along their projection directions, are multiplied by their corresponding constant phase factors, where the projection angle of each view determines both its shift distance and its constant phase factor. Accumulating all of the parallel results yields a 2-D complex matrix containing the 3-D information of the object, i.e., a Fresnel hologram. Compared with computation on a central processing unit (CPU), the method significantly raises computational speed, improving efficiency by a factor of 30 to 40, and provides a feasible route to efficient generation of multi-view projection holograms. (A hedged sketch of the per-pixel accumulation follows this entry.)
Keywords: information processing technology; computer-generated holography; holographic display; graphics processing unit; angular orthographic projection view; multi-view projection hologram
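A hedged sketch of the accumulation described above: one CUDA thread per hologram pixel loops over the views, samples each view at its angle-dependent shift, multiplies by that view's constant phase factor, and sums into the 2-D complex matrix. The shift and phase arrays are placeholders for the geometry-derived values in the paper; host setup is omitted.

```cuda
#include <cuda_runtime.h>
#include <cuComplex.h>

__global__ void accumulateViews(const float* views,      // V stacked views
                                const int* shiftX, const int* shiftY,
                                const float* phase,      // per-view phase
                                cuFloatComplex* holo, int w, int h, int V)
{
    int x = blockIdx.x * blockDim.x + threadIdx.x;
    int y = blockIdx.y * blockDim.y + threadIdx.y;
    if (x >= w || y >= h) return;

    cuFloatComplex acc = make_cuFloatComplex(0.f, 0.f);
    for (int v = 0; v < V; ++v) {
        int sx = x - shiftX[v], sy = y - shiftY[v];   // shifted sample
        if (sx < 0 || sy < 0 || sx >= w || sy >= h) continue;
        float a = views[v * w * h + sy * w + sx];
        // Multiply by the view's constant phase factor exp(i*phi_v).
        acc = cuCaddf(acc, make_cuFloatComplex(a * cosf(phase[v]),
                                               a * sinf(phase[v])));
    }
    holo[y * w + x] = acc;   // one pixel of the Fresnel hologram
}
```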
8. A Survey of Thread Synchronization for GPU Parallel Programming
Authors: 高岚, 赵雨晨, 张伟功, 王晶, 钱德沛 《软件学报》 EI CSCD, Peking University Core, 2024, No. 2, pp. 1028-1047
Parallel computing has become the mainstream. In parallel computing systems, synchronization is one of the key design points and is crucial to fully exploiting hardware performance. In recent years the GPU (graphics processing unit) has developed rapidly as the most widely used accelerator, and many applications place higher demands on GPU thread synchronization; existing GPU systems, however, struggle to efficiently support the complex thread synchronization found in real applications. Although researchers have proposed many methods for GPU thread synchronization and made considerable progress, the GPU's distinctive architecture and parallel execution model leave many open challenges. This survey classifies thread synchronization in GPU parallel programming by purpose and granularity. On that basis, centering on how GPU thread synchronization is expressed and executed, it first analyzes and summarizes the key problems and challenges: synchronization that is hard to express efficiently, error-prone, and inefficient to execute. Then, for different synchronization granularities, it reviews recent academic and industrial research on GPU competitive and cooperative synchronization from two directions — expression methods and performance-optimization methods — and analyzes and summarizes the existing approaches. Finally, it points out future research trends and prospects for GPU thread synchronization and suggests possible research directions, providing a reference for researchers in this area.
Keywords: general-purpose graphics processing unit (GPGPU); parallel programming; thread synchronization; performance optimization
9. A GPU-Accelerated 3D Fast Factorized Back Projection Imaging Algorithm for Synthetic Aperture Sonar
Authors: 陶鸿博, 张东升, 黄勇 《系统工程与电子技术》 EI CSCD, Peking University Core, 2024, No. 10, pp. 3247-3256
Back projection (BP) is an accurate time-domain imaging algorithm, but its computational complexity is high and real-time imaging is difficult to achieve — especially for 3-D imaging, where the complexity increases further. A 3-D fast factorized back projection (FFBP) imaging algorithm for synthetic aperture sonar (SAS) is proposed, and the algorithm is accelerated with a graphics processing unit (GPU). In tests on point targets, the computing time drops from 263 s to 2.3 s, solving the real-time 3-D imaging problem in SAS. The imaging performance of the algorithm under non-ideal trajectories is also verified: with an added sinusoidal perturbation of amplitude no more than 0.1 m (within one wavelength), the algorithm still focuses point targets well. (A plain back-projection kernel is sketched after this entry for orientation.)
Keywords: fast factorized back projection; parallel computing; graphics processing unit; synthetic aperture sonar; 3D imaging
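For orientation, here is what the unfactorized time-domain BP step looks like as a CUDA kernel: one thread per image voxel, accumulating phase-compensated echo samples over all pings. This is a hedged sketch with nearest-neighbor sampling; FFBP gains its speed by recursively merging sub-aperture images rather than evaluating this full sum. Host setup is omitted.

```cuda
#include <cuda_runtime.h>
#include <cuComplex.h>

__global__ void backProject(const cuFloatComplex* echo,  // [ping][sample]
                            const float3* traj,          // sensor track
                            const float3* vox,           // voxel centers
                            cuFloatComplex* img,
                            int nVox, int nPing, int nSamp,
                            float fs, float c, float fc)
{
    int v = blockIdx.x * blockDim.x + threadIdx.x;  // one thread per voxel
    if (v >= nVox) return;

    cuFloatComplex acc = make_cuFloatComplex(0.f, 0.f);
    for (int p = 0; p < nPing; ++p) {
        float dx = vox[v].x - traj[p].x;
        float dy = vox[v].y - traj[p].y;
        float dz = vox[v].z - traj[p].z;
        float t = 2.f * sqrtf(dx*dx + dy*dy + dz*dz) / c;  // two-way time
        int s = (int)(t * fs);                  // nearest-sample lookup
        if (s < 0 || s >= nSamp) continue;
        float ph = 2.f * 3.14159265f * fc * t;  // carrier phase compensation
        acc = cuCaddf(acc, cuCmulf(echo[p * nSamp + s],
                                   make_cuFloatComplex(cosf(ph), sinf(ph))));
    }
    img[v] = acc;
}
```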
10. GPU Parallel Optimization Design and Implementation of the Key-Tree Generation Component of the Falcon Post-Quantum Algorithm
Authors: 张磊, 赵光岳, 肖超恩, 王建新 《计算机工程》 CAS CSCD, Peking University Core, 2024, No. 9, pp. 208-215
In recent years, post-quantum cryptographic algorithms have become a research focus in the security field because of their resistance to quantum attacks. The lattice-based Falcon digital signature algorithm is one of the first four post-quantum cryptography standard algorithms announced by the US National Institute of Standards and Technology (NIST). Key-tree generation is a core component of Falcon and occupies considerable time and resources in practical operation. A GPU (graphics processing unit)-based parallel Falcon key-tree generation scheme is therefore proposed; it uses a single-instruction multiple-thread (SIMT) parallel mode with joint odd-even thread control and a direct computation mode without intermediate variables, raising speed and reducing resource usage. Experiments on a Python-based CUDA platform verify the correctness of the results. The results show that Falcon key-tree generation on an RTX 3060 Laptop GPU has a latency of 6 ms and a throughput of 167 ops/s; computing a single key-tree generation component achieves a 1.17-fold speedup over the CPU, while running 1024 key-tree generation components in parallel achieves roughly a 56-fold GPU-over-CPU speedup. Throughput on the embedded Jetson Xavier NX platform is 32 ops/s.
Keywords: post-quantum cryptography; Falcon algorithm; graphics processing unit; CUDA platform; parallel computing
11. GPU-Based Algorithm Optimization of the LBM Streaming Module
Authors: 黄斌, 柳安军, 潘景山, 田敏, 张煜, 朱光慧 《计算机工程》 CAS CSCD, Peking University Core, 2024, No. 2, pp. 232-238
The lattice Boltzmann method (LBM) is a mesoscale computational fluid dynamics method that places a large number of discrete lattice nodes and is inherently well suited to parallelism. A graphics processing unit (GPU) contains a large number of arithmetic logic units and suits large-scale parallel computation, so designing a parallel LBM algorithm on the GPU can raise computational efficiency. In the streaming module of the LBM, however, the update of every node requires communication with other nodes, creating strong data dependences. A GPU-oriented optimization strategy for the LBM streaming module is proposed. First, the implementation logic of the streaming step is analyzed and the model is dimension-reduced: the 3-D model is decomposed by velocity component into several 2-D models, lowering model complexity. Second, the differences in node data before and after streaming are analyzed; data localization reveals the communication pattern of the streaming module, and the data-exchange modes between nodes are classified. Finally, the classified exchange modes are used to partition the discrete 2-D models into regions, and a new data-communication scheme is designed, eliminating the data dependences and making the streaming module fully parallel. Tests of the parallel algorithm show a speedup of 1.92 on a 1.3×10^8-node grid, indicating good parallel behavior; compared with an implementation whose streaming module is not parallelized, the proposed strategy improves parallel computing efficiency by 30%. (A dependence-free "pull" streaming step is sketched after this entry.)
Keywords: high-performance computing; lattice Boltzmann method; graphics processing unit; parallel optimization; data reordering
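The data dependence described above disappears entirely if streaming is written as a "pull" over two buffers: every thread reads its distributions from upstream neighbors in the source array and writes only its own node in the destination array, so no thread ever reads a value another thread is overwriting. The minimal sketch below (periodic boundaries assumed, host setup omitted) shows that baseline; the paper's dimension-reduced, region-partitioned exchange is an optimization beyond it.

```cuda
#include <cuda_runtime.h>

__constant__ int cx[9] = { 0, 1, 0,-1, 0, 1,-1,-1, 1 };
__constant__ int cy[9] = { 0, 0, 1, 0,-1, 1, 1,-1,-1 };

__global__ void streamPull(const float* fSrc, float* fDst, int nx, int ny)
{
    int x = blockIdx.x * blockDim.x + threadIdx.x;
    int y = blockIdx.y * blockDim.y + threadIdx.y;
    if (x >= nx || y >= ny) return;
    int n = nx * ny;
    for (int i = 0; i < 9; ++i) {
        // Pull from the periodic upstream neighbor along direction i.
        int xs = (x - cx[i] + nx) % nx;
        int ys = (y - cy[i] + ny) % ny;
        fDst[i * n + y * nx + x] = fSrc[i * n + ys * nx + xs];
    }
}
```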
12. Exploiting Parallelism in the Simulation of General Purpose Graphics Processing Unit Programs
Authors: 赵夏, 马胜, 陈微, 王志英 《Journal of Shanghai Jiaotong University (Science)》 EI, 2016, No. 3, pp. 280-288
Simulation is an important means of performance evaluation in computer architecture. Nowadays, the serial simulation of general purpose graphics processing unit (GPGPU) architectures is the main bottleneck for simulation speed. To address this issue, we propose intra-kernel parallelization on a multicore processor and inter-kernel parallelization on a multiple-machine platform, and we apply both methods to the GPGPU-sim simulator. The intra-kernel method first parallelizes the serial simulation of multiple compute units within one cycle, and then parallelizes the timing and functional simulation to reduce the performance loss caused by synchronization between compute units. The inter-kernel method divides the kernels of a CUDA program into several groups and distributes these groups across multiple simulation hosts. Experimental results show that the intra-kernel method achieves a speedup of up to 12 with a maximum error rate of 0.0094% on a 32-core machine, and the inter-kernel method accelerates the simulation by a factor of up to 3.9 with a maximum error rate of 0.11% on four simulation hosts. The orthogonality of the two methods allows them to be combined on multiple multi-core hosts for further performance improvements.
Keywords: general-purpose graphics processing unit (GPGPU); multicore; intra-kernel; inter-kernel; parallel
13. A GPU-Accelerated Parallel Algorithm for All-Pairs Shortest Paths (Cited: 1)
Authors: 肖汉, 肖诗洋, 李焕勤, 周清雷 《云南大学学报(自然科学版)》 CAS CSCD, Peking University Core, 2023, No. 5, pp. 1022-1032
To address the inefficiency of shortest-path algorithms on large-scale datasets, a graphics processing unit (GPU)-accelerated parallel algorithm for all-pairs shortest paths is proposed. The matrix-multiplication-based algorithm is optimized to operate on data in parallel within and across work-groups; work-item branching caused by irregular rows is reduced; and the work-items' latency when accessing the strip-stored adjacency-matrix data is lowered. Experimental results show that, compared with a serial algorithm on an AMD Ryzen 5 1600X CPU, an Open Multi-Processing (OpenMP) parallel algorithm, and a Compute Unified Device Architecture (CUDA) parallel algorithm, the parallel shortest-path algorithm implemented in the Open Computing Language (OpenCL) framework on an NVIDIA GeForce GTX 1070 platform achieves speedups of 196.35, 36.76, and 2.25, respectively, verifying the effectiveness and performance portability of the proposed parallel optimizations. (A CUDA analogue of one min-plus squaring step follows this entry.)
Keywords: shortest path; repeated squaring method; graphics processing unit; Open Computing Language (OpenCL); parallel algorithm
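The kernel of the repeated-squaring approach is a (min,+) matrix product: D ⊗ D relaxes every vertex pair through every intermediate vertex, and ⌈log₂ n⌉ squarings yield all-pairs shortest paths. The paper works in OpenCL; this is a hedged CUDA analogue of one squaring step, without the strip-storage and divergence optimizations the paper describes.

```cuda
#include <cuda_runtime.h>
#include <cfloat>

// One (min,+) squaring step: dOut[i][j] = min_k d[i][k] + d[k][j].
__global__ void minPlusSquare(const float* d, float* dOut, int n)
{
    int i = blockIdx.y * blockDim.y + threadIdx.y;  // row
    int j = blockIdx.x * blockDim.x + threadIdx.x;  // column
    if (i >= n || j >= n) return;
    float best = FLT_MAX;
    for (int k = 0; k < n; ++k)                     // relax through k
        best = fminf(best, d[i * n + k] + d[k * n + j]);
    dOut[i * n + j] = best;
}
```

On the host, this kernel would be launched ⌈log₂ n⌉ times with the two distance buffers ping-ponged between launches.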
14. Fast Modeling of Gravity Gradients from Topographic Surface Data Using a GPU Parallel Algorithm (Cited: 1)
Authors: Xuli Tan, Qingbin Wang, Jinkai Feng, Yan Huang, Ziyan Huang 《Geodesy and Geodynamics》 CSCD, 2021, No. 4, pp. 288-297
The gravity gradient is a second derivative of the gravity potential, containing more high-frequency information of the Earth's gravity field. Gravity gradient observations require deducting the prior and intrinsic parts to obtain more variational information. A model generated from a topographic surface database is more appropriate for representing gradiometric effects derived from near-surface mass, as other kinds of data can hardly reach the required spatial resolution. The rectangle prism method — an analytic integration of the Newtonian potential integrals — is a reliable and commonly used approach to modeling the gravity gradient, but its computing efficiency is extremely low. A modified rectangle prism method and a graphical processing unit (GPU) parallel algorithm are proposed to speed up the modeling process. The modified method avoids massive redundant computation by deforming the formulas according to the symmetries of the prisms' integral regions, and the proposed algorithm parallelizes this method's computing process. The parallel algorithm was compared with a conventional serial algorithm using 100 elevation datasets in two topographic areas (rough and moderate terrain). Modeling differences between the two algorithms were less than 0.1 E, attributable to the precision difference between single-precision and double-precision floating-point numbers. The parallel algorithm was roughly 200 times more computationally efficient than the serial algorithm in the experiments, demonstrating an effective speedup of the modeling process; further analysis indicates that both the modified method and GPU parallelism contributed to this performance.
Keywords: gravity gradient; topographic surface data; rectangle prism method; parallel computation; graphical processing unit (GPU)
15. GPU-Based Numerical Simulation of the Core Shooting Process
Authors: Yi-zhong Zhang, Gao-chun Lu, Chang-jiang Ni, Tao Jing, Lin-long Yang, Qin-fang Wu 《China Foundry》 SCIE, 2017, No. 5, pp. 392-397
The core shooting process is the most widely used technique for making sand cores, and it plays an important role in sand core quality. Although numerical simulation can hopefully optimize the core shooting process, research on such simulation is very limited. Based on a two-fluid model (TFM) and a kinetic-friction constitutive correlation, a program for 3D numerical simulation of the core shooting process was developed and achieved good agreement with in-situ experiments. To match the needs of engineering applications, a graphics processing unit (GPU) was used to improve calculation efficiency: a parallel algorithm based on the Compute Unified Device Architecture (CUDA) platform significantly decreases computing time through multi-threaded GPU execution. The CUDA-accelerated program was developed and its accuracy ensured by comparison with in-situ experimental results photographed by a high-speed camera; the design and optimization of the parallel algorithm were discussed, and the simulation of a sand core test-piece confirmed the efficiency gain from the GPU. The program was further validated by in-situ experiments with a transparent core-box, a high-speed camera, and a pressure measuring system. The computing time of the parallel program was reduced by nearly 95% while the simulation results remained consistent with experimental data, showing that GPU parallelization successfully solves the low computational efficiency of the 3D sand shooting simulation program and makes it appropriate for engineering applications.
Keywords: graphics processing unit (GPU); compute unified device architecture (CUDA); parallelization; core shooting process
16. GPU-Based Subgraph Matching Optimization Techniques (Cited: 1)
Authors: 李安腾, 崔鹏杰, 袁野, 王国仁 《浙江大学学报(工学版)》 EI CAS CSCD, Peking University Core, 2023, No. 9, pp. 1856-1864
An efficient graphics processing unit (GPU)-based subgraph matching algorithm, GpSI, is proposed, with optimization schemes designed for the filtering and joining phases of mainstream algorithms. A composite-signature filtering algorithm uses the quantitative and structural features of each node's neighborhood to strengthen candidate-set filtering. A candidate-vertex-based join strategy pre-allocates space at the granularity of the minimum neighbor count and uses efficient set operations, avoiding the overhead of the repeated joins in traditional methods. Tests on multiple datasets show clear advantages over mainstream GPU subgraph matching algorithms in candidate filtering power, runtime, GPU memory footprint, and stability; on real datasets, GpSI runs 2 to 10 times faster than GPU-friendly subgraph matching algorithms. (A generic sorted-set intersection kernel, of the kind the join phase relies on, is sketched after this entry.)
Keywords: subgraph isomorphism; data mining; graphics processing unit (GPU); parallel computing; high-performance computing
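Joins in subgraph matching reduce to set operations over sorted candidate and adjacency lists. The sketch below is a generic, hedged example of such an operation — one thread per adjacency-list element, binary search into the candidate set, atomic compaction of survivors — not GpSI's pre-allocated, minimum-neighbor-granularity join itself.

```cuda
#include <cuda_runtime.h>

// Binary search in a sorted array.
__device__ bool binContains(const int* a, int n, int key)
{
    int lo = 0, hi = n - 1;
    while (lo <= hi) {
        int mid = (lo + hi) / 2;
        if (a[mid] == key) return true;
        if (a[mid] < key) lo = mid + 1; else hi = mid - 1;
    }
    return false;
}

// Intersect a sorted adjacency list with a sorted candidate set.
__global__ void setIntersect(const int* adj, int adjLen,
                             const int* cand, int candLen,
                             int* out, int* outLen)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i >= adjLen) return;
    if (binContains(cand, candLen, adj[i])) {
        int pos = atomicAdd(outLen, 1);   // compact the survivors
        out[pos] = adj[i];
    }
}
```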
17. An Efficient Acceleration of Solving Heat and Mass Transfer Equations with the Second Kind Boundary Conditions in Capillary Porous Composite Cylinder Using Programmable Graphics Hardware
Authors: Hira Narang, Fan Wu, Abdul Rafae Mohammed 《Journal of Computer and Communications》 2018, No. 9, pp. 24-38
With recent developments in computing technology, increased effort has gone into the simulation of various scientific methods and phenomena in engineering fields. One such case is the simulation of heat and mass transfer in capillary porous media, which is becoming more and more important in analyzing various scenarios in engineering applications. Analyzing such heat and mass transfer phenomena requires simulating the coupled heat and mass transfer equations, but their numerical solution is very time consuming. This paper therefore applies an acceleration technique developed in the graphics community — the graphics processing unit (GPU) — to the numerical solution of the heat and mass transfer equations. The NVIDIA Compute Unified Device Architecture (CUDA) programming model provides a good way to apply parallel computing on the GPU. This paper shows a substantial performance improvement when numerically solving the heat and mass transfer equations for a capillary porous composite cylinder with boundary conditions of the second kind on a GPU. The simulation was implemented with CUDA on an NVIDIA Quadro FX 4800 graphics card. Our experimental results show a drastic performance improvement, with a maximum observed speedup of more than 7-fold; the GPU is therefore a good approach to accelerating heat and mass transfer simulation. (A minimal stencil-step sketch with a second-kind boundary condition follows this entry.)
Keywords: numerical solution; heat and mass transfer; general-purpose graphics processing unit (GPGPU); CUDA
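As a minimal illustration of where the GPU time goes, here is one explicit finite-difference step of a 2-D heat equation with a second-kind (prescribed-flux, Neumann) condition on one boundary, one CUDA thread per node. This is a schematic Cartesian stand-in, not the paper's coupled heat-and-mass solver in cylindrical coordinates; the ghost-node constant absorbs Δx/k, the other boundaries are simply held fixed, and host setup is omitted.

```cuda
#include <cuda_runtime.h>

__global__ void heatStep(const float* T, float* Tn, int nx, int ny,
                         float r,        // r = alpha*dt/dx^2 (<= 0.25)
                         float qFlux)    // prescribed flux at x = 0
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    int j = blockIdx.y * blockDim.y + threadIdx.y;
    if (i >= nx || j >= ny) return;
    int id = j * nx + i;

    if (i == 0) {                       // Neumann: -k dT/dx = qFlux
        Tn[id] = T[id + 1] + qFlux;     // one-sided form, dx/k folded in
        return;
    }
    if (i == nx - 1 || j == 0 || j == ny - 1) { Tn[id] = T[id]; return; }

    // Interior node: 5-point explicit Laplacian update.
    Tn[id] = T[id] + r * (T[id - 1] + T[id + 1]
                        + T[id - nx] + T[id + nx] - 4.f * T[id]);
}
```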
18. An Efficient Acceleration of Solving Heat and Mass Transfer Equations with the First Kind Boundary Conditions in Capillary Porous Radially Composite Cylinder Using Programmable Graphics Hardware
Authors: Hira Narang, Fan Wu, Abdul Rafae Mohammed 《Journal of Computer and Communications》 2019, No. 7, pp. 267-281
With the latest advances in computing technology, a huge amount of effort has gone into the simulation of a range of scientific phenomena in engineering fields. One such case is the simulation of heat and mass transfer in capillary porous media, which is becoming more and more necessary in analyzing a number of eventualities in science and engineering applications. However, the numerical solution of the heat and mass transfer equations for capillary porous media is very time consuming. This paper therefore makes use of an acceleration method developed in the graphics community that exploits the graphical processing unit (GPU), applied to the numerical solution of such heat and mass transfer equations. The NVIDIA Compute Unified Device Architecture (CUDA) programming model offers a suitable approach to applying parallel computing on the GPU. This paper demonstrates a real performance improvement when solving the heat and mass transfer equations for a capillary porous radially composite cylinder with boundary conditions of the first kind. The simulation was carried out with CUDA on an NVIDIA Quadro FX 4800 graphics card. Our experimental outcomes show a drastic performance enhancement, with a maximum observed speedup of more than 5-fold; the GPU is therefore a good strategy for accelerating heat and mass transfer simulation in porous media.
Keywords: numerical solution; heat and mass transfer; general-purpose graphics processing unit (GPGPU); CUDA
19. A Batch Image Encryption Algorithm Optimized with a Chaotic Thread Pool and GPU (Cited: 1)
Authors: 潘明华, 王一涵, 谷盛民, 孙绍华 《科学技术与工程》 Peking University Core, 2023, No. 34, pp. 14618-14626
Large data volume and high redundancy are hallmarks of digital images, and they challenge fast, real-time encryption of large image batches. To solve this problem, a batch image encryption algorithm combining a thread pool with graphics processing unit (GPU) optimization is designed on top of Lorenz chaotic encryption. The algorithm improves image reading and writing with a thread pool and applies a mirror transformation to the images; the Lorenz chaotic system generates encryption sequences, and the images are encrypted blockwise with the chaotic sequences; the batched image data are then packed and computed asynchronously in bulk on the GPU; finally, the image matrices are reassembled to obtain the encrypted batch. Experiments show that the algorithm effectively resists common attacks, and the performance-optimized batch encryption algorithm preserves image security while markedly improving batch image read rates and encryption/decryption throughput. (The elementwise XOR stage of such a scheme is sketched after this entry.)
Keywords: image encryption; chaotic system; parallel computing; thread pool; graphics processing unit (GPU)
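Once the Lorenz system has produced a keystream, the bulk GPU stage of such a scheme is an elementwise XOR over the packed image batch. The sketch below shows only that stage — keystream generation, block permutation, and the thread-pool I/O are omitted — and it is an assumption-level illustration rather than the paper's exact pipeline.

```cuda
#include <cuda_runtime.h>

__global__ void xorEncrypt(unsigned char* batch,           // packed images
                           const unsigned char* keystream, // chaotic bytes
                           size_t nBytes)
{
    size_t i = (size_t)blockIdx.x * blockDim.x + threadIdx.x;
    if (i < nBytes)
        batch[i] ^= keystream[i];  // XOR is its own inverse: same kernel decrypts
}
```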
20. A GPU-Based Delay Compensation Method for Antenna Array Signals
Authors: 毛飞龙, 焦义文, 马宏, 韩久江, 高泽夫, 李超, 李冬 《系统工程与电子技术》 EI CSCD, Peking University Core, 2023, No. 8, pp. 2383-2394
To meet the real-time combining needs of antenna arraying systems for wideband, high-rate, parallel signals, a graphics processing unit (GPU)-based delay compensation method for antenna array signals is designed. First, the feasibility of implementing typical integer delay compensation on a GPU platform is analyzed, and an integer delay compensation method based on overlap-save block processing is designed. Then, typical fractional delay compensation methods are compared, and a frequency-domain fractional delay compensation method well suited to GPU parallel acceleration is designed. Finally, the GPU-based delay compensation method is verified experimentally. Repeated tests show that, while guaranteeing correct delay compensation, the GPU method achieves roughly an 18-fold speedup over a traditional serial CPU implementation, enabling real-time combining of multi-antenna signals. (The per-bin phase-ramp kernel for fractional delay is sketched after this entry.)
Keywords: delay compensation; antenna arraying; graphics processing unit; parallel computing
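Frequency-domain fractional delay amounts to multiplying each FFT bin by a linear phase ramp exp(−j2πf_k·τ) and transforming back. Below is a hedged sketch of just the per-bin kernel — the forward and inverse transforms would use cuFFT — where fs, tau, and the bin-frequency convention are illustrative assumptions.

```cuda
#include <cuda_runtime.h>
#include <cuComplex.h>

__global__ void applyFracDelay(cuFloatComplex* X, int n,
                               float fs,    // sample rate, Hz
                               float tau)   // fractional delay, seconds
{
    int k = blockIdx.x * blockDim.x + threadIdx.x;
    if (k >= n) return;
    // Signed bin frequency (wraps negative above n/2), in Hz.
    float fk = (k <= n / 2 ? k : k - n) * fs / n;
    float ph = -2.f * 3.14159265f * fk * tau;
    X[k] = cuCmulf(X[k], make_cuFloatComplex(cosf(ph), sinf(ph)));
}
```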