期刊文献+
共找到801篇文章
< 1 2 41 >
每页显示 20 50 100
Parallel Image Processing: Taking Grayscale Conversion Using OpenMP as an Example
1
作者 Bayan AlHumaidan Shahad Alghofaily +2 位作者 Maitha Al Qhahtani Sara Oudah Naya Nagy 《Journal of Computer and Communications》 2024年第2期1-10,共10页
In recent years, the widespread adoption of parallel computing, especially in multi-core processors and high-performance computing environments, ushered in a new era of efficiency and speed. This trend was particularl... In recent years, the widespread adoption of parallel computing, especially in multi-core processors and high-performance computing environments, ushered in a new era of efficiency and speed. This trend was particularly noteworthy in the field of image processing, which witnessed significant advancements. This parallel computing project explored the field of parallel image processing, with a focus on the grayscale conversion of colorful images. Our approach involved integrating OpenMP into our framework for parallelization to execute a critical image processing task: grayscale conversion. By using OpenMP, we strategically enhanced the overall performance of the conversion process by distributing the workload across multiple threads. The primary objectives of our project revolved around optimizing computation time and improving overall efficiency, particularly in the task of grayscale conversion of colorful images. Utilizing OpenMP for concurrent processing across multiple cores significantly reduced execution times through the effective distribution of tasks among these cores. The speedup values for various image sizes highlighted the efficacy of parallel processing, especially for large images. However, a detailed examination revealed a potential decline in parallelization efficiency with an increasing number of cores. This underscored the importance of a carefully optimized parallelization strategy, considering factors like load balancing and minimizing communication overhead. Despite challenges, the overall scalability and efficiency achieved with parallel image processing underscored OpenMP’s effectiveness in accelerating image manipulation tasks. 展开更多
关键词 parallel computing Image processing OPENMP parallel Programming High Performance computing gpu (graphic processing unit)
下载PDF
Compute Unified Device Architecture Implementation of Euler/Navier-Stokes Solver on Graphics Processing Unit Desktop Platform for 2-D Compressible Flows
2
作者 Zhang Jiale Chen Hongquan 《Transactions of Nanjing University of Aeronautics and Astronautics》 EI CSCD 2016年第5期536-545,共10页
Personal desktop platform with teraflops peak performance of thousands of cores is realized at the price of conventional workstations using the programmable graphics processing units(GPUs).A GPU-based parallel Euler/N... Personal desktop platform with teraflops peak performance of thousands of cores is realized at the price of conventional workstations using the programmable graphics processing units(GPUs).A GPU-based parallel Euler/Navier-Stokes solver is developed for 2-D compressible flows by using NVIDIA′s Compute Unified Device Architecture(CUDA)programming model in CUDA Fortran programming language.The techniques of implementation of CUDA kernels,double-layered thread hierarchy and variety memory hierarchy are presented to form the GPU-based algorithm of Euler/Navier-Stokes equations.The resulting parallel solver is validated by a set of typical test flow cases.The numerical results show that dozens of times speedup relative to a serial CPU implementation can be achieved using a single GPU desktop platform,which demonstrates that a GPU desktop can serve as a costeffective parallel computing platform to accelerate computational fluid dynamics(CFD)simulations substantially. 展开更多
关键词 graphics processing unit(gpu) gpu parallel computing compute unified device architecture(CUDA)Fortran finite volume method(FVM) acceleration
下载PDF
Using Graphics Processing Units to Parallelize the FDK Algorithm for Tomographic Image Reconstruction
3
作者 Joel Sancnchez Dominguez Luiz Femando de Oliveira +1 位作者 Nilton Alves Junior Joaquim Teixeira de Assis 《Journal of Chemistry and Chemical Engineering》 2012年第8期760-768,共9页
关键词 图形处理单元 FDK算法 并行化 断层图像 GEFORCE QUADRO gpu 断层扫描
下载PDF
The inversion of density structure by graphic processing unit(GPU) and identification of igneous rocks in Xisha area 被引量:1
4
作者 Lei Yu Jian Zhang +2 位作者 Wei Lin Rongqiang Wei Shiguo Wu 《Earthquake Science》 2014年第1期117-125,共9页
Organic reefs, the targets of deep-water petro- leum exploration, developed widely in Xisha area. However, there are concealed igneous rocks undersea, to which organic rocks have nearly equal wave impedance. So the ig... Organic reefs, the targets of deep-water petro- leum exploration, developed widely in Xisha area. However, there are concealed igneous rocks undersea, to which organic rocks have nearly equal wave impedance. So the igneous rocks have become interference for future explo- ration by having similar seismic reflection characteristics. Yet, the density and magnetism of organic reefs are very different from igneous rocks. It has obvious advantages to identify organic reefs and igneous rocks by gravity and magnetic data. At first, frequency decomposition was applied to the free-air gravity anomaly in Xisha area to obtain the 2D subdivision of the gravity anomaly and magnetic anomaly in the vertical direction. Thus, the dis- tribution of igneous rocks in the horizontal direction can be acquired according to high-frequency field, low-frequency field, and its physical properties. Then, 3D forward model- ing of gravitational field was carried out to establish the density model of this area by reference to physical properties of rocks based on former researches. Furthermore, 3D inversion of gravity anomaly by genetic algorithm method of the graphic processing unit (GPU) parallel processing in Xisha target area was applied, and 3D density structure of this area was obtained. By this way, we can confine the igneous rocks to the certain depth according to the density of the igneous rocks. The frequency decomposition and 3D inversion of gravity anomaly by genetic algorithm method of the GPU parallel processing proved to be a useful method for recognizing igneous rocks to its 3D geological position. So organic reefs and igneous rocks can be identified, which provide a prescient information for further exploration. 展开更多
关键词 Xisha area Organic reefs and igneous rocks -Frequency decomposition of potential field 3D inversionof the graphic processing unit (gpu parallel processing
下载PDF
TIME-DOMAIN INTERPOLATION ON GRAPHICS PROCESSING UNIT 被引量:1
5
作者 XIQI LI GUOHUA SHI YUDONG ZHANG 《Journal of Innovative Optical Health Sciences》 SCIE EI CAS 2011年第1期89-95,共7页
The signal processing speed of spectral domain optical coherence tomography(SD-OCT)has become a bottleneck in a lot of medical applications.Recently,a time-domain interpolation method was proposed.This method can get ... The signal processing speed of spectral domain optical coherence tomography(SD-OCT)has become a bottleneck in a lot of medical applications.Recently,a time-domain interpolation method was proposed.This method can get better signal-to-noise ratio(SNR)but much-reduced signal processing time in SD-OCT data processing as compared with the commonly used zeropadding interpolation method.Additionally,the resampled data can be obtained by a few data and coefficients in the cutoff window.Thus,a lot of interpolations can be performed simultaneously.So,this interpolation method is suitable for parallel computing.By using graphics processing unit(GPU)and the compute unified device architecture(CUDA)program model,time-domain interpolation can be accelerated significantly.The computing capability can be achieved more than 250,000 A-lines,200,000 A-lines,and 160,000 A-lines in a second for 2,048 pixel OCT when the cutoff length is L=11,L=21,and L=31,respectively.A frame SD-OCT data(400A-lines×2,048 pixel per line)is acquired and processed on GPU in real time.The results show that signal processing time of SD-OCT can befinished in 6.223 ms when the cutoff length L=21,which is much faster than that on central processing unit(CPU).Real-time signal processing of acquired data can be realized. 展开更多
关键词 Optical coherence tomography real-time signal processing graphics processing unit gpu CUDA
下载PDF
Graphical Processing Unit Based Time-Parallel Numerical Method for Ordinary Differential Equations 被引量:1
6
作者 Sumathi Lakshmiranganatha Suresh S. Muknahallipatna 《Journal of Computer and Communications》 2020年第2期39-63,共25页
On-line transient stability analysis of a power grid is crucial in determining whether the power grid will traverse to a steady state stable operating point after a disturbance. The transient stability analysis involv... On-line transient stability analysis of a power grid is crucial in determining whether the power grid will traverse to a steady state stable operating point after a disturbance. The transient stability analysis involves computing the solutions of the algebraic equations modeling the grid network and the ordinary differential equations modeling the dynamics of the electrical components like synchronous generators, exciters, governors, etc., of the grid in near real-time. In this research, we investigate the use of time-parallel approach in particular the Parareal algorithm implementation on Graphical Processing Unit using Compute Unified Device Architecture to compute solutions of ordinary differential equations. The numerical solution accuracy and computation time of the Parareal algorithm executing on the GPU are demonstrated on the single machine infinite bus test system. Two types of dynamic model of the single synchronous generator namely the classical and detailed models are studied. The numerical solutions of the ordinary differential equations computed by the Parareal algorithm are compared to that computed using the modified Euler’s method demonstrating the accuracy of the Parareal algorithm executing on GPU. Simulations are performed with varying numerical integration time steps, and the suitability of Parareal algorithm in computing near real-time solutions of ordinary different equations is presented. A speedup of 25× and 31× is achieved with the Parareal algorithm for classical and detailed dynamic models of the synchronous generator respectively compared to the sequential modified Euler’s method. The weak scaling efficiency of the Parareal algorithm when required to solve a large number of ordinary differential equations at each time step due to the increase in sequential computations and associated memory transfer latency between the CPU and GPU is discussed. 展开更多
关键词 Time-parallel DIFFERENTIAL Equation Numerical Integration graphic processing unit
下载PDF
Multi-relaxation-time lattice Boltzmann simulations of lid driven flows using graphics processing unit
7
作者 Chenggong LI J.P.Y.MAA 《Applied Mathematics and Mechanics(English Edition)》 SCIE EI CSCD 2017年第5期707-722,共16页
Large eddy simulation (LES) using the Smagorinsky eddy viscosity model is added to the two-dimensional nine velocity components (D2Q9) lattice Boltzmann equation (LBE) with multi-relaxation-time (MRT) to simul... Large eddy simulation (LES) using the Smagorinsky eddy viscosity model is added to the two-dimensional nine velocity components (D2Q9) lattice Boltzmann equation (LBE) with multi-relaxation-time (MRT) to simulate incompressible turbulent cavity flows with the Reynolds numbers up to 1 × 10^7. To improve the computation efficiency of LBM on the numerical simulations of turbulent flows, the massively parallel computing power from a graphic processing unit (GPU) with a computing unified device architecture (CUDA) is introduced into the MRT-LBE-LES model. The model performs well, compared with the results from others, with an increase of 76 times in computation efficiency. It appears that the higher the Reynolds numbers is, the smaller the Smagorinsky constant should be, if the lattice number is fixed. Also, for a selected high Reynolds number and a selected proper Smagorinsky constant, there is a minimum requirement for the lattice number so that the Smagorinsky eddy viscosity will not be excessively large. 展开更多
关键词 large eddy simulation (LES) multi-relaxation-time (MRT) lattice Boltzmann equation (LBE) two-dimensional nine velocity components (D2Q9) Smagorinskymodel graphic processing unit (gpu computing unified device architecture (CUDA)
下载PDF
Optimization of a precise integration method for seismic modeling based on graphic processing unit 被引量:2
8
作者 Jingyu Li Genyang Tang Tianyue Hu 《Earthquake Science》 CSCD 2010年第4期387-393,共7页
General purpose graphic processing unit (GPU) calculation technology is gradually widely used in various fields. Its mode of single instruction, multiple threads is capable of seismic numerical simulation which has ... General purpose graphic processing unit (GPU) calculation technology is gradually widely used in various fields. Its mode of single instruction, multiple threads is capable of seismic numerical simulation which has a huge quantity of data and calculation steps. In this study, we introduce a GPU-based parallel calculation method of a precise integration method (PIM) for seismic forward modeling. Compared with CPU single-core calculation, GPU parallel calculating perfectly keeps the features of PIM, which has small bandwidth, high accuracy and capability of modeling complex substructures, and GPU calculation brings high computational efficiency, which means that high-performing GPU parallel calculation can make seismic forward modeling closer to real seismic records. 展开更多
关键词 precise integration method seismic modeling general purpose gpu graphic processing unit
下载PDF
Graphic Processing Unit-Accelerated Neural Network Model for Biological Species Recognition
9
作者 温程璐 潘伟 +1 位作者 陈晓熹 祝青园 《Journal of Donghua University(English Edition)》 EI CAS 2012年第1期5-8,共4页
A graphic processing unit (GPU)-accelerated biological species recognition method using partially connected neural evolutionary network model is introduced in this paper. The partial connected neural evolutionary netw... A graphic processing unit (GPU)-accelerated biological species recognition method using partially connected neural evolutionary network model is introduced in this paper. The partial connected neural evolutionary network adopted in the paper can overcome the disadvantage of traditional neural network with small inputs. The whole image is considered as the input of the neural network, so the maximal features can be kept for recognition. To speed up the recognition process of the neural network, a fast implementation of the partially connected neural network was conducted on NVIDIA Tesla C1060 using the NVIDIA compute unified device architecture (CUDA) framework. Image sets of eight biological species were obtained to test the GPU implementation and counterpart serial CPU implementation, and experiment results showed GPU implementation works effectively on both recognition rate and speed, and gained 343 speedup over its counterpart CPU implementation. Comparing to feature-based recognition method on the same recognition task, the method also achieved an acceptable correct rate of 84.6% when testing on eight biological species. 展开更多
关键词 graphic processing unit(gpu) compute unified device architecture (CUDA) neural network species recognition
下载PDF
Graphic Processing Unit-Accelerated Mutual Information-Based 3D Image Rigid Registration
10
作者 李冠华 欧宗瑛 +1 位作者 苏铁明 韩军 《Transactions of Tianjin University》 EI CAS 2009年第5期375-380,共6页
Mutual information(MI)-based image registration is effective in registering medical images,but it is computationally expensive.This paper accelerates MI-based image registration by dividing computation of mutual infor... Mutual information(MI)-based image registration is effective in registering medical images,but it is computationally expensive.This paper accelerates MI-based image registration by dividing computation of mutual information into spatial transformation and histogram-based calculation,and performing 3D spatial transformation and trilinear interpolation on graphic processing unit(GPU) .The 3D floating image is downloaded to GPU as flat 3D texture,and then fetched and interpolated for each new voxel location in fragment shader.The transformed re-sults are rendered to textures by using frame buffer object(FBO) extension,and then read to the main memory used for the remaining computation on CPU.Experimental results show that GPU-accelerated method can achieve speedup about an order of magnitude with better registration result compared with the software implementation on a single-core CPU. 展开更多
关键词 图形处理单元 三维图像 注册登记 加速比 互信息 基础 刚性 线性插值
下载PDF
融合GPU的拟单层覆盖近似集计算方法
11
作者 吴正江 吕成功 王梦松 《计算机工程》 CAS CSCD 北大核心 2024年第5期71-82,共12页
拟单层覆盖粗糙集是一种匹配集值信息系统且有高质量和高效率的粗糙集模型。拟单层覆盖近似集的计算过程中存在大量计算密集且逻辑简单的运算,为此,提出拟单层覆盖近似集的矩阵化表示方法,以利用图形处理器(GPU)强大的计算性能加速计算... 拟单层覆盖粗糙集是一种匹配集值信息系统且有高质量和高效率的粗糙集模型。拟单层覆盖近似集的计算过程中存在大量计算密集且逻辑简单的运算,为此,提出拟单层覆盖近似集的矩阵化表示方法,以利用图形处理器(GPU)强大的计算性能加速计算过程。为了实现这一目标,使用布尔矩阵表示拟单层覆盖近似空间中的元素,引入与集合运算对应的布尔矩阵算子,提出拟单层覆盖粗糙近似集(DE、DA、DE0与DA0)的矩阵表示,并设计矩阵化拟单层覆盖近似集算法(M_SMC)。同时,相应的定理证明了拟单层覆盖近似集的矩阵表示形式与原始定义的等价性。然而,M_SMC运行过程中出现了矩阵存储和计算步骤的内存消耗过多问题。为了将算法部署到显存有限的GPU上,优化矩阵存储和计算步骤,提出分批处理的矩阵化拟单层覆盖近似集算法(BM_SMC)。在10个数据集上的实验结果表明,融合GPU的BM_SMC算法与单纯使用中央处理器(CPU)的BM_SMC算法相比计算效率提高2.16~11.3倍,BM_SMC算法可以在有限的存储空间条件下充分利用GPU,能够有效地提高拟单层覆盖近似集的计算效率。 展开更多
关键词 拟单层覆盖近似集 集值信息系统 矩阵化 gpu加速 分批处理
下载PDF
TEB:GPU上矩阵分解重构的高效SpMV存储格式
12
作者 王宇华 张宇琪 +2 位作者 何俊飞 徐悦竹 崔环宇 《计算机科学与探索》 CSCD 北大核心 2024年第4期1094-1108,共15页
稀疏矩阵向量乘法(SpMV)是科学与工程领域中一个至关重要的计算过程,CSR(compressed sparse row)格式是最常用的稀疏矩阵存储格式之一,在图形处理器(GPU)平台上实现并行SpMV的过程中,其只存储稀疏矩阵的非零元,避免零元素填充所带来的... 稀疏矩阵向量乘法(SpMV)是科学与工程领域中一个至关重要的计算过程,CSR(compressed sparse row)格式是最常用的稀疏矩阵存储格式之一,在图形处理器(GPU)平台上实现并行SpMV的过程中,其只存储稀疏矩阵的非零元,避免零元素填充所带来的计算冗余,节约存储空间,但存在着负载不均衡的问题,浪费了计算资源。针对上述问题,对近年来效果良好的存储格式进行了研究,提出了一种逐行分解重组存储格式——TEB(threshold-exchangeorder block)格式。该格式采用启发式阈值选择算法确定合适分割阈值,并结合基于重排序的行归并算法,对稀疏矩阵进行重构分解,使得块与块之间非零元个数尽可能得相近,其次结合CUDA(computer unified device architecture)线程技术,提出了基于TEB存储格式的子块间并行SpMV算法,能够合理分配计算资源,解决负载不均衡问题,从而提高SpMV并行计算效率。为了验证TEB存储格式的有效性,在NVIDIA Tesla V100平台上进行实验,结果表明TEB相较于PBC(partition-block-CSR)、AMF-CSR(adaptive multi-row folding of CSR)、CSR-Scalar(compressed sparse row-scalar)和CSR5(compressed sparse row 5)存储格式,在SpMV的时间性能方面平均可提升3.23、5.83、2.33和2.21倍;在浮点计算性能方面,平均可提高3.36、5.95、2.29和2.13倍。 展开更多
关键词 稀疏矩阵向量乘法(SpMV) 重新排序 CSR格式 负载均衡 存储格式 图形处理器(gpu)
下载PDF
GNNSched:面向GPU的图神经网络推理任务调度框架 被引量:1
13
作者 孙庆骁 刘轶 +4 位作者 杨海龙 王一晴 贾婕 栾钟治 钱德沛 《计算机工程与科学》 CSCD 北大核心 2024年第1期1-11,共11页
由于频繁的显存访问,图神经网络GNN在GPU上运行时往往资源利用率较低。现有的推理框架由于没有考虑GNN输入的不规则性,直接适用到GNN进行推理任务共置时可能会超出显存容量导致任务失败。对于GNN推理任务,需要根据其输入特点预先分析并... 由于频繁的显存访问,图神经网络GNN在GPU上运行时往往资源利用率较低。现有的推理框架由于没有考虑GNN输入的不规则性,直接适用到GNN进行推理任务共置时可能会超出显存容量导致任务失败。对于GNN推理任务,需要根据其输入特点预先分析并发任务的显存占用情况,以确保并发任务在GPU上的成功共置。此外,多租户场景提交的推理任务亟需灵活的调度策略,以满足并发推理任务的服务质量要求。为了解决上述问题,提出了GNNSched,其在GPU上高效管理GNN推理任务的共置运行。具体来说,GNNSched将并发推理任务组织为队列,并在算子粒度上根据成本函数估算每个任务的显存占用情况。GNNSched实现了多种调度策略来生成任务组,这些任务组被迭代地提交到GPU并发执行。实验结果表明,GNNSched能够满足并发GNN推理任务的服务质量并降低推理任务的响应时延。 展开更多
关键词 图神经网络 图形处理器 推理框架 任务调度 估计模型
下载PDF
隐私计算环境下深度学习的GPU加速技术综述
14
作者 秦智翔 杨洪伟 +2 位作者 郝萌 何慧 张伟哲 《信息安全研究》 CSCD 北大核心 2024年第7期586-593,共8页
随着深度学习技术的不断发展,神经网络模型的训练时间越来越长,使用GPU计算对神经网络训练进行加速便成为一项关键技术.此外,数据隐私的重要性也推动了隐私计算技术的发展.首先介绍了深度学习、GPU计算的概念以及安全多方计算、同态加密... 随着深度学习技术的不断发展,神经网络模型的训练时间越来越长,使用GPU计算对神经网络训练进行加速便成为一项关键技术.此外,数据隐私的重要性也推动了隐私计算技术的发展.首先介绍了深度学习、GPU计算的概念以及安全多方计算、同态加密2种隐私计算技术,而后探讨了明文环境与隐私计算环境下深度学习的GPU加速技术.在明文环境下,介绍了数据并行和模型并行2种基本的深度学习并行训练模式,分析了重计算和显存交换2种不同的内存优化技术,并介绍了分布式神经网络训练过程中的梯度压缩技术.介绍了在隐私计算环境下安全多方计算和同态加密2种不同隐私计算场景下的深度学习GPU加速技术.简要分析了2种环境下GPU加速深度学习方法的异同. 展开更多
关键词 深度学习 gpu计算 隐私计算 安全多方计算 同态加密
下载PDF
面向GPU并行编程的线程同步综述
15
作者 高岚 赵雨晨 +2 位作者 张伟功 王晶 钱德沛 《软件学报》 EI CSCD 北大核心 2024年第2期1028-1047,共20页
并行计算已成为主流趋势.在并行计算系统中,同步是关键设计之一,对硬件性能的充分利用至关重要.近年来,GPU(graphic processing unit,图形处理器)作为应用最为广加速器得到了快速发展,众多应用也对GPU线程同步提出更高要求.然而,现有GP... 并行计算已成为主流趋势.在并行计算系统中,同步是关键设计之一,对硬件性能的充分利用至关重要.近年来,GPU(graphic processing unit,图形处理器)作为应用最为广加速器得到了快速发展,众多应用也对GPU线程同步提出更高要求.然而,现有GPU系统却难以高效地支持真实应用中复杂的线程同步.研究者虽然提出了很多支持GPU线程同步的方法并取得了较大进展,但GPU独特的体系结构及并行模式导致GPU线程同步的研究仍然面临很多挑战.根据不同的线程同步目的和粒度对GPU并行编程中的线程同步进行分类.在此基础上,围绕GPU线程同步的表达和执行,首先分析总结GPU线程同步存在的难以高效表达、错误频发、执行效率低的关键问题及挑战;而后依据不同的GPU线程同步粒度,从线程同步表达方法和性能优化方法两个方面入手,介绍近年来学术界和产业界对GPU线程竞争同步及合作同步的研究,对现有研究方法进行分析与总结.最后,指出GPU线程同步未来的研究趋势和发展前景,并给出可能的研究思路,从而为该领域的研究人员提供参考. 展开更多
关键词 通用图形处理器(GPgpu) 并行编程 线程同步 性能优化
下载PDF
Volumetric lattice Boltzmann method for pore-scale mass diffusionadvection process in geopolymer porous structures 被引量:1
16
作者 Xiaoyu Zhang Zirui Mao +6 位作者 Floyd W.Hilty Yulan Li Agnes Grandjean Robert Montgomery Hans-Conrad zur Loye Huidan Yu Shenyang Hu 《Journal of Rock Mechanics and Geotechnical Engineering》 SCIE CSCD 2024年第6期2126-2136,共11页
Porous materials present significant advantages for absorbing radioactive isotopes in nuclear waste streams.To improve absorption efficiency in nuclear waste treatment,a thorough understanding of the diffusion-advecti... Porous materials present significant advantages for absorbing radioactive isotopes in nuclear waste streams.To improve absorption efficiency in nuclear waste treatment,a thorough understanding of the diffusion-advection process within porous structures is essential for material design.In this study,we present advancements in the volumetric lattice Boltzmann method(VLBM)for modeling and simulating pore-scale diffusion-advection of radioactive isotopes within geopolymer porous structures.These structures are created using the phase field method(PFM)to precisely control pore architectures.In our VLBM approach,we introduce a concentration field of an isotope seamlessly coupled with the velocity field and solve it by the time evolution of its particle population function.To address the computational intensity inherent in the coupled lattice Boltzmann equations for velocity and concentration fields,we implement graphics processing unit(GPU)parallelization.Validation of the developed model involves examining the flow and diffusion fields in porous structures.Remarkably,good agreement is observed for both the velocity field from VLBM and multiphysics object-oriented simulation environment(MOOSE),and the concentration field from VLBM and the finite difference method(FDM).Furthermore,we investigate the effects of background flow,species diffusivity,and porosity on the diffusion-advection behavior by varying the background flow velocity,diffusion coefficient,and pore volume fraction,respectively.Notably,all three parameters exert an influence on the diffusion-advection process.Increased background flow and diffusivity markedly accelerate the process due to increased advection intensity and enhanced diffusion capability,respectively.Conversely,increasing the porosity has a less significant effect,causing a slight slowdown of the diffusion-advection process due to the expanded pore volume.This comprehensive parametric study provides valuable insights into the kinetics of isotope uptake in porous structures,facilitating the development of porous materials for nuclear waste treatment applications. 展开更多
关键词 Volumetric lattice Boltzmann method(VLBM) Phase field method(PFM) Pore-scale diffusion-advection Nuclear waste treatment Porous media flow graphics processing unit(gpu) parallelization
下载PDF
基于GPU的LBM迁移模块算法优化
17
作者 黄斌 柳安军 +3 位作者 潘景山 田敏 张煜 朱光慧 《计算机工程》 CAS CSCD 北大核心 2024年第2期232-238,共7页
格子玻尔兹曼方法(LBM)是一种基于介观模拟尺度的计算流体力学方法,其在计算时设置大量的离散格点,具有适合并行的特性。图形处理器(GPU)中有大量的算术逻辑单元,适合大规模的并行计算。基于GPU设计LBM的并行算法,能够提高计算效率。但... 格子玻尔兹曼方法(LBM)是一种基于介观模拟尺度的计算流体力学方法,其在计算时设置大量的离散格点,具有适合并行的特性。图形处理器(GPU)中有大量的算术逻辑单元,适合大规模的并行计算。基于GPU设计LBM的并行算法,能够提高计算效率。但是LBM算法迁移模块中每个格点的计算都需要与其他格点进行通信,存在较强的数据依赖。提出一种基于GPU的LBM迁移模块算法优化策略。首先分析迁移部分的实现逻辑,通过模型降维,将三维模型按照速度分量离散为多个二维模型,降低模型的复杂度;然后分析迁移模块计算前后格点中的数据差异,通过数据定位找到迁移模块的通信规律,并对格点之间的数据交换方式进行分类;最后使用分类的交换方式对离散的二维模型进行区域划分,设计新的数据通信方式,由此消除数据依赖的影响,将迁移模块完全并行化。对并行算法进行测试,结果显示:该算法在1.3×10^(8)规模网格下能达到1.92的加速比,表明算法具有良好的并行效果;同时对比未将迁移模块并行化的算法,所提优化策略能提升算法30%的并行计算效率。 展开更多
关键词 高性能计算 格子玻尔兹曼方法 图形处理器 并行优化 数据重排
下载PDF
Fast modeling of gravity gradients from topographic surface data using GPU parallel algorithm
18
作者 Xuli Tan Qingbin Wang +2 位作者 Jinkai Feng Yan Huang Ziyan Huang 《Geodesy and Geodynamics》 CSCD 2021年第4期288-297,共10页
The gravity gradient is a secondary derivative of gravity potential,containing more high-frequency information of Earth’s gravity field.Gravity gradient observation data require deducting its prior and intrinsic part... The gravity gradient is a secondary derivative of gravity potential,containing more high-frequency information of Earth’s gravity field.Gravity gradient observation data require deducting its prior and intrinsic parts to obtain more variational information.A model generated from a topographic surface database is more appropriate to represent gradiometric effects derived from near-surface mass,as other kinds of data can hardly reach the spatial resolution requirement.The rectangle prism method,namely an analytic integration of Newtonian potential integrals,is a reliable and commonly used approach to modeling gravity gradient,whereas its computing efficiency is extremely low.A modified rectangle prism method and a graphical processing unit(GPU)parallel algorithm were proposed to speed up the modeling process.The modified method avoided massive redundant computations by deforming formulas according to the symmetries of prisms’integral regions,and the proposed algorithm parallelized this method’s computing process.The parallel algorithm was compared with a conventional serial algorithm using 100 elevation data in two topographic areas(rough and moderate terrain).Modeling differences between the two algorithms were less than 0.1 E,which is attributed to precision differences between single-precision and double-precision float numbers.The parallel algorithm showed computational efficiency approximately 200 times higher than the serial algorithm in experiments,demonstrating its effective speeding up in the modeling process.Further analysis indicates that both the modified method and computational parallelism through GPU contributed to the proposed algorithm’s performances in experiments. 展开更多
关键词 Gravity gradient Topographic surface data Rectangle prism method parallel computation graphical processing unit(gpu)
下载PDF
基于GPU的实景三维模型裁剪算法研究
19
作者 马东岭 李铭通 朱悦凯 《山东建筑大学学报》 2024年第1期108-116,共9页
图形处理器(Graphic Processing Unit,GPU)作为主流高性能计算的加速设备,已越来越多地应用于诸多领域的并行计算中,利用GPU的并行计算能力,可以极大地提高传统算法的计算效率。文章主要研究GPU多线程计算方法与统一计算架构(Compute Un... 图形处理器(Graphic Processing Unit,GPU)作为主流高性能计算的加速设备,已越来越多地应用于诸多领域的并行计算中,利用GPU的并行计算能力,可以极大地提高传统算法的计算效率。文章主要研究GPU多线程计算方法与统一计算架构(Compute Unified Device Architecture,CUDA)技术在实景三维模型裁剪中的应用,提出了一种基于GPU的实景三维模型裁剪算法,包括设计了基于面拓扑的多级索引结构,以实现线程内重复交点快速查找;提出了一种轻量多边形三角化方法,优化算法流程;使用多种优化策略,在不影响裁剪网格质量的情况下进一步提高算法的性能。结果表明:根据模型大小与裁剪次数的不同,相较于传统算法,所提方法在单次裁剪的情况下加速比可达13.93,在多次裁剪的情况下加速比可达35.85,显著地提高了模型的裁剪效率。 展开更多
关键词 图形处理器 实景三维模型 三角网裁剪 并行计算
下载PDF
Complex hexagonal close-packed dendritic growth during alloy solidification by graphics processing unit-accelerated three-dimensional phase-field simulations:demo for Mg–Gd alloy
20
作者 Sheng-Lan Yang Jing Zhong +5 位作者 Kai Wang Xun Kang Jian-Bao Gao Jiong Wang Qian Li Li-Jun Zhang 《Rare Metals》 SCIE EI CAS CSCD 2023年第10期3468-3484,共17页
In this study,insights into the effect of interfacial anisotropy on a complex hexagonal close-packed(hcp) dendritic growth during alloy solidification were gained by graphics processing unit(GPU)-accelerated three-dim... In this study,insights into the effect of interfacial anisotropy on a complex hexagonal close-packed(hcp) dendritic growth during alloy solidification were gained by graphics processing unit(GPU)-accelerated three-dimensional(3D) phase-field simulations,as demonstrated for a Mg-Gd alloy.An anisotropic phasefield model with finite interface dissipation was developed by incorporating the contribution of the anisotropy of interfacial energy into the total free energy functional.The modified spherical harmonic anisotropy function was then chosen for the hcp crystal.The GPU parallel computing algorithm was implemented in the present phase-field model,and a corresponding code was developed in the compute unified device architecture parallel computing platform.Benchmark tests indicated that the calculation efficiency of a single TESLA V100 GPU could be~80times that of open multi-processing(OpenMP) with eight central processing unit cores.By coupling the phase-field model with reliable thermodynamic and interfacial energy descriptions,the 3D phase-field simulation of α-Mg dendritic growth in the Mg-6Gd(in wt%) alloy during solidification was performed.Various two-dimensional dendrite morphologies were revealed by cutting the simulated 3D dendrite along different crystallographic planes.Typical sixfold equiaxed and butterflied microstructures observed in experiments were well reproduced. 展开更多
关键词 Interfacial anisotropy Dendrite solidification Phase-field model graphics processing unit(gpu) Mg–Gd
原文传递
上一页 1 2 41 下一页 到第
使用帮助 返回顶部