Parallel Image Processing: Taking Grayscale Conversion Using OpenMP as an Example
1
Authors: Bayan AlHumaidan, Shahad Alghofaily, Maitha Al Qhahtani, Sara Oudah, Naya Nagy. Journal of Computer and Communications, 2024, Issue 2, pp. 1-10 (10 pages)
In recent years, the widespread adoption of parallel computing, especially in multi-core processors and high-performance computing environments, ushered in a new era of efficiency and speed. This trend was particularly noteworthy in the field of image processing, which witnessed significant advancements. This parallel computing project explored the field of parallel image processing, with a focus on the grayscale conversion of color images. Our approach involved integrating OpenMP into our framework for parallelization to execute a critical image processing task: grayscale conversion. By using OpenMP, we strategically enhanced the overall performance of the conversion process by distributing the workload across multiple threads. The primary objectives of our project revolved around optimizing computation time and improving overall efficiency, particularly in the task of grayscale conversion of color images. Utilizing OpenMP for concurrent processing across multiple cores significantly reduced execution times through the effective distribution of tasks among these cores. The speedup values for various image sizes highlighted the efficacy of parallel processing, especially for large images. However, a detailed examination revealed a potential decline in parallelization efficiency with an increasing number of cores. This underscored the importance of a carefully optimized parallelization strategy, considering factors like load balancing and minimizing communication overhead. Despite challenges, the overall scalability and efficiency achieved with parallel image processing underscored OpenMP's effectiveness in accelerating image manipulation tasks.
Keywords: parallel computing; image processing; OpenMP; parallel programming; high-performance computing; GPU (graphics processing unit)
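The row-wise work distribution described in this abstract can be illustrated with a minimal sketch. It is not the authors' OpenMP code: a Python thread pool stands in for an OpenMP `parallel for` over image rows, and the luminance weights (ITU-R BT.601) are an assumption, since the abstract does not say which grayscale formula was used. The function names are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor

# Assumed luminance weights (ITU-R BT.601); the abstract does not state the formula.
WEIGHTS = (0.299, 0.587, 0.114)


def gray_row(row):
    """Convert one row of (r, g, b) pixels to rounded grayscale intensities."""
    wr, wg, wb = WEIGHTS
    return [round(wr * r + wg * g + wb * b) for r, g, b in row]


def to_gray_serial(image):
    """Reference version: process the rows one after another."""
    return [gray_row(row) for row in image]


def to_gray_parallel(image, workers=4):
    """Hand each row to a worker pool; rows are independent, so order of execution is irrelevant.

    pool.map returns results in input order, so the image layout is preserved.
    """
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(gray_row, image))
```

Each row is an independent task, which is why the loop parallelizes without synchronization. In CPython the global interpreter lock prevents threads from speeding up this kind of arithmetic, so the sketch shows only the structure of the decomposition, not the speedups reported in the paper.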
Heterogeneous parallel computing accelerated iterative subpixel digital image correlation (Cited by: 9)
2
Authors: HUANG JianWen, ZHANG LingQi, JIANG ZhenYu, DONG ShouBin, CHEN Wei, LIU YiPing, LIU ZeJia, ZHOU LiCheng, TANG LiQun. Science China (Technological Sciences) (SCIE, EI, CAS, CSCD), 2018, Issue 1, pp. 74-85 (12 pages)
Parallel computing techniques have been introduced into digital image correlation (DIC) in recent years and have led to a surge in computation speed. Graphics processing unit (GPU)-based parallel computing has demonstrated a surprising effect on accelerating iterative subpixel DIC, compared with CPU-based parallel computing. In this paper, the performance of the two kinds of parallel computing techniques is compared for the previously proposed path-independent DIC method, in which the initial guess for the inverse compositional Gauss-Newton (IC-GN) algorithm at each point of interest (POI) is estimated through the fast Fourier transform-based cross-correlation (FFT-CC) algorithm. Based on the performance evaluation, a heterogeneous parallel computing (HPC) model is proposed with a hybrid mode of parallelism in order to combine the computing power of the GPU and the multicore CPU. A trial computation test scheme is developed to optimize the configuration of the HPC model on a specific computer. The proposed HPC model shows excellent performance on a mid-range desktop computer for real-time subpixel DIC with a high resolution of more than 10000 POIs per frame.
Keywords: digital image correlation (DIC); inverse compositional Gauss-Newton (IC-GN) algorithm; heterogeneous parallel computing; graphics processing unit (GPU); multicore CPU; real-time DIC
Parallel computing solutions for Markov chain spatial sequential simulation of categorical fields (Cited by: 1)
3
Authors: Weixing Zhang, Weidong Li, Chuanrong Zhang, Tian Zhao. International Journal of Digital Earth (SCIE, EI), 2019, Issue 5, pp. 566-582 (17 pages)
The Markov chain random field (MCRF) model is a spatial statistical approach for modeling categorical spatial variables in multiple dimensions. However, this approach tends to be computationally costly when dealing with large data sets because of its sequential simulation processes. Therefore, improving its computational efficiency is necessary in order to run this model on larger spatial data sets. In this study, we suggested four parallel computing solutions using both the central processing unit (CPU) and the graphics processing unit (GPU) for executing the sequential simulation algorithm of the MCRF model, and compared them with the nonparallel computing solution in terms of computation time for a land cover post-classification. The four parallel computing solutions are: (1) multicore processor parallel computing (MP), (2) parallel computing by GPU-accelerated nearest neighbor searching (GNNS), (3) MP with GPU-accelerated nearest neighbor searching (MPGNNS), and (4) parallel computing by GPU-accelerated approximation and GPU-accelerated nearest neighbor searching (GA-GNNS). Experimental results indicated that all four parallel computing solutions are at least 1.8× faster than the nonparallel solution. In particular, the GA-GNNS solution with 512 threads per block is around 83× faster than the nonparallel solution when conducting a land cover post-classification with a remotely sensed image of 1000×1000 pixels.
Keywords: Markov chain random field; parallel computing; nearest neighbor searching; approximation; graphics processing unit
GPU parallel computing: Programming language, debugging tools and data structures
4
Authors: Kun ZHOU. Frontiers of Electrical and Electronic Engineering in China (CSCD), 2012, Issue 1, pp. 5-15 (11 pages)
With many cores driven by high memory bandwidth, today's graphics processing unit (GPU) has evolved into an absolute computing workhorse. More and more scientists, researchers and software developers are using GPUs to accelerate their algorithms and applications. Developing complex programs and software on the GPU, however, is still far from easy with existing tools provided by hardware vendors. This article introduces our recent research efforts to make GPU software development much easier. Specifically, we designed BSGP, a high-level programming language for general-purpose computation on the GPU. A BSGP program looks much the same as a sequential C program, and is thus easy to read, write and maintain. Its performance on the GPU is guaranteed by a well-designed compiler that converts the program to native GPU code. We also developed an effective debugging system for BSGP programs based on the GPU interrupt, a unique feature of BSGP that allows calling CPU functions from inside GPU code. Moreover, using BSGP, we developed GPU algorithms for constructing several widely used spatial hierarchies for high-performance graphics applications.
Keywords: graphics processing unit (GPU); parallel computing; programming languages; debugging tools; data structures
Electromagnetic computation of radio wave propagation in electrically large mountainous terrain environments (Cited by: 1)
5
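Authors: 王楠, 刘俊志, 陈贵齐, 赵延安, 张玉. 《西安电子科技大学学报》 (Journal of Xidian University) (EI, CAS, CSCD, PKU Core), 2024, Issue 1, pp. 21-28 (8 pages)
In emerging industries such as autonomous driving and unmanned aerial vehicles, signal coverage requirements are demanding: remote operation is only truly possible when wireless signals cover not just cities but also sparsely populated mountains, deserts, and forests, where the main concern is the effect of terrain variation on electromagnetic propagation. The uniform geometrical theory of diffraction in computational electromagnetics is an effective method for analyzing electromagnetic problems in electrically large environments, and this work uses computational electromagnetics to study how electromagnetic waves propagate in mountainous terrain environments. A new method for building irregular terrain models is presented: cubic polynomial surfaces usable by electromagnetic algorithms are generated from gridded digital elevation data, the irregular terrain is assembled from multiple cubic surfaces, and the accuracy of the model data is verified using the average root-mean-square error. Based on the resulting terrain data, a parallel geometrical optics algorithm is implemented and the regional electromagnetic field distribution is simulated. Field measurements were made in a real mountainous terrain environment; the measured and simulated results show consistent trends, which verifies the effectiveness of the method for analyzing electromagnetic wave propagation over irregular terrain. Given the scale of environmental electromagnetic computation, a corresponding parallel strategy is established, and the parallel efficiency in a 100-core test stays above 80%.
Keywords: electrically large mountainous terrain environment; radio wave propagation; digital elevation; fractal modeling; geometrical optics; parallel computing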
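The closing efficiency figure in this abstract can be related to speedup with a short calculation. The sketch assumes the conventional definition, efficiency = speedup / number of cores, which the abstract does not state explicitly; the function names are hypothetical.

```python
def parallel_efficiency(speedup, cores):
    """Conventional definition: fraction of ideal linear speedup actually achieved."""
    return speedup / cores


def speedup_for_efficiency(efficiency, cores):
    """Invert the definition: the speedup implied by a given efficiency."""
    return efficiency * cores
```

Under this definition, an efficiency above 80% on 100 cores corresponds to a speedup above 80 relative to a single core.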
A survey of thread synchronization for GPU parallel programming
6
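Authors: 高岚, 赵雨晨, 张伟功, 王晶, 钱德沛. 《软件学报》 (Journal of Software) (EI, CSCD, PKU Core), 2024, Issue 2, pp. 1028-1047 (20 pages)
Parallel computing has become the mainstream trend. In parallel computing systems, synchronization is one of the key design elements and is crucial to making full use of hardware performance. In recent years the GPU (graphics processing unit), as the most widely used accelerator, has developed rapidly, and many applications place higher demands on GPU thread synchronization. However, existing GPU systems struggle to efficiently support the complex thread synchronization found in real applications. Although researchers have proposed many methods to support GPU thread synchronization and have made considerable progress, the GPU's distinctive architecture and parallel model mean that research on GPU thread synchronization still faces many challenges. This survey classifies thread synchronization in GPU parallel programming by synchronization purpose and granularity. On this basis, focusing on the expression and execution of GPU thread synchronization, it first analyzes and summarizes the key problems and challenges: synchronization is hard to express efficiently, errors are frequent, and execution efficiency is low. Then, according to the different synchronization granularities, it reviews recent academic and industrial research on competitive and cooperative GPU thread synchronization from two angles, synchronization expression methods and performance optimization methods, and analyzes and summarizes the existing approaches. Finally, it points out future research trends and prospects for GPU thread synchronization and suggests possible research directions, providing a reference for researchers in this field.
Keywords: general-purpose graphics processing unit (GPGPU); parallel programming; thread synchronization; performance optimization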
A GPU-based multi-frequency processing method for underwater target transfer functions
7
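Authors: 钱浩然, 王斌. 《舰船科学技术》 (Ship Science and Technology) (PKU Core), 2024, Issue 14, pp. 153-157 (5 pages)
To speed up the computation of wideband echoes from underwater targets, this paper proposes a fast multi-frequency computation scheme for the scattering transfer function based on the graphics processing unit (GPU). Whereas traditional algorithms compute frequency points one by one, the fast CUDA algorithm exploits the relative independence of the target strength at each frequency point and the hardware characteristics of the GPU to compute the scattered sound field across the band simultaneously, significantly improving computational efficiency. Using an underwater vehicle model as a test case, the computation speed of the target scattering transfer function is compared for models with different numbers of mesh elements. Simulation results show that, compared with traditional serial CPU computation, the fast CUDA algorithm achieves a speedup of more than 80, effectively improving computation speed.
Keywords: planar element method; graphics processing unit; Compute Unified Device Architecture (CUDA); parallel computing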
An OpenCL-accelerated image median filtering algorithm on heterogeneous platforms (Cited by: 1)
8
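Authors: 肖诗洋, 王镭, 杜莹, 肖汉. 《河北大学学报(自然科学版)》 (Journal of Hebei University, Natural Science Edition) (CAS, PKU Core), 2024, Issue 1, pp. 92-103 (12 pages)
Image noise lowers the signal-to-noise ratio and quality of an image, and denoising is one of the important steps in image processing. This paper proposes a fast parallel denoising algorithm for image median filtering based on the Open Computing Language (OpenCL) architecture. The characteristics of the OpenCL architecture and the median filtering workflow are introduced. Based on the concurrent structure of the graphics processing unit (GPU), the image median filtering module is optimized for parallel execution, reducing algorithmic complexity. Data access efficiency is improved by fully activating the work-groups and work-items in the NDRange index space, and kernel work-group configuration parameters are tuned to achieve parallel processing of the median filter. Experimental results show that, with image quality unchanged, the parallel median filtering algorithm under the OpenCL architecture on an NVIDIA GPU platform achieves speedups of 29.74×, 17.29×, and 1.15× relative to a CPU-based serial algorithm, an Open Multi-Processing (OpenMP) parallel algorithm, and a Compute Unified Device Architecture (CUDA) parallel algorithm, respectively. This verifies the effectiveness of the algorithm and its portability across platforms, and the algorithm largely meets real-time processing requirements.
Keywords: median filtering; salt-and-pepper noise; graphics processing unit; Open Computing Language; parallel algorithm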
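The reason median filtering maps well onto OpenCL work-items is that every output pixel depends only on a small input window. The sketch below is a plain serial reference, not the paper's kernel; the 3×3 window (`radius=1`) and the clamped border handling are assumptions chosen for illustration.

```python
from statistics import median


def median_filter(image, radius=1):
    """Median-filter a 2D list of intensities with a (2*radius+1)^2 window.

    Border pixels reuse the nearest valid coordinate (clamping), an assumed choice.
    """
    height, width = len(image), len(image[0])

    def filtered_pixel(y, x):
        # Gather the window around (y, x); this reads only the input image,
        # so every output pixel could be computed by an independent work-item.
        window = [
            image[min(max(y + dy, 0), height - 1)][min(max(x + dx, 0), width - 1)]
            for dy in range(-radius, radius + 1)
            for dx in range(-radius, radius + 1)
        ]
        return median(window)

    return [[filtered_pixel(y, x) for x in range(width)] for y in range(height)]
```

For example, a single bright "salt" pixel surrounded by uniform values is replaced by the surrounding value, because it never reaches the middle of the sorted window.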
Fine-grained parallel and multi-rate electromagnetic transient simulation of renewable energy power systems
9
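Authors: 王啟国, 徐晋, 汪可友, 周建其, 樊涛. 《电力系统自动化》 (Automation of Electric Power Systems) (EI, CSCD, PKU Core), 2024, Issue 3, pp. 113-121 (9 pages)
With the rapid development of renewable energy, power systems contain ever more types of equipment and exhibit increasingly complex oscillation characteristics, placing higher demands on the accuracy and efficiency of electromagnetic transient simulation. Based on the latency insertion method (LIM) used in large-scale integrated circuit design, a fine-grained modeling method for renewable energy power systems is proposed and, exploiting the resources of the graphics processing unit (GPU), the algorithm is solved in parallel. The proposed method decouples the conventional AC grid from power electronic devices and determines the step size of each subsystem using a hybrid numerical stability criterion together with the local truncation error. Multi-rate simulation of the renewable energy power system is then achieved through interpolation. Finally, on a GPU hardware platform, the accuracy of the method is verified on a modified 39-bus system with renewable energy integration, and its advantage in simulation efficiency is verified with different scales of renewable integration and different combinations of simulation step sizes.
Keywords: renewable energy; power system; electromagnetic transient simulation; parallel computing; fine-grained simulation; multi-rate simulation; latency insertion method; graphics processing unit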
A GPU-Based Parallel Algorithm for 2D Large Deformation Contact Problems Using the Finite Particle Method (Cited by: 1)
10
Authors: Wei Wang, Yanfeng Zheng, Jingzhe Tang, Chao Yang, Yaozhi Luo. Computer Modeling in Engineering & Sciences (SCIE, EI), 2021, Issue 11, pp. 595-626 (32 pages)
Large deformation contact problems generally involve highly nonlinear behaviors, which are very time-consuming and may lead to convergence issues. The finite particle method (FPM) effectively separates pure deformation from total motion in large deformation problems. In addition, the decoupled procedures of the FPM make it suitable for parallel computing, which may provide an approach to solve time-consuming issues. In this study, a graphics processing unit (GPU)-based parallel algorithm is proposed for two-dimensional large deformation contact problems. The fundamentals of the FPM for planar solids are first briefly introduced, including the equations of motion of particles and the internal forces of quadrilateral elements. Subsequently, a linked-list data structure suitable for parallel processing is built, and parallel global and local search algorithms are presented for contact detection. The contact forces are then derived and directly exerted on particles. The proposed method is implemented with the main solution procedures executed in parallel on a GPU. Two verification problems comprising large deformation frictional contacts are presented, and the accuracy of the proposed algorithm is validated. Furthermore, the algorithm's performance is investigated via a large-scale contact problem, and the maximum speedups of total computational time and contact calculation reach 28.5 and 77.4, respectively, relative to the commercial finite element software Abaqus/Explicit running on a single-core central processing unit (CPU). The contact calculation accounts for only 18% of the total calculation time with the FPM, much smaller than the 50% with Abaqus/Explicit, demonstrating the efficiency of the proposed method.
Keywords: finite particle method; graphics processing unit (GPU); parallel computing; contact algorithm; LARGE
GPU-based optimization of the LBM streaming (migration) module algorithm
11
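Authors: 黄斌, 柳安军, 潘景山, 田敏, 张煜, 朱光慧. 《计算机工程》 (Computer Engineering) (CAS, CSCD, PKU Core), 2024, Issue 2, pp. 232-238 (7 pages)
The lattice Boltzmann method (LBM) is a computational fluid dynamics method at the mesoscopic simulation scale. It places a large number of discrete lattice points during computation, which makes it well suited to parallelization. The graphics processing unit (GPU) contains a large number of arithmetic logic units and is suited to large-scale parallel computing, so designing a parallel LBM algorithm for the GPU can improve computational efficiency. However, in the streaming module of the LBM algorithm, the computation at each lattice point requires communication with other lattice points, which creates strong data dependencies. A GPU-based optimization strategy for the LBM streaming module is proposed. First, the implementation logic of the streaming step is analyzed and, through model dimensionality reduction, the three-dimensional model is decomposed by velocity component into multiple two-dimensional models, reducing model complexity. Second, the differences in lattice-point data before and after the streaming computation are analyzed, the communication pattern of the streaming module is identified through data localization, and the data exchange modes between lattice points are classified. Finally, the classified exchange modes are used to partition the discrete two-dimensional models into regions, and a new data communication scheme is designed that eliminates the effect of data dependencies and fully parallelizes the streaming module. Tests of the parallel algorithm show a speedup of 1.92 on a grid of 1.3×10^8 points, indicating good parallel performance; compared with an algorithm whose streaming module is not parallelized, the proposed optimization strategy improves parallel computing efficiency by 30%.
Keywords: high-performance computing; lattice Boltzmann method; graphics processing unit; parallel optimization; data rearrangement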
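The data dependency discussed in this abstract arises because streaming moves each distribution value to a neighboring lattice point. The sketch below shows a generic "pull" formulation for one velocity direction on a 2D grid, which is a standard way to remove write conflicts; it is not the region-partitioning scheme the paper proposes, and the periodic boundaries and the function name are assumptions.

```python
def stream(f, velocity):
    """Stream one distribution component f[x][y] along `velocity` = (cx, cy).

    Pull formulation: each output cell reads exactly one input cell, namely the
    upstream neighbor, so all cells can be updated independently.
    Periodic boundaries are assumed for simplicity.
    """
    cx, cy = velocity
    nx, ny = len(f), len(f[0])
    return [
        [f[(x - cx) % nx][(y - cy) % ny] for y in range(ny)]
        for x in range(nx)
    ]
```

Because the result is written to a new array, no cell overwrites a value that another cell still needs; the cost is a second copy of the distribution in memory.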
Compute Unified Device Architecture Implementation of Euler/Navier-Stokes Solver on Graphics Processing Unit Desktop Platform for 2-D Compressible Flows
12
Authors: Zhang Jiale, Chen Hongquan. Transactions of Nanjing University of Aeronautics and Astronautics (EI, CSCD), 2016, Issue 5, pp. 536-545 (10 pages)
A personal desktop platform with teraflops peak performance and thousands of cores can be realized at the price of a conventional workstation using programmable graphics processing units (GPUs). A GPU-based parallel Euler/Navier-Stokes solver is developed for 2-D compressible flows using NVIDIA's Compute Unified Device Architecture (CUDA) programming model in the CUDA Fortran programming language. The techniques for implementing CUDA kernels, the double-layered thread hierarchy, and the varied memory hierarchy are presented to form the GPU-based algorithm for the Euler/Navier-Stokes equations. The resulting parallel solver is validated by a set of typical test flow cases. The numerical results show that a speedup of dozens of times relative to a serial CPU implementation can be achieved using a single-GPU desktop platform, which demonstrates that a GPU desktop can serve as a cost-effective parallel computing platform to accelerate computational fluid dynamics (CFD) simulations substantially.
Keywords: graphics processing unit (GPU); GPU parallel computing; Compute Unified Device Architecture (CUDA) Fortran; finite volume method (FVM); acceleration
Research on a GPU-based clipping algorithm for real-scene 3D models
13
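Authors: 马东岭, 李铭通, 朱悦凯. 《山东建筑大学学报》 (Journal of Shandong Jianzhu University), 2024, Issue 1, pp. 108-116 (9 pages)
As the mainstream accelerator for high-performance computing, the graphics processing unit (GPU) is increasingly used for parallel computing in many fields; exploiting its parallel computing capability can greatly improve the efficiency of traditional algorithms. This article studies the application of GPU multithreaded computing and Compute Unified Device Architecture (CUDA) technology to the clipping of real-scene 3D models and proposes a GPU-based clipping algorithm for such models. The work includes designing a multi-level index structure based on face topology to quickly look up duplicate intersection points within a thread; proposing a lightweight polygon triangulation method to streamline the algorithm; and applying several optimization strategies to further improve performance without degrading the quality of the clipped mesh. Results show that, depending on model size and the number of clipping operations, the proposed method achieves a speedup of up to 13.93 over the traditional algorithm for a single clip and up to 35.85 for multiple clips, significantly improving clipping efficiency.
Keywords: graphics processing unit; real-scene 3D model; triangular mesh clipping; parallel computing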
A fractal image generation system
14
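Authors: 许道云, 周沛, 聂堂钊. 《贵州大学学报(自然科学版)》 (Journal of Guizhou University, Natural Science Edition), 1994, Issue 3, pp. 159-166 (8 pages)
This paper discusses in depth the computer processing of fractals generated by the self-squaring complex function z→z^2+c, resulting in a complete fractal image generation system. The system is written in Microsoft C 6.0 and can plot and analyze complex maps of the form z_(n+1) = f(z_n).
Keywords: fractal; computer graphics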
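The iteration z→z^2+c named in this abstract is usually rendered with an escape-time count. The following is a minimal sketch of that common technique, not a reconstruction of the authors' Microsoft C 6.0 system; the iteration cap, the escape radius of 2, and the function names are assumptions.

```python
def escape_time(z, c, max_iter=100, bound=2.0):
    """Count iterations of z -> z*z + c until |z| exceeds `bound`.

    Returns max_iter if the orbit stays bounded for the whole budget.
    """
    for n in range(max_iter):
        if abs(z) > bound:
            return n
        z = z * z + c
    return max_iter


def julia_grid(c, width=40, height=20, extent=1.5, max_iter=50):
    """Sample escape times on a width x height grid over [-extent, extent]^2.

    Each grid point is a starting value z_0 for the fixed parameter c.
    """
    rows = []
    for j in range(height):
        y = extent - 2 * extent * j / (height - 1)
        row = []
        for i in range(width):
            x = -extent + 2 * extent * i / (width - 1)
            row.append(escape_time(complex(x, y), c, max_iter))
        rows.append(row)
    return rows
```

Mapping each escape count to a color or character gives the familiar picture: points that never escape belong to the filled set, and the count elsewhere shades the exterior.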
Optimizing photoacoustic image reconstruction using cross-platform parallel computation
15
Authors: Tri Vu, Yuehang Wang, Jun Xia. Visual Computing for Industry, Biomedicine, and Art, 2018, Issue 1, pp. 12-17 (6 pages)
Three-dimensional (3D) image reconstruction involves computations on an extensive amount of data, which leads to tremendous processing time. Therefore, optimization is crucially needed to improve performance and efficiency. With the widespread use of graphics processing units (GPUs), parallel computing is transforming this arduous reconstruction process for numerous imaging modalities, and photoacoustic computed tomography (PACT) is no exception. Existing works have investigated GPU-based optimization of photoacoustic microscopy (PAM) and PACT reconstruction using Compute Unified Device Architecture (CUDA) on either C++ or MATLAB only. However, our study is the first that uses cross-platform GPU computation. It maintains the simplicity of MATLAB while improving the speed through CUDA/C++-based MATLAB converted functions called MEXCUDA. Compared with a purely MATLAB with GPU approach, our cross-platform method is five times faster. Because MATLAB is widely used in PAM and PACT, this study will open up new avenues for photoacoustic image reconstruction and relevant real-time imaging applications.
Keywords: photoacoustic computed tomography; graphics processing units; parallel computation; focal-line backprojection algorithm; MATLAB; optical imaging
Fast modeling of gravity gradients from topographic surface data using GPU parallel algorithm
16
Authors: Xuli Tan, Qingbin Wang, Jinkai Feng, Yan Huang, Ziyan Huang. Geodesy and Geodynamics (CSCD), 2021, Issue 4, pp. 288-297 (10 pages)
The gravity gradient is a second derivative of the gravity potential and contains more high-frequency information about Earth's gravity field. Gravity gradient observation data require deducting their prior and intrinsic parts to obtain more variational information. A model generated from a topographic surface database is more appropriate to represent gradiometric effects derived from near-surface mass, as other kinds of data can hardly reach the required spatial resolution. The rectangle prism method, namely an analytic integration of Newtonian potential integrals, is a reliable and commonly used approach to modeling the gravity gradient, but its computing efficiency is extremely low. A modified rectangle prism method and a graphical processing unit (GPU) parallel algorithm were proposed to speed up the modeling process. The modified method avoided massive redundant computations by deforming formulas according to the symmetries of the prisms' integral regions, and the proposed algorithm parallelized this method's computing process. The parallel algorithm was compared with a conventional serial algorithm using 100 elevation data in two topographic areas (rough and moderate terrain). Modeling differences between the two algorithms were less than 0.1 E, which is attributed to precision differences between single-precision and double-precision floating-point numbers. The parallel algorithm showed computational efficiency approximately 200 times higher than the serial algorithm in experiments, demonstrating its effectiveness in speeding up the modeling process. Further analysis indicates that both the modified method and computational parallelism through the GPU contributed to the proposed algorithm's performance in experiments.
Keywords: gravity gradient; topographic surface data; rectangle prism method; parallel computation; graphical processing unit (GPU)
A GPU-accelerated parallel algorithm for all-pairs shortest paths
17
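Authors: 肖汉, 肖诗洋, 李焕勤, 周清雷. 《云南大学学报(自然科学版)》 (Journal of Yunnan University, Natural Sciences Edition) (CAS, CSCD, PKU Core), 2023, Issue 5, pp. 1022-1032 (11 pages)
To address the inefficiency of shortest-path algorithms on large-scale data sets, a parallel all-pairs shortest path algorithm accelerated by the graphics processing unit (GPU) is proposed. First, the matrix multiplication algorithm is optimized so that data are processed in parallel both within and across work-groups; next, work-item branching caused by irregular rows is reduced; finally, the latency of work-item accesses to the storage holding the adjacency-matrix computation strips is lowered. Experimental results show that, compared with a serial algorithm on an AMD Ryzen5 1600X CPU, an Open Multi-Processing (OpenMP) parallel algorithm, and a Compute Unified Device Architecture (CUDA) parallel algorithm, the parallel shortest-path algorithm under the Open Computing Language (OpenCL) architecture on an NVIDIA GeForce GTX 1070 platform achieves speedups of 196.35×, 36.76×, and 2.25×, respectively, verifying the effectiveness and performance portability of the proposed parallel optimizations.
Keywords: shortest path; repeated squaring method; graphics processing unit; Open Computing Language; parallel algorithm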
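The keyword "repeated squaring method" together with the matrix-multiplication optimization in this abstract points to the classic min-plus formulation of all-pairs shortest paths. The sketch below is a serial reference of that textbook technique, not the paper's OpenCL kernel; the function names and the convention of `inf` for missing edges with zeros on the diagonal are assumptions.

```python
INF = float("inf")


def min_plus(a, b):
    """Min-plus matrix product: result[i][j] = min over k of a[i][k] + b[k][j].

    Every (i, j) entry is independent of the others, which is what a GPU exploits.
    """
    n = len(a)
    return [
        [min(a[i][k] + b[k][j] for k in range(n)) for j in range(n)]
        for i in range(n)
    ]


def all_pairs_shortest_paths(weights):
    """Repeatedly square the weight matrix under min-plus until paths of n-1 edges are covered.

    `weights` must have 0 on the diagonal and INF where no edge exists.
    """
    n = len(weights)
    dist = [row[:] for row in weights]
    edges_covered = 1  # dist currently holds shortest paths using at most this many edges
    while edges_covered < n - 1:
        dist = min_plus(dist, dist)
        edges_covered *= 2
    return dist
```

The loop runs about log2(n) times, each costing one O(n^3) product, so the method does more arithmetic than Floyd-Warshall but exposes far more independent work per step.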
A parallel algorithm for ambient noise preprocessing on heterogeneous computing platforms
18
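Authors: 吴超, 卫谦, 周俊伟, 李会民, 孙广中. 《计算机工程与科学》 (Computer Engineering & Science) (CSCD, PKU Core), 2023, Issue 10, pp. 1711-1719 (9 pages)
Ambient noise seismology uses the ambient noise signals recorded by seismic stations to compute cross-correlations between stations and from these infers geological structure; in recent years it has been widely applied in areas such as Earth structure studies and oil and gas exploration. Processing seismic noise data usually requires preprocessing to reduce interference from instruments and earthquake signals, and this involves several kinds of signal processing computation. As seismic station deployment in China expands and waveform files keep accumulating, the time spent on preprocessing has grown substantially. To address this, a parallel seismic noise preprocessing algorithm is proposed for a heterogeneous computing platform based on the graphics processing unit. The algorithm defines a parallel computing framework along three dimensions (station, time, and segment), implements compute kernels for the preprocessing steps, and handles large batches of files adaptively through batched computation. Experimental results show that the parallel preprocessing algorithm achieves a speedup of about 95× and parallelizes well.
Keywords: ambient noise seismology; data preprocessing; parallel computing; heterogeneous computing; graphics processing unit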
GPU-based optimization techniques for subgraph matching
19
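Authors: 李安腾, 崔鹏杰, 袁野, 王国仁. 《浙江大学学报(工学版)》 (Journal of Zhejiang University, Engineering Science) (EI, CAS, CSCD, PKU Core), 2023, Issue 9, pp. 1856-1864 (9 pages)
GpSI, an efficient subgraph matching algorithm based on the graphics processing unit (GPU), is proposed, with optimizations designed separately for the filtering phase and the joining phase of mainstream algorithms. A filtering algorithm based on composite signatures is proposed, which uses the quantitative and structural features of a vertex's local neighborhood during filtering to strengthen candidate-set pruning. A candidate-vertex-based joining strategy is adopted: in the joining phase, space is pre-allocated at the granularity of the minimum neighbor count and efficient set operations are designed, avoiding the extra overhead of repeated joins in traditional methods. Tests on multiple data sets show that GpSI has clear advantages over mainstream GPU subgraph matching algorithms in candidate-set filtering power, execution time, GPU memory usage, and stability. On real-world data sets, GpSI runs 2 to 10 times faster than the GPU-friendly subgraph matching algorithm.
Keywords: subgraph isomorphism; data mining; graphics processing unit (GPU); parallel computing; high-performance computing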
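The idea of signature-based candidate filtering can be sketched as follows. The abstract does not specify how GpSI's composite signature is built, so the signature used here (vertex label, degree, and counts of neighbor labels) is a simple assumed stand-in, and all names are hypothetical.

```python
from collections import Counter


def signature(graph, labels, v):
    """Assumed stand-in signature: (label, degree, neighbor-label counts) of vertex v."""
    return labels[v], len(graph[v]), Counter(labels[u] for u in graph[v])


def can_match(data_sig, query_sig):
    """A data vertex survives filtering only if its signature covers the query vertex's."""
    d_label, d_degree, d_neighbors = data_sig
    q_label, q_degree, q_neighbors = query_sig
    return (
        d_label == q_label
        and d_degree >= q_degree
        and all(d_neighbors[lab] >= cnt for lab, cnt in q_neighbors.items())
    )


def candidate_sets(query, q_labels, data, d_labels):
    """For each query vertex, keep the data vertices that pass the signature test.

    Graphs are adjacency dicts: vertex -> list of neighbors.
    """
    data_sigs = {d: signature(data, d_labels, d) for d in data}
    return {
        q: {d for d, sig in data_sigs.items()
            if can_match(sig, signature(query, q_labels, q))}
        for q in query
    }
```

The test for each (query vertex, data vertex) pair is independent, so the filter is naturally data-parallel; a stronger signature prunes more candidates before the more expensive joining phase.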
A GPU-based time delay compensation method for antenna arraying signals
20
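Authors: 毛飞龙, 焦义文, 马宏, 韩久江, 高泽夫, 李超, 李冬. 《系统工程与电子技术》 (Systems Engineering and Electronics) (EI, CSCD, PKU Core), 2023, Issue 8, pp. 2383-2394 (12 pages)
To meet the need of antenna arraying combining systems for real-time combining of wideband, high-rate, parallel signals, a time delay compensation method for antenna arraying signals based on the graphics processing unit (GPU) is designed. First, the feasibility of implementing typical integer delay compensation methods on a GPU platform is analyzed, and an integer delay compensation method based on overlap-save data blocks is designed. Next, the strengths and weaknesses of typical fractional delay compensation methods are compared, and a frequency-domain fractional delay compensation method suited to GPU parallel acceleration is designed. Finally, the GPU-based method is verified experimentally. Repeated tests show that, while maintaining correct delay compensation, the GPU-based method achieves a speedup of about 18× over the traditional serial CPU method and enables real-time combining of multi-antenna signals.
Keywords: time delay compensation; antenna arraying; graphics processing unit; parallel computing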
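The block-overlap idea behind the integer delay step can be sketched in a few lines. This is an illustration of the general principle only, not the paper's GPU implementation: each output block is computed from an input range that reaches back `delay` samples into the preceding block, so blocks can be processed independently. Zero padding before the start of the signal and the function names are assumptions.

```python
def delayed_block(signal, start, size, delay):
    """Return output samples [start, start + size) of `signal` delayed by `delay` samples.

    The block reads input samples [start - delay, start + size - delay), i.e. it
    overlaps the previous block's input by `delay` samples. Samples before the
    start of the signal are assumed to be zero.
    """
    return [
        signal[n - delay] if n - delay >= 0 else 0
        for n in range(start, min(start + size, len(signal)))
    ]


def delay_in_blocks(signal, block_size, delay):
    """Delay a whole signal block by block; each block is independent of the others."""
    out = []
    for start in range(0, len(signal), block_size):
        out.extend(delayed_block(signal, start, block_size, delay))
    return out
```

Because no block depends on another block's output, the per-block calls could be dispatched in parallel, which is the property that makes the overlap approach attractive on a GPU.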