期刊文献+
共找到380篇文章
< 1 2 19 >
每页显示 20 50 100
Compute Unified Device Architecture Implementation of Euler/Navier-Stokes Solver on Graphics Processing Unit Desktop Platform for 2-D Compressible Flows
1
作者 Zhang Jiale Chen Hongquan 《Transactions of Nanjing University of Aeronautics and Astronautics》 EI CSCD 2016年第5期536-545,共10页
Personal desktop platform with teraflops peak performance of thousands of cores is realized at the price of conventional workstations using the programmable graphics processing units(GPUs).A GPU-based parallel Euler/N... Personal desktop platform with teraflops peak performance of thousands of cores is realized at the price of conventional workstations using the programmable graphics processing units(GPUs).A GPU-based parallel Euler/Navier-Stokes solver is developed for 2-D compressible flows by using NVIDIA′s Compute Unified Device Architecture(CUDA)programming model in CUDA Fortran programming language.The techniques of implementation of CUDA kernels,double-layered thread hierarchy and variety memory hierarchy are presented to form the GPU-based algorithm of Euler/Navier-Stokes equations.The resulting parallel solver is validated by a set of typical test flow cases.The numerical results show that dozens of times speedup relative to a serial CPU implementation can be achieved using a single GPU desktop platform,which demonstrates that a GPU desktop can serve as a costeffective parallel computing platform to accelerate computational fluid dynamics(CFD)simulations substantially. 展开更多
关键词 graphics processing unit(GPU) GPU parallel computing compute unified device architecture(CUDA)Fortran finite volume method(FVM) acceleration
下载PDF
Multi-relaxation-time lattice Boltzmann simulations of lid driven flows using graphics processing unit
2
作者 Chenggong LI J.P.Y.MAA 《Applied Mathematics and Mechanics(English Edition)》 SCIE EI CSCD 2017年第5期707-722,共16页
Large eddy simulation (LES) using the Smagorinsky eddy viscosity model is added to the two-dimensional nine velocity components (D2Q9) lattice Boltzmann equation (LBE) with multi-relaxation-time (MRT) to simul... Large eddy simulation (LES) using the Smagorinsky eddy viscosity model is added to the two-dimensional nine velocity components (D2Q9) lattice Boltzmann equation (LBE) with multi-relaxation-time (MRT) to simulate incompressible turbulent cavity flows with the Reynolds numbers up to 1 × 10^7. To improve the computation efficiency of LBM on the numerical simulations of turbulent flows, the massively parallel computing power from a graphic processing unit (GPU) with a computing unified device architecture (CUDA) is introduced into the MRT-LBE-LES model. The model performs well, compared with the results from others, with an increase of 76 times in computation efficiency. It appears that the higher the Reynolds numbers is, the smaller the Smagorinsky constant should be, if the lattice number is fixed. Also, for a selected high Reynolds number and a selected proper Smagorinsky constant, there is a minimum requirement for the lattice number so that the Smagorinsky eddy viscosity will not be excessively large. 展开更多
关键词 large eddy simulation (LES) multi-relaxation-time (MRT) lattice Boltzmann equation (LBE) two-dimensional nine velocity components (D2Q9) Smagorinskymodel graphic processing unit (GPU) computing unified device architecture (CUDA)
下载PDF
Parallel Image Processing: Taking Grayscale Conversion Using OpenMP as an Example
3
作者 Bayan AlHumaidan Shahad Alghofaily +2 位作者 Maitha Al Qhahtani Sara Oudah Naya Nagy 《Journal of Computer and Communications》 2024年第2期1-10,共10页
In recent years, the widespread adoption of parallel computing, especially in multi-core processors and high-performance computing environments, ushered in a new era of efficiency and speed. This trend was particularl... In recent years, the widespread adoption of parallel computing, especially in multi-core processors and high-performance computing environments, ushered in a new era of efficiency and speed. This trend was particularly noteworthy in the field of image processing, which witnessed significant advancements. This parallel computing project explored the field of parallel image processing, with a focus on the grayscale conversion of colorful images. Our approach involved integrating OpenMP into our framework for parallelization to execute a critical image processing task: grayscale conversion. By using OpenMP, we strategically enhanced the overall performance of the conversion process by distributing the workload across multiple threads. The primary objectives of our project revolved around optimizing computation time and improving overall efficiency, particularly in the task of grayscale conversion of colorful images. Utilizing OpenMP for concurrent processing across multiple cores significantly reduced execution times through the effective distribution of tasks among these cores. The speedup values for various image sizes highlighted the efficacy of parallel processing, especially for large images. However, a detailed examination revealed a potential decline in parallelization efficiency with an increasing number of cores. This underscored the importance of a carefully optimized parallelization strategy, considering factors like load balancing and minimizing communication overhead. Despite challenges, the overall scalability and efficiency achieved with parallel image processing underscored OpenMP’s effectiveness in accelerating image manipulation tasks. 展开更多
关键词 Parallel computing Image processing OPENMP Parallel Programming High Performance computing GPU (graphic processing unit)
下载PDF
Graphic Processing Unit-Accelerated Neural Network Model for Biological Species Recognition
4
作者 温程璐 潘伟 +1 位作者 陈晓熹 祝青园 《Journal of Donghua University(English Edition)》 EI CAS 2012年第1期5-8,共4页
A graphic processing unit (GPU)-accelerated biological species recognition method using partially connected neural evolutionary network model is introduced in this paper. The partial connected neural evolutionary netw... A graphic processing unit (GPU)-accelerated biological species recognition method using partially connected neural evolutionary network model is introduced in this paper. The partial connected neural evolutionary network adopted in the paper can overcome the disadvantage of traditional neural network with small inputs. The whole image is considered as the input of the neural network, so the maximal features can be kept for recognition. To speed up the recognition process of the neural network, a fast implementation of the partially connected neural network was conducted on NVIDIA Tesla C1060 using the NVIDIA compute unified device architecture (CUDA) framework. Image sets of eight biological species were obtained to test the GPU implementation and counterpart serial CPU implementation, and experiment results showed GPU implementation works effectively on both recognition rate and speed, and gained 343 speedup over its counterpart CPU implementation. Comparing to feature-based recognition method on the same recognition task, the method also achieved an acceptable correct rate of 84.6% when testing on eight biological species. 展开更多
关键词 graphic processing unit(GPU) compute unified device architecture (CUDA) neural network species recognition
下载PDF
Developing Extensible Lattice-Boltzmann Simulators for General-Purpose Graphics-Processing Units
5
作者 Stuart D.C.Walsh Martin O.Saar 《Communications in Computational Physics》 SCIE 2013年第3期867-879,共13页
Lattice-Boltzmann methods are versatile numerical modeling techniques capable of reproducing a wide variety of fluid-mechanical behavior.These methods are well suited to parallel implementation,particularly on the sin... Lattice-Boltzmann methods are versatile numerical modeling techniques capable of reproducing a wide variety of fluid-mechanical behavior.These methods are well suited to parallel implementation,particularly on the single-instruction multiple data(SIMD)parallel processing environments found in computer graphics processing units(GPUs).Although recent programming tools dramatically improve the ease with which GPUbased applications can be written,the programming environment still lacks the flexibility available to more traditional CPU programs.In particular,it may be difficult to develop modular and extensible programs that require variable on-device functionality with current GPU architectures.This paper describes a process of automatic code generation that overcomes these difficulties for lattice-Boltzmann simulations.It details the development of GPU-based modules for an extensible lattice-Boltzmann simulation package-LBHydra.The performance of the automatically generated code is compared to equivalent purpose written codes for both single-phase,multiphase,and multicomponent flows.The flexibility of the new method is demonstrated by simulating a rising,dissolving droplet moving through a porous medium with user generated lattice-Boltzmann models and subroutines. 展开更多
关键词 Lattice-Boltzmann methods graphics processing units computational fluid dynamics
原文传递
Parallelizing maximum likelihood classification on computer cluster and graphics processing unit for supervised image classification
6
作者 Xuan Shi Bowei Xue 《International Journal of Digital Earth》 SCIE EI 2017年第7期737-748,共12页
Supervised image classification has been widely utilized in a variety of remote sensing applications.When large volume of satellite imagery data and aerial photos are increasingly available,high-performance image proc... Supervised image classification has been widely utilized in a variety of remote sensing applications.When large volume of satellite imagery data and aerial photos are increasingly available,high-performance image processing solutions are required to handle large scale of data.This paper introduces how maximum likelihood classification approach is parallelized for implementation on a computer cluster and a graphics processing unit to achieve high performance when processing big imagery data.The solution is scalable and satisfies the need of change detection,object identification,and exploratory analysis on large-scale high-resolution imagery data in remote sensing applications. 展开更多
关键词 Maximum likelihood classification supervised classification parallel computing graphics processing unit
原文传递
Efficient parallel implementation of a density peaks clustering algorithm on graphics processing unit 被引量:2
7
作者 Ke-shi GE Hua-you SU +1 位作者 Dong-sheng LI Xi-cheng LU 《Frontiers of Information Technology & Electronic Engineering》 SCIE EI CSCD 2017年第7期915-927,共13页
The density peak (DP) algorithm has been widely used in scientific research due to its novel and effective peak density-based clustering approach. However, the DP algorithm uses each pair of data points several time... The density peak (DP) algorithm has been widely used in scientific research due to its novel and effective peak density-based clustering approach. However, the DP algorithm uses each pair of data points several times when determining cluster centers, yielding high computational complexity. In this paper, we focus on accelerating the time-consuming density peaks algorithm with a graphics processing unit (GPU). We analyze the principle of the algorithm to locate its computational bottlenecks, and evaluate its potential for parallelism. In light of our analysis, we propose an efficient parallel DP algorithm targeting on a GPU architecture and implement this parallel method with compute unified device architecture (CUDA), called the ‘CUDA-DP platform'. Specifically, we use shared memory to improve data locality, which reduces the amount of global memory access. To exploit the coalescing accessing mechanism of CPU, we convert the data structure of the CUDA-DP program from array of structures to structure of arrays. In addition, we introduce a binary search-and-sampling method to avoid sorting a large array. The results of the experiment show that CUDA-DP can achieve a 45-fold acceleration when compared to the central processing unit based density peaks implementation. 展开更多
关键词 Density peak graphics processing unit Parallel computing CLUSTERING
原文传递
隐私计算环境下深度学习的GPU加速技术综述
8
作者 秦智翔 杨洪伟 +2 位作者 郝萌 何慧 张伟哲 《信息安全研究》 CSCD 北大核心 2024年第7期586-593,共8页
随着深度学习技术的不断发展,神经网络模型的训练时间越来越长,使用GPU计算对神经网络训练进行加速便成为一项关键技术.此外,数据隐私的重要性也推动了隐私计算技术的发展.首先介绍了深度学习、GPU计算的概念以及安全多方计算、同态加密... 随着深度学习技术的不断发展,神经网络模型的训练时间越来越长,使用GPU计算对神经网络训练进行加速便成为一项关键技术.此外,数据隐私的重要性也推动了隐私计算技术的发展.首先介绍了深度学习、GPU计算的概念以及安全多方计算、同态加密2种隐私计算技术,而后探讨了明文环境与隐私计算环境下深度学习的GPU加速技术.在明文环境下,介绍了数据并行和模型并行2种基本的深度学习并行训练模式,分析了重计算和显存交换2种不同的内存优化技术,并介绍了分布式神经网络训练过程中的梯度压缩技术.介绍了在隐私计算环境下安全多方计算和同态加密2种不同隐私计算场景下的深度学习GPU加速技术.简要分析了2种环境下GPU加速深度学习方法的异同. 展开更多
关键词 深度学习 GPU计算 隐私计算 安全多方计算 同态加密
下载PDF
基于GPU和角正交投影视图的多视角投影全息图
9
作者 曹雪梅 张春晓 +4 位作者 管明祥 夏林中 郭丽丽 苗玉虎 曹士平 《深圳大学学报(理工版)》 CAS CSCD 北大核心 2024年第5期536-541,共6页
针对多视角投影全息图生成速度慢的问题,提出一种基于计算机图形处理单元(graphics processing unit,GPU)的多视角投影计算全息图合成方法.获取多个角正交投影视图,充分利用GPU强大的并行计算能力,同时计算多幅投影视图对全息图的作用,... 针对多视角投影全息图生成速度慢的问题,提出一种基于计算机图形处理单元(graphics processing unit,GPU)的多视角投影计算全息图合成方法.获取多个角正交投影视图,充分利用GPU强大的并行计算能力,同时计算多幅投影视图对全息图的作用,即在计算过程中同时将沿着投影方向移位后的一系列角正交投影视图乘以其相应的常数相位因子.其中,每个投影图像的投影角决定了其移位的距离和常数相位因子.将所有并行计算结果累加,可以得到一个包含物体三维信息的二维复矩阵,即菲涅尔全息图.相较于使用计算机中央处理器(central processing unit,CPU)进行计算,本方法显著提升了计算速度,将计算效率提高了30~40倍,为多视角投影全息图的高效生成提供一种可行途径. 展开更多
关键词 信息处理技术 计算全息 全息显示 图形处理单元 角正交投影视图 多视角投影全息
下载PDF
面向GPU并行编程的线程同步综述
10
作者 高岚 赵雨晨 +2 位作者 张伟功 王晶 钱德沛 《软件学报》 EI CSCD 北大核心 2024年第2期1028-1047,共20页
并行计算已成为主流趋势.在并行计算系统中,同步是关键设计之一,对硬件性能的充分利用至关重要.近年来,GPU(graphic processing unit,图形处理器)作为应用最为广加速器得到了快速发展,众多应用也对GPU线程同步提出更高要求.然而,现有GP... 并行计算已成为主流趋势.在并行计算系统中,同步是关键设计之一,对硬件性能的充分利用至关重要.近年来,GPU(graphic processing unit,图形处理器)作为应用最为广加速器得到了快速发展,众多应用也对GPU线程同步提出更高要求.然而,现有GPU系统却难以高效地支持真实应用中复杂的线程同步.研究者虽然提出了很多支持GPU线程同步的方法并取得了较大进展,但GPU独特的体系结构及并行模式导致GPU线程同步的研究仍然面临很多挑战.根据不同的线程同步目的和粒度对GPU并行编程中的线程同步进行分类.在此基础上,围绕GPU线程同步的表达和执行,首先分析总结GPU线程同步存在的难以高效表达、错误频发、执行效率低的关键问题及挑战;而后依据不同的GPU线程同步粒度,从线程同步表达方法和性能优化方法两个方面入手,介绍近年来学术界和产业界对GPU线程竞争同步及合作同步的研究,对现有研究方法进行分析与总结.最后,指出GPU线程同步未来的研究趋势和发展前景,并给出可能的研究思路,从而为该领域的研究人员提供参考. 展开更多
关键词 通用图形处理器(GPGPU) 并行编程 线程同步 性能优化
下载PDF
基于图形处理器的水下目标传递函数多频点处理方法
11
作者 钱浩然 王斌 《舰船科学技术》 北大核心 2024年第14期153-157,共5页
为了提高水下目标宽带回波的计算速度,本文提出一种基于图形处理器GPU的散射传递函数多频点快速计算解决方案。相较于传统算法中逐个频率点计算的方式,CUDA快速算法充分利用各频点处目标强度的相对独立性,基于GPU的硬件特点,同时计算宽... 为了提高水下目标宽带回波的计算速度,本文提出一种基于图形处理器GPU的散射传递函数多频点快速计算解决方案。相较于传统算法中逐个频率点计算的方式,CUDA快速算法充分利用各频点处目标强度的相对独立性,基于GPU的硬件特点,同时计算宽带内的散射声场,从而显著提高了计算效率。本文以潜航器模型为算例,对不同网格数量下模型的目标散射传递函数计算速度进行对比分析。仿真结果表明,相较于传统的CPU串行计算,采用CUDA快速算法能够实现超过80的加速比,有效提高了计算速度。 展开更多
关键词 板块元方法 图像处理器 计算统一设备架构 并行计算
下载PDF
GPU加速下的三维快速分解后向投影SAS成像算法
12
作者 陶鸿博 张东升 黄勇 《系统工程与电子技术》 EI CSCD 北大核心 2024年第10期3247-3256,共10页
后向投影(back projection,BP)算法是一种精确的时域成像算法,但BP算法的计算复杂度高,难以实现实时性成像,特别是在考虑三维成像时,BP算法的计算复杂度会进一步增加。提出一种应用在合成孔径声纳(synthetic aperture sonar,SAS)上的三... 后向投影(back projection,BP)算法是一种精确的时域成像算法,但BP算法的计算复杂度高,难以实现实时性成像,特别是在考虑三维成像时,BP算法的计算复杂度会进一步增加。提出一种应用在合成孔径声纳(synthetic aperture sonar,SAS)上的三维快速分解BP(fast factorized BP,FFBP)成像算法,并利用图形处理器(graphics processing unit,GPU)加速三维FFBP算法。经过对点目标的测试,计算时间从原本的263 s降低到了2.3 s,解决了SAS中的三维成像实时性问题。同时,验证了所提算法在非理想航迹下的成像效果。结果表明,在添加幅度不超过0.1 m(一个波长以内)的正弦扰动时,所提算法对点目标仍有良好的聚焦效果。 展开更多
关键词 快速分解后向投影 并行计算 图形处理器 合成孔径声纳 三维成像
下载PDF
gEdge:基于容器技术的云边协同的异构计算框架
13
作者 汪沄 汤冬劼 +2 位作者 郭开诚 戚正伟 管海兵 《计算机学报》 EI CAS CSCD 北大核心 2024年第8期1883-1900,共18页
由于按需灵活配置、高可用性、高资源利用率等优点,云计算技术成为过去十年的主流计算范式.随着万物互联时代的到来,单独依赖云计算技术已经无法满足数以亿计的IoT设备及其数据流量的需求.边缘计算可以被看作是云计算的进化,它因5G网络... 由于按需灵活配置、高可用性、高资源利用率等优点,云计算技术成为过去十年的主流计算范式.随着万物互联时代的到来,单独依赖云计算技术已经无法满足数以亿计的IoT设备及其数据流量的需求.边缘计算可以被看作是云计算的进化,它因5G网络和物联网的崛起而诞生.随着云游戏、VR技术以及人工智能技术在日常生活中的广泛运用,对计算资源的需求也在日渐增长.然而,受体积与功耗限制,处于边缘的节点设备算力较弱.本文提出了gEdge:一种基于容器技术的云边协同的异构计算框架.该框架通过GPU虚拟化技术,将云端的物理GPU资源分为多块虚拟GPU资源,按需为边缘节点提供GPU算力资源,并且对用户容器无感知.实验表明,使用gEdge框架使边缘节点使用的容器镜像体积降低了48.8%,容器启动时间降低了35.5%,平均相对运行速度提高了213%. 展开更多
关键词 图形处理器 虚拟化技术 容器技术 边缘计算 云边协同
下载PDF
基于异构平台的图像中值滤波的OpenCL加速算法 被引量:1
14
作者 肖诗洋 王镭 +1 位作者 杜莹 肖汉 《河北大学学报(自然科学版)》 CAS 北大核心 2024年第1期92-103,共12页
图像噪声降低了图像信噪比和质量,去噪是图像处理工作的重要环节之一.本文提出了一种基于开放式计算语言(OpenCL)架构的图像中值滤波快速降噪并行算法.介绍了OpenCL体系结构特点和中值滤波处理流程.根据图形处理器(GPU)的并发结构特点,... 图像噪声降低了图像信噪比和质量,去噪是图像处理工作的重要环节之一.本文提出了一种基于开放式计算语言(OpenCL)架构的图像中值滤波快速降噪并行算法.介绍了OpenCL体系结构特点和中值滤波处理流程.根据图形处理器(GPU)的并发结构特点,对图像中值滤波功能模块进行了并行优化,降低了算法复杂度.通过充分激活NDRange索引空间中的工作组和工作项来提高数据访问效率,优化内核工作组配置参数,实现了中值滤波器的并行处理.实验结果表明,在图像质量保持不变的情况下,与基于CPU的串行算法、基于开放多处理(OpenMP)并行算法和基于统一计算设备架构(CUDA)并行算法性能相比,图像中值滤波并行算法在OpenCL架构下NVIDIA GPU计算平台上分别获得了29.74、17.29、1.15倍的加速比.验证了算法的有效性和平台的可移植性,基本满足应用的实时性处理要求. 展开更多
关键词 中值滤波 椒盐噪声 图形处理器 开放式计算语言 并行算法
下载PDF
Falcon后量子算法的密钥树生成部件GPU并行优化设计与实现
15
作者 张磊 赵光岳 +1 位作者 肖超恩 王建新 《计算机工程》 CAS CSCD 北大核心 2024年第9期208-215,共8页
近年来,后量子密码算法因其具有抗量子攻击的特性成为安全领域的研究热点。基于格的Falcon数字签名算法是美国国家标准与技术研究所(NIST)公布的首批4个后量子密码标准算法之一。密钥树生成是Falcon算法的核心部件,在实际运算中占用较... 近年来,后量子密码算法因其具有抗量子攻击的特性成为安全领域的研究热点。基于格的Falcon数字签名算法是美国国家标准与技术研究所(NIST)公布的首批4个后量子密码标准算法之一。密钥树生成是Falcon算法的核心部件,在实际运算中占用较多的时间和消耗较多的资源。为此,提出一种基于图形处理器(GPU)的Falcon密钥树并行生成方案。该方案使用奇偶线程联合控制的单指令多线程(SIMT)并行模式和无中间变量的直接计算模式,达到了提升速度和减少资源占用的目的。基于Python的CUDA平台进行了实验,验证结果的正确性。实验结果表明,Falcon密钥树生成在RTX 3060 Laptop的延迟为6 ms,吞吐量为167次/s,在计算单个Falcon密钥树生成部件时相对于CPU实现了1.17倍的加速比,在同时并行1024个Falcon密钥树生成部件时,GPU相对于CPU的加速比达到了约56倍,在嵌入式Jetson Xavier NX平台上的吞吐量为32次/s。 展开更多
关键词 后量子密码 Falcon算法 图形处理器 CUDA平台 并行计算
下载PDF
基于GPU的LBM迁移模块算法优化
16
作者 黄斌 柳安军 +3 位作者 潘景山 田敏 张煜 朱光慧 《计算机工程》 CAS CSCD 北大核心 2024年第2期232-238,共7页
格子玻尔兹曼方法(LBM)是一种基于介观模拟尺度的计算流体力学方法,其在计算时设置大量的离散格点,具有适合并行的特性。图形处理器(GPU)中有大量的算术逻辑单元,适合大规模的并行计算。基于GPU设计LBM的并行算法,能够提高计算效率。但... 格子玻尔兹曼方法(LBM)是一种基于介观模拟尺度的计算流体力学方法,其在计算时设置大量的离散格点,具有适合并行的特性。图形处理器(GPU)中有大量的算术逻辑单元,适合大规模的并行计算。基于GPU设计LBM的并行算法,能够提高计算效率。但是LBM算法迁移模块中每个格点的计算都需要与其他格点进行通信,存在较强的数据依赖。提出一种基于GPU的LBM迁移模块算法优化策略。首先分析迁移部分的实现逻辑,通过模型降维,将三维模型按照速度分量离散为多个二维模型,降低模型的复杂度;然后分析迁移模块计算前后格点中的数据差异,通过数据定位找到迁移模块的通信规律,并对格点之间的数据交换方式进行分类;最后使用分类的交换方式对离散的二维模型进行区域划分,设计新的数据通信方式,由此消除数据依赖的影响,将迁移模块完全并行化。对并行算法进行测试,结果显示:该算法在1.3×10^(8)规模网格下能达到1.92的加速比,表明算法具有良好的并行效果;同时对比未将迁移模块并行化的算法,所提优化策略能提升算法30%的并行计算效率。 展开更多
关键词 高性能计算 格子玻尔兹曼方法 图形处理器 并行优化 数据重排
下载PDF
基于GPU的实景三维模型裁剪算法研究
17
作者 马东岭 李铭通 朱悦凯 《山东建筑大学学报》 2024年第1期108-116,共9页
图形处理器(Graphic Processing Unit,GPU)作为主流高性能计算的加速设备,已越来越多地应用于诸多领域的并行计算中,利用GPU的并行计算能力,可以极大地提高传统算法的计算效率。文章主要研究GPU多线程计算方法与统一计算架构(Compute Un... 图形处理器(Graphic Processing Unit,GPU)作为主流高性能计算的加速设备,已越来越多地应用于诸多领域的并行计算中,利用GPU的并行计算能力,可以极大地提高传统算法的计算效率。文章主要研究GPU多线程计算方法与统一计算架构(Compute Unified Device Architecture,CUDA)技术在实景三维模型裁剪中的应用,提出了一种基于GPU的实景三维模型裁剪算法,包括设计了基于面拓扑的多级索引结构,以实现线程内重复交点快速查找;提出了一种轻量多边形三角化方法,优化算法流程;使用多种优化策略,在不影响裁剪网格质量的情况下进一步提高算法的性能。结果表明:根据模型大小与裁剪次数的不同,相较于传统算法,所提方法在单次裁剪的情况下加速比可达13.93,在多次裁剪的情况下加速比可达35.85,显著地提高了模型的裁剪效率。 展开更多
关键词 图形处理器 实景三维模型 三角网裁剪 并行计算
下载PDF
人工智能芯片技术演进与发展策略研究
18
作者 宋艳飞 孙佳琪 王睿哲 《信息技术与标准化》 2024年第6期94-98,共5页
为加快抢占人工智能技术高地,围绕人工智能芯片产业需求,阐述了人工智能芯片技术特点、技术路线和发展趋势,并分析了人工智能芯片与先进制造、软件和整机应用等产业生态深度耦合态势。根据人工智能芯片技术与产业发展特点,提出了强化统... 为加快抢占人工智能技术高地,围绕人工智能芯片产业需求,阐述了人工智能芯片技术特点、技术路线和发展趋势,并分析了人工智能芯片与先进制造、软件和整机应用等产业生态深度耦合态势。根据人工智能芯片技术与产业发展特点,提出了强化统筹协调、释放市场优势、深化国际合作策略建议,以期推动我国人工智能芯片高质量发展。 展开更多
关键词 人工智能芯片 图形处理器 统一计算设备架构
下载PDF
Fast modeling of gravity gradients from topographic surface data using GPU parallel algorithm 被引量:1
19
作者 Xuli Tan Qingbin Wang +2 位作者 Jinkai Feng Yan Huang Ziyan Huang 《Geodesy and Geodynamics》 CSCD 2021年第4期288-297,共10页
The gravity gradient is a secondary derivative of gravity potential,containing more high-frequency information of Earth’s gravity field.Gravity gradient observation data require deducting its prior and intrinsic part... The gravity gradient is a secondary derivative of gravity potential,containing more high-frequency information of Earth’s gravity field.Gravity gradient observation data require deducting its prior and intrinsic parts to obtain more variational information.A model generated from a topographic surface database is more appropriate to represent gradiometric effects derived from near-surface mass,as other kinds of data can hardly reach the spatial resolution requirement.The rectangle prism method,namely an analytic integration of Newtonian potential integrals,is a reliable and commonly used approach to modeling gravity gradient,whereas its computing efficiency is extremely low.A modified rectangle prism method and a graphical processing unit(GPU)parallel algorithm were proposed to speed up the modeling process.The modified method avoided massive redundant computations by deforming formulas according to the symmetries of prisms’integral regions,and the proposed algorithm parallelized this method’s computing process.The parallel algorithm was compared with a conventional serial algorithm using 100 elevation data in two topographic areas(rough and moderate terrain).Modeling differences between the two algorithms were less than 0.1 E,which is attributed to precision differences between single-precision and double-precision float numbers.The parallel algorithm showed computational efficiency approximately 200 times higher than the serial algorithm in experiments,demonstrating its effective speeding up in the modeling process.Further analysis indicates that both the modified method and computational parallelism through GPU contributed to the proposed algorithm’s performances in experiments. 展开更多
关键词 Gravity gradient Topographic surface data Rectangle prism method Parallel computation graphical processing unit(GPU)
下载PDF
Efficient parallel implementation of the lattice Boltzmann method on large clusters of graphic processing units 被引量:6
20
作者 XIONG QinGang LI Bo +5 位作者 XU Ji FANG XiaoJian WANG XiaoWei WANG LiMin HE XianFeng GE Wei 《Chinese Science Bulletin》 SCIE EI CAS 2012年第7期707-715,共9页
Many-core processors, such as graphic processing units (GPUs), are promising platforms for intrinsic parallel algorithms such as the lattice Boltzmann method (LBM). Although tremendous speedup has been obtained on a s... Many-core processors, such as graphic processing units (GPUs), are promising platforms for intrinsic parallel algorithms such as the lattice Boltzmann method (LBM). Although tremendous speedup has been obtained on a single GPU compared with mainstream CPUs, the performance of the LBM for multiple GPUs has not been studied extensively and systematically. In this article, we carry out LBM simulation on a GPU cluster with many nodes, each having multiple Fermi GPUs. Asynchronous execution with CUDA stream functions, OpenMP and non-blocking MPI communication are incorporated to improve efficiency. The algorithm is tested for two-dimensional Couette flow and the results are in good agreement with the analytical solution. For both the oneand two-dimensional decomposition of space, the algorithm performs well as most of the communication time is hidden. Direct numerical simulation of a two-dimensional gas-solid suspension containing more than one million solid particles and one billion gas lattice cells demonstrates the potential of this algorithm in large-scale engineering applications. The algorithm can be directly extended to the three-dimensional decomposition of space and other modeling methods including explicit grid-based methods. 展开更多
关键词 格子BOLTZMANN方法 图形处理单元 并行算法 集群 COUETTE流 LBM模拟 OPENMP 直接数值模拟
原文传递
上一页 1 2 19 下一页 到第
使用帮助 返回顶部