Journal Articles
10 articles found
Large-Eddy Simulation of Airflow over a Steep, Three-Dimensional Isolated Hill with Multi-GPUs Computing
1
Author: Takanori Uchida. Open Journal of Fluid Dynamics, 2018, No. 4, pp. 416-434 (19 pages).
The present research attempted a Large-Eddy Simulation (LES) of airflow over a steep, three-dimensional isolated hill using the latest multi-core, multi-CPU systems. It was found that 1) turbulence simulations using approximately 50 million grid points are feasible and 2) this system achieved a high computation speed, exceeding that of parallel computation on a single CPU of one of the latest supercomputers. LES was then conducted on multi-GPU systems, with the following findings: 1) a multi-GPU environment using NVIDIA® Tesla M2090 or M2075 GPUs could simulate turbulence in a model with as many as approximately 50 million grid points, and 2) the computation speed achieved in the multi-GPU environments exceeded that of parallel computation using four to six CPUs of one of the latest supercomputers.
Keywords: LES; isolated hill; multi-core multi-CPU computing; multi-GPU computing
Large-Scale Graph Data Processing on a Multi-GPU Platform (cited by 7)
2
Authors: 张珩, 张立波, 武延军. Journal of Computer Research and Development (计算机研究与发展) [EI, CSCD, PKU Core], 2018, No. 2, pp. 273-288 (16 pages).
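Building efficient algorithms and systems for large-scale graph data on high-performance GPU nodes has become an increasingly active research topic. Using GPU coprocessors as the computing core provides not only a massively threaded parallel environment but also high-throughput memory and cache access. As graphs grow, the comparatively limited device memory of a GPU can no longer hold an entire graph, which has given rise to many large-scale graph processing systems focused on optimizing out-of-core I/O on a single node (out-of-core graph processing). To address this bottleneck, existing algorithms and systems partition the graph into compressed blocks (shards) for data transfer and iterative computation. When extended to multi-GPU platforms, however, such approaches are typically limited by their heavy dependence on PCI-E bandwidth, and load imbalance across GPUs restricts their scalability. To meet these challenges, this paper proposes and designs GFlow, an efficient and scalable large-scale graph processing system for multi-GPU platforms. GFlow introduces a new grid partitioning strategy for graph data on multiple GPUs and a two-level sliding-window algorithm: after the graph's attribute data (vertex states and vertex/edge weights) are cached on each GPU, the graph's topology data (vertex/edge sets) are loaded sequentially into each GPU. Using the two-level sliding window, GFlow dynamically loads data blocks from SSD storage into GPU device memory, and sequentially aggregates and applies the updates generated by each GPU during processing. Experiments on nine real-world graph datasets show that on a multi-GPU platform GFlow outperforms other systems that support out-of-core graph processing: it is 25.6X and 20.3X faster than the CPU-based GraphChi and X-Stream, respectively, and, on a single GPU, 1.3-2.5X faster than GraphReduce, a GPU-based out-of-core graph processing system. GFlow also scales well across multiple GPUs.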
Keywords: large-scale graph data; multi-GPU; graph partitioning; two-level sliding window; data transfer
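The grid partitioning strategy mentioned in the GFlow abstract can be illustrated with a short Python sketch. It assumes, for illustration only, that vertex IDs 0..n-1 are split into p contiguous intervals of near-equal size and that each edge (src, dst) is placed in grid block (interval(src), interval(dst)). The column-to-GPU policy in the hypothetical `assign_columns_to_gpus` is likewise an assumption. This is a generic 2D grid layout, not GFlow's actual data format, and it does not model the two-level sliding window.

```python
from collections import defaultdict


def interval_of(vertex, num_vertices, p):
    """Map a vertex ID to one of p contiguous, near-equal vertex intervals."""
    size = -(-num_vertices // p)  # ceiling division
    return vertex // size


def grid_partition(edges, num_vertices, p):
    """Bucket each edge (src, dst) into grid block (interval(src), interval(dst))."""
    grid = defaultdict(list)
    for src, dst in edges:
        block = (interval_of(src, num_vertices, p), interval_of(dst, num_vertices, p))
        grid[block].append((src, dst))
    return grid


def assign_columns_to_gpus(p, num_gpus):
    """Hypothetical policy: GPU g owns destination columns g, g + num_gpus, ...
    so every update to a destination vertex is produced on a single GPU."""
    return {g: [c for c in range(p) if c % num_gpus == g] for g in range(num_gpus)}


if __name__ == "__main__":
    edges = [(0, 1), (1, 5), (4, 2), (7, 7), (3, 6)]
    grid = grid_partition(edges, num_vertices=8, p=2)
    for block in sorted(grid):
        print(block, grid[block])
    print(assign_columns_to_gpus(p=4, num_gpus=2))
```

Giving each GPU whole destination columns means updates to any one destination vertex come from a single device, which is one plausible way to keep update aggregation simple; the paper may assign blocks differently.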
A Multi-GPU-Accelerated Three-Dimensional Phase-Field Model for Directional Solidification of Binary Alloys (cited by 1)
3
Authors: 朱昶胜, 徐升, 冯力, 李浩. Journal of Lanzhou University of Technology (兰州理工大学学报) [CAS, PKU Core], 2018, No. 6, pp. 24-29 (6 pages).
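Based on a three-dimensional phase-field model and MPI+CUDA heterogeneous cooperative parallelism, a multi-GPU computing model of three-dimensional alloy directional solidification was built on a GPU cluster and used to simulate the three-dimensional directional solidification of an Al-Cu binary alloy. The simulations reproduce the 3D directional solidification process of the Al-Cu alloy as well as the competitive growth between grains of different orientations. Comparison with a conventional serial CPU model confirms the computational efficiency and acceleration of the multi-GPU model: the accelerated simulation of binary-alloy directional solidification reaches a maximum speedup of 57.7.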
Keywords: multi-GPU; MPI+CUDA; directional solidification; phase-field method
An efficient scheme for multi-GPU TTI reverse time migration (cited by 1)
4
Authors: Liu Guo-Feng, Meng Xiao-Hong, Yu Zhen-Jiang, Liu Ding-Jin. Applied Geophysics [SCIE, CSCD], 2019, No. 1, pp. 56-63 (8 pages).
Reverse time migration (RTM) is an indispensable but computationally intensive seismic exploration technique. Graphics processing units (GPUs) by NVIDIA® offer the option for parallel computations and speed improvements in such high-density processes. As the seismic imaging space grows, the problems associated with multi-GPU techniques need to be addressed. We propose an efficient scheme for multi-GPU programming based on features of the Compute Unified Device Architecture (CUDA) and GPU hardware, including concurrent kernel execution, CUDA streams, and peer-to-peer (P2P) communication between GPUs. In addition, by adjusting the computing time for imaging during RTM, the data communication time between GPUs becomes negligible, so the overall computational efficiency improves linearly as the number of GPUs increases. We introduce the multi-GPU scheme using acoustic wave propagation and then describe the implementation of RTM in tilted transversely isotropic (TTI) media. Next, we compare the multi-GPU and the unified memory schemes. The results suggest that the proposed multi-GPU scheme is superior and that its computational efficiency improves linearly with the number of GPUs.
Keywords: multi-GPU; kernel; peer-to-peer; forward modeling; TTI; RTM
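Why hiding inter-GPU communication behind computation yields near-linear scaling can be seen in a deliberately simplified timing model, sketched here in Python. It assumes a slab-decomposed stencil in which per-step compute time divides evenly across GPUs and each step needs a halo exchange of fixed cost (`halo_comm`). When that exchange runs on a separate CUDA stream concurrently with interior computation, a step costs max(compute, comm) instead of their sum. The function names and numbers are hypothetical and are not measurements from the paper.

```python
def step_time(total_compute, num_gpus, halo_comm, overlap):
    """Wall time of one time step in a simplified slab-decomposition model.

    total_compute: single-GPU compute time for one step (assumed to divide evenly)
    halo_comm:     cost of the per-step halo exchange between neighbouring GPUs
    overlap:       True if the exchange is hidden behind interior computation
    """
    compute = total_compute / num_gpus
    comm = halo_comm if num_gpus > 1 else 0.0  # a single GPU exchanges nothing
    return max(compute, comm) if overlap else compute + comm


def speedup(total_compute, num_gpus, halo_comm, overlap):
    """Speedup over one GPU under the same model."""
    return total_compute / step_time(total_compute, num_gpus, halo_comm, overlap)


if __name__ == "__main__":
    for n in (1, 2, 4, 8):
        s_seq = speedup(8.0, n, 0.5, overlap=False)
        s_ovl = speedup(8.0, n, 0.5, overlap=True)
        print(f"{n} GPUs: serialized {s_seq:.2f}x, overlapped {s_ovl:.2f}x")
```

In this model the overlapped speedup stays exactly linear as long as per-GPU compute time is at least the exchange time (up to 16 GPUs with these numbers); beyond that point communication dominates again.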
Parallel Graphics Cards: The Return of nVIDIA SLI Multi-GPU Technology (cited by 1)
5
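Author: 张岩. Personal Computer (个人电脑), 2004, No. 8, pp. 192-197 (6 pages).
Bringing SLI into the mainstream will have a far-reaching impact on the entire graphics card market.
Keywords: parallel graphics cards; NVIDIA SLI; multi-GPU; video memory frequency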
The Return of SLI? An Introduction to NVIDIA SLI Multi-GPU
6
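Author: 本苯. Popular Hardware (大众硬件), 2004, No. 8, p. 91 (1 page).
At the end of June, NVIDIA announced its new SLI multi-GPU technology. SLI stands for Scalable Link Interface, a name that emphasizes scalability. This is the first time since Voodoo2 SLI was phased out that we have seen SLI technology in which two graphics cards work together on a desktop system to boost performance.
Keywords: NVIDIA SLI; multi-GPU; graphics card; video memory frequency; MIO interface; overclocking performance; front-side bus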
Electromagnetic Scattering Simulation of Super-Large Ships on the Sea Surface Based on GPU Parallel Techniques (cited by 2)
7
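Authors: 郑文军, 杨伟, 周礼来. Journal of University of Electronic Science and Technology of China (电子科技大学学报) [EI, CAS, CSCD, PKU Core], 2023, No. 4, pp. 549-554 (6 pages).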
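To overcome the computational bottleneck of electromagnetic scattering in scenes containing electrically ultra-large ships on the sea surface, this work studies a shooting and bouncing ray (SBR) method accelerated in parallel on multiple graphics processing units (multi-GPU). A multi-GPU parallel acceleration framework is built on the Multi-Process Service (MPS) provided by the Compute Unified Device Architecture (CUDA), and GPU computing tasks are partitioned and implemented according to regional ray bundles. A matrix-grid-based task segmentation technique is developed to maximize GPU global memory utilization. To handle the loss of synchronization caused by differences among computing units, a scheduling system based on a dynamic load-balancing algorithm is designed to improve the utilization of computing resources. Simulation results show that on a dual-GPU hardware platform the proposed scheme achieves a speedup approaching or even exceeding 200% relative to existing parallel algorithms while preserving accuracy. The scheme can therefore effectively solve electromagnetic scattering problems for electrically ultra-large ships on the sea surface.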
Keywords: electrically ultra-large; multi-GPU; radar cross section; ships on the sea surface; ray tracing
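The dynamic load-balancing idea in the abstract can be sketched in Python as a greedy, event-driven scheduler: whichever device becomes idle first takes the next task from a shared queue, so faster devices automatically take on more work. The task costs, device speeds, and both function names are hypothetical; the abstract does not describe the paper's actual scheduling algorithm, and a real implementation would dispatch ray-bundle tasks to GPUs rather than simulate them.

```python
import heapq


def dynamic_schedule(task_costs, device_speeds):
    """Greedy dynamic scheduling: each idle device pulls the next task from the queue.

    task_costs:    work per task (e.g. hypothetical ray-bundle sizes)
    device_speeds: work processed per second by each device (assumed constant)
    Returns (makespan, list of task indices assigned to each device).
    """
    idle_at = [(0.0, d) for d in range(len(device_speeds))]  # (time device is free, id)
    heapq.heapify(idle_at)
    assigned = [[] for _ in device_speeds]
    finish = [0.0] * len(device_speeds)
    for task, cost in enumerate(task_costs):
        free_time, d = heapq.heappop(idle_at)  # earliest-idle device
        finish[d] = free_time + cost / device_speeds[d]
        assigned[d].append(task)
        heapq.heappush(idle_at, (finish[d], d))
    return max(finish), assigned


def static_schedule(task_costs, device_speeds):
    """Baseline: equal contiguous chunks per device, ignoring device speed."""
    n = len(device_speeds)
    chunk = -(-len(task_costs) // n)  # ceiling division
    return max(sum(task_costs[d * chunk:(d + 1) * chunk]) / device_speeds[d]
               for d in range(n))


if __name__ == "__main__":
    tasks = [1.0] * 12
    speeds = [2.0, 1.0]  # one device twice as fast as the other
    makespan, assigned = dynamic_schedule(tasks, speeds)
    print("dynamic:", makespan, [len(a) for a in assigned])
    print("static: ", static_schedule(tasks, speeds))
```

With one device twice as fast as the other, the dynamic scheduler finishes the 12 unit tasks in 4.0 time units (8 tasks on the fast device, 4 on the slow one), versus 6.0 for the equal static split.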
An Intra-Node Parallel Hybrid Rendering Model for Multi-Core CPU, Multi-GPU Systems (cited by 3)
8
Authors: 刘华海, 王攀, 蔡勋, 曾亮, 王文珂, 李思昆. Journal of System Simulation (系统仿真学报) [CAS, CSCD, PKU Core], 2012, No. 1, pp. 94-98, 112 (6 pages).
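Nodes in a distributed parallel rendering cluster can be equipped with multi-core CPUs and multiple GPUs, forming an intra-node multi-CPU, multi-GPU system. Existing intra-node parallel rendering models neither fully exploit the computing power of multi-core CPUs nor decouple the rendering, readback, and compositing stages; because these stages are serially coupled, the GPUs spend much of their time idle, which severely degrades intra-node parallel rendering performance. This paper proposes an efficient intra-node parallel rendering model. By combining software rendering with hardware rendering, it separates hardware rendering from image compositing, and by using asynchronous DMA transfers it builds a three-stage intra-node parallel pipeline of rendering, readback, and compositing. Compared with existing intra-node models, the parallel hybrid rendering model both reduces GPU idle time and increases CPU utilization. Theoretical analysis and experiments show that, for the same application, the parallel hybrid rendering model achieves 3-4 times the performance of existing models, with better data scalability and performance scalability.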
Keywords: multi-GPU; multi-CPU; distributed parallel rendering; asynchronous compositing; DMA
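The gain from decoupling rendering, readback, and compositing into a three-stage pipeline can be estimated with a simple Python model. It assumes fixed per-frame stage times, one dedicated resource per stage (for example GPU rendering, DMA readback, CPU compositing), and unlimited buffering between stages. The stage times and function names below are made up for illustration.

```python
def serial_time(n_frames, stages):
    """Stages serially coupled: each frame runs all stages before the next frame starts."""
    return n_frames * sum(stages)


def pipelined_time(n_frames, stages):
    """Each stage has its own resource. A frame enters stage k once it has left
    stage k-1 and the previous frame has left stage k (unbounded buffers assumed)."""
    last_done = [0.0] * len(stages)  # time the previous frame left each stage
    for _ in range(n_frames):
        ready = 0.0  # time the current frame left the previous stage
        for k, duration in enumerate(stages):
            start = max(ready, last_done[k])
            last_done[k] = start + duration
            ready = last_done[k]
    return last_done[-1]


if __name__ == "__main__":
    stages = [4.0, 2.0, 3.0]  # hypothetical render, readback, composite times (ms)
    n = 10
    print("serial:   ", serial_time(n, stages))
    print("pipelined:", pipelined_time(n, stages))
```

With constant stage times the pipelined total is sum(stages) + (n - 1) * max(stages), so throughput is bounded by the slowest stage and the asymptotic gain over the serial version is sum(stages) / max(stages) (2.25 with these numbers). The model ignores the paper's other ingredient, offloading part of the rendering to multi-core CPUs in software, so it should not be read as reproducing the reported 3-4x figure.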
Advances of Pipeline Model Parallelism for Deep Learning Training: An Overview
9
Authors: 关磊, 李东升, 梁吉业, 王文剑, 葛可适, 卢锡城. Journal of Computer Science & Technology [SCIE, EI, CSCD], 2024, No. 3, pp. 567-584 (18 pages).
Deep learning has become the cornerstone of artificial intelligence, playing an increasingly important role in human production and daily life. However, as the complexity of the problems being solved increases, deep learning models become increasingly intricate, resulting in a proliferation of large language models with an astonishing number of parameters. Pipeline model parallelism (PMP) has emerged as one of the mainstream approaches to the significant challenge of training “big models”. This paper presents a comprehensive review of PMP. It covers the basic concepts and main challenges of PMP, compares synchronous and asynchronous pipeline schedules for PMP approaches, and discusses the main techniques for achieving load balance in both intra-node and inter-node training. Furthermore, it presents the main techniques for optimizing computation, storage, and communication, and discusses potential research directions.
Keywords: deep learning; pipeline schedule; load balance; multi-GPU system; pipeline model parallelism (PMP)
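The idle "bubble" of a synchronous pipeline schedule can be made concrete with a small Python simulation of the fill-drain (GPipe-style) schedule. It assumes p stages, m micro-batches, and unit time for every forward and backward step. Under these assumptions the idle fraction is (p - 1) / (m + p - 1), a standard result that the code reproduces by counting busy slots. This is an illustration of a well-known schedule, not one taken from the survey, and the function names are hypothetical.

```python
def gpipe_schedule(num_stages, num_microbatches):
    """Fill-drain synchronous schedule with unit-time forward and backward steps.

    Returns a dict mapping (stage, time_step) -> ("F" or "B", microbatch).
    Forward of microbatch mb on stage s runs at step s + mb. After every forward has
    finished (the pipeline flush), backwards run from the last stage to the first;
    micro-batch order within the backward phase does not affect the bubble count.
    """
    p, m = num_stages, num_microbatches
    sched = {}
    for s in range(p):
        for mb in range(m):
            sched[(s, s + mb)] = ("F", mb)
    fwd_end = p + m - 1  # first step after the last forward completes
    for s in range(p):
        for mb in range(m):
            # stage s runs backward for mb one step after stage s + 1 does
            sched[(s, fwd_end + (p - 1 - s) + mb)] = ("B", mb)
    return sched


def bubble_fraction(num_stages, num_microbatches):
    """Fraction of (stage, step) slots left idle by the schedule."""
    sched = gpipe_schedule(num_stages, num_microbatches)
    total_steps = max(t for _, t in sched) + 1
    return 1 - len(sched) / (num_stages * total_steps)


if __name__ == "__main__":
    p = 4
    for m in (1, 4, 8, 32):
        print(f"p={p}, m={m}: simulated {bubble_fraction(p, m):.3f}, "
              f"formula {(p - 1) / (m + p - 1):.3f}")
```

Increasing the number of micro-batches shrinks the bubble, which is why synchronous schedules split each mini-batch finely; asynchronous schedules reviewed in the survey attack the same idle time differently.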
Multiscale Hemodynamics Using GPU Clusters
10
Authors: Mauro Bisson, Massimo Bernaschi, Simone Melchionna, Sauro Succi, Efthimios Kaxiras. Communications in Computational Physics [SCIE], 2012, No. 1, pp. 48-64 (17 pages).
The parallel implementation of MUPHY for multi-GPU platforms is presented. MUPHY is a concurrent multiscale code for large-scale hemodynamic simulations in anatomically realistic geometries. Performance tests show excellent results, with a nearly linear parallel speed-up on up to 32 GPUs and a more than tenfold GPU/CPU acceleration across the full range of GPUs. The basic MUPHY scheme combines a hydrokinetic (Lattice Boltzmann) representation of the blood plasma with a Particle Dynamics treatment of suspended biological bodies, such as red blood cells. To the best of our knowledge, this represents the first effort toward laying down general design principles for multiscale/multiphysics parallel Particle Dynamics applications in non-ideal geometries. This makes the present multi-GPU version of MUPHY one of the first examples of a high-performance parallel code for multiscale/multiphysics biofluidic applications in realistically complex geometries.
Keywords: multi-GPU computing; hemodynamics; molecular dynamics; irregular domain