Journal Articles
914 articles found
A Hybrid Parallel Strategy for Isogeometric Topology Optimization via CPU/GPU Heterogeneous Computing
1
Authors: Zhaohui Xia, Baichuan Gao, Chen Yu, Haotian Han, Haobo Zhang, Shuting Wang 《Computer Modeling in Engineering & Sciences》 SCIE EI, 2024, No. 2, pp. 1103-1137 (35 pages)
This paper aims to solve large-scale and complex isogeometric topology optimization problems that consume significant computational resources. A novel isogeometric topology optimization method with a hybrid CPU/GPU parallel strategy is proposed, and the hybrid parallel strategies for stiffness matrix assembly, equation solving, sensitivity analysis, and design variable update are discussed in detail. To ensure the high efficiency of CPU/GPU computing, a workload balancing strategy is presented for optimally distributing the workload between CPU and GPU. To illustrate the advantages of the proposed method, three benchmark examples are tested to verify the hybrid parallel strategy. The results show that the hybrid method is faster than both serial CPU and parallel GPU implementations, with speedups of up to two orders of magnitude.
Keywords: topology optimization, high efficiency, isogeometric analysis, CPU/GPU parallel computing, hybrid OpenMP-CUDA
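The abstract does not reproduce the paper's balancing formula; as a hypothetical sketch of the general idea, a workload can be split in proportion to each device's measured throughput so that CPU and GPU finish at roughly the same time (the function name and rates below are illustrative, not from the paper):

```python
def balance_workload(n_tasks, cpu_rate, gpu_rate):
    """Split n_tasks between CPU and GPU in proportion to their
    measured throughputs, so both devices finish at about the same time."""
    n_gpu = round(n_tasks * gpu_rate / (cpu_rate + gpu_rate))
    n_cpu = n_tasks - n_gpu
    return n_cpu, n_gpu

# Example: if the GPU is 9x faster than the CPU, it gets ~90% of the work.
n_cpu, n_gpu = balance_workload(1000, cpu_rate=1.0, gpu_rate=9.0)
```

In a real hybrid OpenMP/CUDA code, the rates would be measured at runtime and the split re-tuned between optimization iterations.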
An incompressible flow solver on a GPU/CPU heterogeneous architecture parallel computing platform
2
Authors: Qianqian Li, Rong Li, Zixuan Yang 《Theoretical & Applied Mechanics Letters》 CSCD, 2023, No. 5, pp. 387-393 (7 pages)
A computational fluid dynamics (CFD) solver for a GPU/CPU heterogeneous architecture parallel computing platform is developed to simulate incompressible flows on billion-level grid points. To solve the Poisson equation, the conjugate gradient method is used as the basic solver, and a Chebyshev method in combination with a Jacobi sub-preconditioner is used as the preconditioner. The developed CFD solver shows good parallel efficiency, exceeding 90% in the weak-scalability test when the number of grid points allocated to each GPU card is greater than 208^3. In the acceleration test, a simulation with 1040^3 grid points on 125 GPU cards runs 203.6x faster than on the same number of CPU cores. The developed solver is then tested on a two-dimensional lid-driven cavity flow and a three-dimensional Taylor-Green vortex flow. The results are consistent with previous results in the literature.
Keywords: GPU acceleration, parallel computing, Poisson equation, preconditioner
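As a minimal serial sketch of the basic solver named in the abstract, here is a conjugate gradient iteration with a Jacobi (diagonal) preconditioner applied to a small 1-D Poisson matrix; the Chebyshev layer and the GPU parallelization are omitted, and the matrix size is an arbitrary toy choice:

```python
import numpy as np

def pcg_jacobi(A, b, tol=1e-10, max_iter=500):
    """Conjugate gradient with a Jacobi (diagonal) preconditioner."""
    M_inv = 1.0 / np.diag(A)          # Jacobi preconditioner: M = diag(A)
    x = np.zeros_like(b)
    r = b - A @ x
    z = M_inv * r
    p = z.copy()
    rz = r @ z
    for _ in range(max_iter):
        Ap = A @ p
        alpha = rz / (p @ Ap)
        x += alpha * p
        r -= alpha * Ap
        if np.linalg.norm(r) < tol:
            break
        z = M_inv * r
        rz_new = r @ z
        p = z + (rz_new / rz) * p
        rz = rz_new
    return x

# 1-D Poisson matrix (tridiagonal) as a toy stand-in for the pressure system.
n = 50
A = 2.0 * np.eye(n) - np.eye(n, k=1) - np.eye(n, k=-1)
b = np.ones(n)
x = pcg_jacobi(A, b)
```

On a GPU platform, each of these vector operations becomes a data-parallel kernel, which is why CG-type methods map well onto such hardware.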
A Rayleigh Wave Globally Optimal Full Waveform Inversion Framework Based on GPU Parallel Computing
3
Authors: Zhao Le, Wei Zhang, Xin Rong, Yiming Wang, Wentao Jin, Zhengxuan Cao 《Journal of Geoscience and Environment Protection》 2023, No. 3, pp. 327-338 (12 pages)
Conventional gradient-based full waveform inversion (FWI) is a local optimization, which is highly dependent on the initial model and prone to becoming trapped in local minima. Globally optimal FWI can overcome this limitation and is particularly attractive, but is currently limited by its huge computational cost. In this paper, we propose a globally optimal FWI framework based on GPU parallel computing, which greatly improves efficiency and is expected to make globally optimal FWI more widely used. In this framework, we simplify and recombine the model parameters and optimize the model iteratively. Each iteration contains hundreds of individuals; each individual is independent of the others and involves forward modeling and a cost-function calculation. The framework is suitable for a variety of globally optimal algorithms, and we test it with the particle swarm optimization algorithm as an example. Both synthetic and field examples achieve good results, indicating the effectiveness of the framework.
Keywords: full waveform inversion, finite-difference method, globally optimal framework, GPU parallel computing, particle swarm optimization
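Since each of the hundreds of individuals per iteration is evaluated independently, the framework parallelizes naturally. A serial toy version of the particle swarm loop, with a stand-in sphere cost function instead of a waveform misfit, sketches the structure (all parameter values are illustrative):

```python
import random

def pso_minimize(cost, dim, n_particles=30, iters=200,
                 w=0.7, c1=1.5, c2=1.5, bounds=(-5.0, 5.0)):
    """Minimal particle swarm optimization. Each particle's cost
    evaluation is independent, which is the part that maps onto the GPU."""
    lo, hi = bounds
    rng = random.Random(0)
    pos = [[rng.uniform(lo, hi) for _ in range(dim)] for _ in range(n_particles)]
    vel = [[0.0] * dim for _ in range(n_particles)]
    pbest = [p[:] for p in pos]
    pbest_cost = [cost(p) for p in pos]
    g = min(range(n_particles), key=lambda i: pbest_cost[i])
    gbest, gbest_cost = pbest[g][:], pbest_cost[g]
    for _ in range(iters):
        for i in range(n_particles):
            for d in range(dim):
                vel[i][d] = (w * vel[i][d]
                             + c1 * rng.random() * (pbest[i][d] - pos[i][d])
                             + c2 * rng.random() * (gbest[d] - pos[i][d]))
                pos[i][d] = min(max(pos[i][d] + vel[i][d], lo), hi)
            c = cost(pos[i])            # independent per particle
            if c < pbest_cost[i]:
                pbest[i], pbest_cost[i] = pos[i][:], c
                if c < gbest_cost:
                    gbest, gbest_cost = pos[i][:], c
    return gbest, gbest_cost

best, best_cost = pso_minimize(lambda p: sum(x * x for x in p), dim=2)
```

In the paper's setting, `cost` would be a forward modeling plus misfit computation, which is far more expensive and hence worth distributing across GPU threads.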
Optimizing AI Computation in Mixed Reality via GPU Virtualization
4
Authors: 梁桂才, 李玉荣 《通信与信息技术》 2024, No. 2, pp. 114-120 (7 pages)
This study investigates optimizing AI computation in mixed reality (MR) applications through GPU virtualization, focusing on multi-task scheduling and resource sharing. A model is proposed that includes a mechanism for dynamically allocating GPU resources to running tasks according to task priority, resource demand, and waiting time. The model also adopts an optimized multi-task scheduling algorithm to improve scheduling efficiency. Experimental results show that although the model is slightly inferior to a physical GPU in single-task execution time, GPU utilization, and memory usage, it demonstrates significant advantages in multi-task concurrency and resource sharing. Future work will explore fairer and more efficient resource-sharing strategies and further optimization of the multi-task scheduling algorithm.
Keywords: mixed reality, AI computing, multi-task scheduling, resource sharing, GPU virtualization
Design and Implementation of the CCFD-KSSolver Component for GPU Architectures
5
Authors: 张浩源, 马文鹏, 袁武, 张鉴, 陆忠华 《数据与计算发展前沿》 CSCD, 2024, No. 1, pp. 68-78 (11 pages)
[Background] In high-performance application domains such as computational fluid dynamics and materials science, the solution of large sparse linear systems directly affects the efficiency and accuracy of applications. Heterogeneous many-core architectures have become a defining feature and development trend of modern supercomputing systems. [Methods] This paper designs and implements the linear solver component CCFD-KSSolver for CPU+GPU heterogeneous supercomputing systems. Targeting heterogeneous architectural features, the component implements Krylov subspace solvers and several typical preconditioners for the block-structured matrices arising in multi-physics problems, and applies optimization techniques such as computation-communication overlap, GPU memory-access optimization, and CPU-GPU cooperative computing to improve efficiency. [Results] Lid-driven cavity flow experiments show that with 8 subdomains, Block-ISAI achieves speedups of 20.09x and 3.34x over the CPU and cuSPARSE subdomain solvers respectively, with better scalability; for matrices of million-order scale, KSSolver with the three subdomain solvers achieves parallel efficiencies of 83.8%, 55.7%, and 87.4% on 8 GPUs. [Conclusions] Tests on classical block-structured multi-physics applications demonstrate the stability and efficiency of the solver and preconditioner components, strongly supporting high-performance computing applications, represented by CFD numerical simulation, on heterogeneous systems.
Keywords: GPU, KSSolver, parallel optimization, preconditioning, high-performance computing
Parallel Computing Performance of a GPU-Accelerated Distributed Hydrological Model
6
Authors: 庞超, 周祖昊, 刘佳嘉, 石天宇, 杜崇, 王坤, 于新哲 《南水北调与水利科技(中英文)》 CAS CSCD PKU Core, 2024, No. 1, pp. 33-38 (6 pages)
To address the long computation times and slow simulation speed of physically based distributed hydrological models for large basins and long time series, GPU-based parallel computing is introduced to parallelize the runoff generation process of the distributed hydrological model WEP-L (water and energy transfer processes in large river basins). Taking the Poyang Lake basin as the experimental area, algorithm performance is tested on an NVIDIA RTX A4000 with compute capability 8.6. The results show that the proposed GPU-based parallel algorithm achieves good acceleration: parallel performance improves as the total number of threads approaches the number of subbasins (the computational workload). With 8712 subbasin units in the experimental WEP-L model, the speedup reaches about 2.5; as the workload grows to 24897 subbasin units, the speedup reaches 3.5, indicating good potential for GPU parallel algorithms in large-scale distributed hydrological modeling.
Keywords: GPU-based parallel algorithm, physical mechanism, distributed hydrological model, WEP-L model, computing performance
Study of a GPU-based parallel computing method for the Monte Carlo program (Cited by 2)
7
Authors: 罗志飞, 邱睿, 李明, 武祯, 曾志, 李君利 《Nuclear Science and Techniques》 SCIE CAS CSCD, 2014, No. A01, pp. 27-30 (4 pages)
Keywords: parallel computing method, Monte Carlo program, GPU, GEANT4, simulation program, Monte Carlo method, parallel processing capability, graphics processing unit
Regularized focusing inversion for large-scale gravity data based on GPU parallel computing
8
Authors: WANG Haoran, DING Yidan, LI Feida, LI Jing 《Global Geology》 2019, No. 3, pp. 179-187 (9 pages)
Processing large-scale 3-D gravity data is an important topic in geophysics. Many existing inversion methods lack the capacity to process massive data and have limited practical applicability. This study applies GPU parallel processing technology to the focusing inversion method, aiming to improve inversion accuracy while speeding up calculation and reducing memory consumption, thus obtaining fast and reliable inversion results for large complex models. In this paper, equivalent storage of a geometric trellis is used to calculate the sensitivity matrix, and the inversion is based on GPU parallel computing technology. The parallel computing program, optimized by reducing data transfers, access restrictions, and instruction restrictions as well as by latency hiding, greatly reduces memory usage and speeds up calculation, making fast inversion of large models possible. Comparing the computing speed of the traditional single-threaded CPU method with CUDA-based GPU parallel technology verifies the excellent acceleration performance of GPU parallel computing, which offers a path to practical application for theoretical inversion methods restricted by computing speed and computer memory. Model tests verify that the focusing inversion method can overcome the problems of severe skin effect and ambiguity of geological body boundaries. Moreover, increasing the number of model cells and inversion data can more clearly depict the boundary position of an anomalous body and delineate its specific shape.
Keywords: large-scale gravity data, GPU parallel computing, CUDA, equivalent geometric trellis, focusing inversion
A GPU-Based Acceleration Architecture for Video SAR Processing
9
Authors: 朱爽, 刘彦斌, 刘亚波 《电子技术应用》 2024, No. 6, pp. 18-22 (5 pages)
Video Synthetic Aperture Radar (ViSAR) enables dynamic monitoring of a target area with low latency and high resolution. High-performance computing is a key technology for ViSAR systems. To meet the real-time requirements of ViSAR, a GPU-based ViSAR imaging scheme is proposed. The scheme deploys the Polar Format Algorithm (PFA) on a Graphics Processing Unit (GPU), optimizes it using concurrent streams and asynchronous parallelism, and exploits the relationship between SAR imaging frame rate and data overlap ratio to fully utilize the GPU's computing performance. Results show that the architecture processes a single 2048×2048 frame in 0.17 s, an average speedup of 32.8x over the Central Processing Unit (CPU), effectively solving the real-time imaging problem of ViSAR systems.
Keywords: video SAR, high-performance computing, graphics processing unit, real-time imaging
An Efficient GPU-Accelerated Multigrid Solver for Isogeometric Topology Optimization
10
Authors: 杨峰, 罗世杰, 杨江鸿, 王英俊 《中国机械工程》 EI CAS CSCD PKU Core, 2024, No. 4, pp. 602-613 (12 pages)
To address the huge computational cost of large-scale isogeometric topology optimization (ITO) and the low efficiency of traditional solvers, an efficient multigrid solver based on spline h-refinement is proposed. The method uses h-refinement interpolation to obtain the weights between coarse and fine grids, then constructs the multigrid interpolation matrix, obtaining more accurate coarse-fine grid mappings and thus faster solution. In addition, the multigrid solution process is analyzed and an efficient GPU parallel algorithm is constructed for it. Numerical examples show that, compared with the multigrid conjugate gradient method with linear interpolation, the algebraic multigrid conjugate gradient method, and the preconditioned conjugate gradient method, the proposed solver achieves maximum speedups of 1.47, 11.12, and 17.02 respectively. GPU parallel solution achieves a speedup of up to 33.86 over serial CPU solution, significantly improving the efficiency of solving large-scale linear systems.
Keywords: isogeometric topology optimization, linear system solving, h-refinement, multigrid method, GPU parallel computing
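The multigrid idea in the abstract can be sketched as a two-grid cycle on a 1-D Poisson problem with linear-interpolation prolongation; this is a simplification, as the paper builds its transfer operators from spline h-refinement rather than linear interpolation:

```python
import numpy as np

def two_grid_cycle(A, b, x, P, sweeps=3, omega=2.0 / 3.0):
    """One two-grid cycle: pre-smooth, coarse-grid correction, post-smooth."""
    D = np.diag(A)
    def smooth(x):
        for _ in range(sweeps):               # weighted Jacobi smoother
            x = x + omega * (b - A @ x) / D
        return x
    x = smooth(x)
    r = b - A @ x
    Ac = P.T @ A @ P                          # Galerkin coarse-grid operator
    x = x + P @ np.linalg.solve(Ac, P.T @ r)  # coarse solve + prolongation
    return smooth(x)

# 1-D Poisson model problem; coarse grid has nc points, fine grid 2*nc + 1.
nc, n = 15, 31
A = 2.0 * np.eye(n) - np.eye(n, k=1) - np.eye(n, k=-1)
P = np.zeros((n, nc))                         # linear-interpolation prolongation
for j in range(nc):
    P[2 * j, j] += 0.5
    P[2 * j + 1, j] = 1.0
    P[2 * j + 2, j] += 0.5

b, x = np.ones(n), np.zeros(n)
for _ in range(20):
    x = two_grid_cycle(A, b, x, P)
```

The paper's point is that a more accurate interpolation matrix (from h-refinement weights) improves exactly the `P` used here, and that the smoothing and transfer operations are all data-parallel and thus GPU-friendly.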
Optimization of a GPU-Based LBM Streaming-Module Algorithm
11
Authors: 黄斌, 柳安军, 潘景山, 田敏, 张煜, 朱光慧 《计算机工程》 CAS CSCD PKU Core, 2024, No. 2, pp. 232-238 (7 pages)
The lattice Boltzmann method (LBM) is a computational fluid dynamics method based on mesoscopic-scale simulation; it uses a large number of discrete lattice nodes and is inherently well suited to parallelism. Graphics processing units (GPUs) contain a large number of arithmetic logic units and are suited to large-scale parallel computing, so a GPU-based parallel LBM algorithm can improve computational efficiency. However, in the LBM streaming (migration) module, each node's computation requires communication with other nodes, introducing strong data dependencies. This paper proposes an optimization strategy for the GPU-based LBM streaming module. First, the implementation logic of the streaming step is analyzed, and the three-dimensional model is decomposed by velocity components into multiple two-dimensional models to reduce complexity. Second, by analyzing the data differences at lattice nodes before and after streaming, the communication pattern of the streaming module is identified through data localization, and the data-exchange modes between nodes are classified. Finally, the classified exchange modes are used to partition the decomposed two-dimensional models and a new data communication scheme is designed, eliminating the data dependencies and fully parallelizing the streaming module. Tests show that the algorithm achieves a speedup of 1.92 on a 1.3×10^(8)-cell grid, indicating good parallel performance; compared with an algorithm whose streaming module is not parallelized, the proposed strategy improves parallel computing efficiency by 30%.
Keywords: high-performance computing, lattice Boltzmann method, graphics processing unit, parallel optimization, data rearrangement
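As a hedged illustration of the streaming step under discussion (not the paper's optimized kernel), a D2Q9 streaming step with periodic boundaries shifts each distribution independently along its lattice velocity, which is the per-velocity-component decomposition the abstract describes:

```python
import numpy as np

# D2Q9 lattice velocities: index q -> (cx, cy)
C = [(0, 0), (1, 0), (0, 1), (-1, 0), (0, -1),
     (1, 1), (-1, 1), (-1, -1), (1, -1)]

def stream(f):
    """LBM streaming step: each distribution f[q] shifts by its lattice
    velocity (periodic boundaries). The per-direction shifts are
    independent of one another, which is what GPU parallelization exploits."""
    return np.stack([np.roll(f[q], shift=(cx, cy), axis=(0, 1))
                     for q, (cx, cy) in enumerate(C)])

nx, ny = 8, 8
f = np.zeros((9, nx, ny))
f[1, 3, 3] = 1.0          # a single particle moving in the +x direction
f = stream(f)
```

The data dependency the paper tackles appears when streaming is done in place on one array; the functional form above sidesteps it by writing to a fresh array, at the cost of extra memory.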
Time Predictable Modeling Method for GPU Architecture with SIMT and Cache Miss Awareness
12
Authors: Shaojie Zhang 《Journal of Electronic Research and Application》 2024, No. 2, pp. 109-115 (7 pages)
Graphics Processing Units (GPUs) are used to accelerate computing-intensive tasks such as neural networks, data analysis, and high-performance computing. Over the past decade or so, researchers have done much work on GPU architecture and proposed a variety of theories and methods for studying the microarchitectural characteristics of various GPUs. In this study, the GPU serves as a co-processor working together with the CPU in an embedded real-time system to handle computationally intensive tasks. Building on prior work, the study models the GPU architecture and refines the model with a more detailed analysis of the SIMT mechanism and cache-miss behavior. To verify the proposed GPU architecture model, experiments were performed with 10 GPU kernel tasks on an Nvidia GPU device. The results show that the error between the kernel task execution times predicted by the model and the actual measured execution times ranges from a minimum of 3.80% to a maximum of 8.30%.
Keywords: heterogeneous computing, GPU, architecture modeling, time predictability
A Seed-PCG-Based GPU Parallel Method for 3D Random Vibration Analysis of Coupled Train-Track-Subgrade Systems
13
Authors: 朱志辉, 冯杨, 杨啸, 李昊, 邹有 《Journal of Central South University》 SCIE EI CAS CSCD, 2024, No. 1, pp. 302-316 (15 pages)
To address the low efficiency of multi-sample stochastic computation with three-dimensional finite element models of train-track-subgrade systems, an efficient parallel computing method based on the Seed-PCG method is proposed. A three-dimensional coupled train-track-subgrade random vibration model under track irregularity excitation is established using the finite element method and the pseudo-excitation method. To solve the linear systems with multiple right-hand sides arising from the random vibration analysis, the Seed-PCG method is adopted: the Krylov subspace generated by solving the seed system with PCG is used for projection, improving the initial solutions and corresponding initial residuals of the remaining systems and effectively accelerating PCG convergence. Finally, a parallel computing program was developed on a hybrid MATLAB-CUDA platform. Numerical examples show that, on the same platform, the method achieves a 104.2x speedup over a multi-point synchronization algorithm, and reduces iterations by 18% compared with solving each system separately by PCG, for a speedup of 1.21x.
Keywords: Seed-PCG method, linear systems with multiple right-hand sides, random vibration, GPU parallel computing, coupled train-track-subgrade model
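A simplified sketch of the seed-projection idea: build a Krylov basis from the seed system, then use its Galerkin projection of a new right-hand side as the improved initial guess, so the follow-up solve starts from a smaller residual. The matrix, subspace size, and perturbation below are arbitrary toy choices, and the paper's PCG recycling and GPU details are omitted:

```python
import numpy as np

def krylov_basis(A, b, m):
    """Orthonormal basis of the Krylov subspace K_m(A, b) (Arnoldi with
    full orthogonalization; stands in for the basis a seed PCG solve builds)."""
    V = np.zeros((len(b), m))
    V[:, 0] = b / np.linalg.norm(b)
    for j in range(1, m):
        w = A @ V[:, j - 1]
        w -= V[:, :j] @ (V[:, :j].T @ w)   # orthogonalize against basis
        V[:, j] = w / np.linalg.norm(w)
    return V

def projected_guess(A, V, b):
    """Galerkin projection of a new right-hand side onto the seed
    subspace: the improved initial guess for the next solve."""
    return V @ np.linalg.solve(V.T @ A @ V, V.T @ b)

n = 20
A = 2.0 * np.eye(n) - np.eye(n, k=1) - np.eye(n, k=-1)   # toy SPD matrix
b_seed = np.ones(n)
V = krylov_basis(A, b_seed, m=10)

rng = np.random.default_rng(1)
b_next = b_seed + 0.05 * rng.standard_normal(n)   # correlated next sample
x0 = projected_guess(A, V, b_next)
res_projected = np.linalg.norm(b_next - A @ x0)
res_zero = np.linalg.norm(b_next)                 # residual of x0 = 0
```

The projection pays off precisely because successive right-hand sides from random vibration samples are strongly correlated, so the seed subspace already contains most of each new solution.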
Large-Eddy Simulation of Airflow over a Steep, Three-Dimensional Isolated Hill with Multi-GPUs Computing
14
Authors: Takanori Uchida 《Open Journal of Fluid Dynamics》 2018, No. 4, pp. 416-434 (19 pages)
The present research attempted a Large-Eddy Simulation (LES) of airflow over a steep, three-dimensional isolated hill using the latest multi-core multi-CPU systems. It was found that 1) turbulence simulations using approximately 50 million grid points are feasible, and 2) the system achieved a high computation speed, exceeding the speed of parallel computation attained by a single CPU on one of the latest supercomputers. Furthermore, LES was conducted using multi-GPU systems. These simulations revealed that 1) a multi-GPU environment using the NVIDIA Tesla M2090 or M2075 could simulate turbulence in a model with approximately 50 million grid points, and 2) the computation speed achieved by the multi-GPU environments exceeded that of parallel computation using four to six CPUs of one of the latest supercomputers.
Keywords: LES, isolated hill, multi-core multi-CPU computing, multi-GPU computing
A GPU-Parallel Scanline Filling Algorithm for Additive Manufacturing
15
Authors: 李慧贤, 马创新, 马良 《热加工工艺》 PKU Core, 2023, No. 13, pp. 100-104, 113 (6 pages)
Additive manufacturing models are trending toward larger size and finer detail, placing ever higher demands on model data-processing efficiency. Path filling, a key step in model data processing, directly affects overall processing efficiency. This paper focuses on a GPU-based parallel scanline filling algorithm: a contour preprocessing algorithm balances the load of the GPU's parallel intersection computation; a fast hash-based sorting algorithm for 3D coordinates is proposed; and a compressed structure of contour-group index plus coordinate hash is constructed to realize fully parallel scanline-contour intersection computation on the GPU. Experiments verify that the algorithm greatly reduces scanline filling time, with the benefit more pronounced for large or fine models.
Keywords: additive manufacturing, slicing, parallel computing, GPU
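The core intersection step of scanline filling is independent per scanline, which is what the GPU parallelization exploits; a minimal serial sketch follows (the paper's contour preprocessing and hash-based sorting are omitted):

```python
def scanline_spans(polygon, y):
    """X-intersections of a horizontal scanline with a closed polygon,
    paired into fill spans (even-odd rule). Each scanline is independent,
    so on a GPU each one can be handled by its own thread."""
    xs = []
    n = len(polygon)
    for i in range(n):
        (x1, y1), (x2, y2) = polygon[i], polygon[(i + 1) % n]
        if (y1 <= y < y2) or (y2 <= y < y1):   # edge crosses the scanline
            t = (y - y1) / (y2 - y1)
            xs.append(x1 + t * (x2 - x1))
    xs.sort()
    return list(zip(xs[0::2], xs[1::2]))

# Unit square contour: the scanline at y=0.5 fills from x=0 to x=1.
square = [(0.0, 0.0), (1.0, 0.0), (1.0, 1.0), (0.0, 1.0)]
spans = scanline_spans(square, 0.5)
```

The half-open test `y1 <= y < y2` avoids double-counting intersections at shared vertices, a classic scanline-fill pitfall.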
Real-Time Microscope Image Stitching Based on Hybrid CPU-GPU Programming
16
Authors: 吴为民, 刘新, 李伙钦, 江先伟, 杨华 《重庆科技学院学报(自然科学版)》 CAS, 2023, No. 3, pp. 67-74 (8 pages)
As electron microscope images reach ever higher resolutions, the computational cost of image stitching grows, and smooth real-time stitching places high demands on computing speed. Using NVIDIA's CUDA GPU parallel programming framework, the time-consuming image feature-point detection and image copying steps of the stitching pipeline are moved to the GPU for parallel computation, while the CPU handles the logic-control computation, improving overall efficiency. Experimental results show that the hybrid CPU-GPU programming model effectively shortens microscope image stitching time and improves the smoothness and real-time performance of stitching.
Keywords: electron microscope, real-time stitching, parallel computing, hybrid CPU-GPU programming
Short- and Medium-Term Forecasting of Water Temperature and Dissolved Oxygen in Deep Reservoirs Based on GPU Computing and Data Assimilation
17
Authors: 孙博闻, 宗庆志, 杨晰淯, 张袁宁, 高学平 《中国水利水电科学研究院学报(中英文)》 PKU Core, 2023, No. 1, pp. 95-103 (9 pages)
Deep reservoirs typically exhibit seasonal thermal stratification, and the resulting stratification of dissolved oxygen and other water-quality indicators can trigger water-environment and ecological problems in the reservoir area. Research on short- and medium-term forecasting of reservoir water temperature and dissolved oxygen remains relatively scarce, and improving the efficiency and accuracy of numerical models is critical to forecast quality. This paper adopts the ensemble Kalman filter as the assimilation method, builds a hydrodynamic and water-quality model of the reservoir based on the CE-QUAL-W2 model, and uses OpenACC-based GPU parallelization to improve computational efficiency. A data assimilation system for water temperature and dissolved oxygen in the Daheiting Reservoir is constructed, enabling accurate and efficient short- and medium-term forecasts. The forecasts conform to the short- and medium-term variation patterns of reservoir water temperature and dissolved oxygen, providing technical support for the water supply and ecological safety of the Daheiting Reservoir.
Keywords: deep reservoir, GPU parallel computing, data assimilation, short- and medium-term forecasting, dissolved oxygen
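A minimal stochastic ensemble Kalman filter analysis step, the assimilation method named in the abstract, on a one-variable toy state; CE-QUAL-W2 and the OpenACC parallelization play no role here, and all numbers are illustrative:

```python
import numpy as np

def enkf_update(ensemble, obs, obs_err, H, rng):
    """Stochastic EnKF analysis step: nudge each ensemble member toward
    a perturbed observation using the ensemble-estimated Kalman gain."""
    n_ens = ensemble.shape[1]
    x_mean = ensemble.mean(axis=1, keepdims=True)
    X = ensemble - x_mean                       # state anomalies
    Y = H @ X                                   # observation-space anomalies
    Pyy = (Y @ Y.T) / (n_ens - 1) + obs_err**2 * np.eye(len(obs))
    Pxy = (X @ Y.T) / (n_ens - 1)
    K = Pxy @ np.linalg.inv(Pyy)                # ensemble Kalman gain
    perturbed = obs[:, None] + obs_err * rng.standard_normal((len(obs), n_ens))
    return ensemble + K @ (perturbed - H @ ensemble)

rng = np.random.default_rng(0)
# 1-D state (e.g. temperature at one layer), 100-member ensemble
prior = 20.0 + 2.0 * rng.standard_normal((1, 100))
H = np.eye(1)
posterior = enkf_update(prior, obs=np.array([22.0]), obs_err=0.5, H=H, rng=rng)
```

In the paper's system, each ensemble member requires a full CE-QUAL-W2 forecast run, which is where GPU acceleration of the forward model matters.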
Managing Computing Infrastructure for IoT Data (Cited by 1)
18
Authors: Sapna Tyagi, Ashraf Darwish, Mohammad Yahiya Khan 《Advances in Internet of Things》 2014, No. 3, pp. 29-35 (7 pages)
Digital data have become a torrent engulfing every area of business, science, and engineering, gushing into every economy, every organization, and every user of digital technology. In the age of big data, deriving value and insight from big data using rich analytics is important for achieving competitiveness, success, and leadership in every field. The Internet of Things (IoT) is causing the number and types of products that emit data to grow at an unprecedented rate. Heterogeneity, scale, timeliness, complexity, and privacy problems with large data impede progress at all phases of the pipeline that creates value from data. With the push of such massive data, we are entering a new era of computing driven by novel and groundbreaking research innovation in elastic parallelism, partitioning, and scalability. Designing a scalable system for analyzing, processing, and mining huge real-world datasets has become one of the challenging problems facing both systems researchers and data-management researchers. This paper gives an overview of computing infrastructure for IoT data processing, focusing on architectural and major challenges of massive data, and briefly discusses emerging computing infrastructure and technologies that are promising for improving massive data management.
Keywords: big data, cloud computing, data analytics, elastic scalability, heterogeneous computing, GPU, PCM, massive data processing
Research and Application of GPU-Based Parallel Acceleration for 2D Transient Flow Fields in Transformers (Cited by 1)
19
Authors: 任增强, 刘刚, 靳立鹏, 武卫革 《华北电力大学学报(自然科学版)》 CAS PKU Core, 2023, No. 6, pp. 66-75 (10 pages)
To address the long computation times and low efficiency of the dimensionless least-squares finite element method for two-dimensional transient fluid-field problems in transformers, GPU parallel acceleration of the transient flow solver is pursued. The two most computation-intensive parts of the solver, element stiffness matrix assembly and sparse linear system solution, are ported to the GPU, greatly reducing computation time. Meanwhile, an orthogonal linked list and CSR sparse storage are used to store the nonzero elements of the system's sparse matrix, reducing memory consumption. The effectiveness of the GPU parallel program is verified with a lid-driven cavity flow model, and the parallel speedup increases with model size. Applied to transient flow simulation of a transformer winding model, the GPU parallel program achieves a speedup of about 16x over the serial program. The GPU parallel computing method implemented in this paper lays a foundation for product-level transient flow simulation of transformers.
Keywords: dimensionless least-squares finite element, transient fluid field, GPU, parallel computing
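CSR storage, mentioned in the abstract, keeps only the nonzero values plus column indices and row offsets; a small pure-Python sketch of the format and the row-wise matrix-vector product (on a GPU, each row's dot product typically maps to one thread):

```python
def to_csr(dense):
    """Compressed Sparse Row: store only nonzeros plus row offsets."""
    values, col_idx, row_ptr = [], [], [0]
    for row in dense:
        for j, v in enumerate(row):
            if v != 0.0:
                values.append(v)
                col_idx.append(j)
        row_ptr.append(len(values))
    return values, col_idx, row_ptr

def csr_matvec(values, col_idx, row_ptr, x):
    """y = A @ x; each row's dot product is independent of the others."""
    y = []
    for i in range(len(row_ptr) - 1):
        y.append(sum(values[k] * x[col_idx[k]]
                     for k in range(row_ptr[i], row_ptr[i + 1])))
    return y

A = [[4.0, 0.0, 1.0],
     [0.0, 3.0, 0.0],
     [1.0, 0.0, 2.0]]
vals, cols, ptr = to_csr(A)
y = csr_matvec(vals, cols, ptr, [1.0, 2.0, 3.0])
```

For the finite element matrices in the paper, this layout cuts memory from O(n^2) to O(nnz), which is what makes large transient simulations fit on a single GPU.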
Programming for scientific computing on peta-scale heterogeneous parallel systems (Cited by 1)
20
Authors: 杨灿群, 吴强, 唐滔, 王锋, 薛京灵 《Journal of Central South University》 SCIE EI CAS, 2013, No. 5, pp. 1189-1203 (15 pages)
Peta-scale high-performance computing systems are increasingly built with heterogeneous CPU and GPU nodes to achieve higher power efficiency and computation throughput. While providing unprecedented capabilities to conduct computational experiments of historic significance, these systems are presently difficult to program. The users, who are domain experts rather than computer experts, prefer programming models closer to their domains (e.g., physics and biology) rather than MPI and OpenMP. This has led to the development of domain-specific programming, which provides domain-specific programming interfaces but abstracts away some performance-critical architecture details. Based on experience in designing large-scale computing systems, a hybrid programming framework for scientific computing on heterogeneous architectures is proposed in this work. Its design philosophy is to provide a collaborative mechanism for domain experts and computer experts so that both domain-specific knowledge and performance-critical architecture details can be adequately exploited. Two real-world scientific applications have been evaluated on TH-1A, a peta-scale CPU-GPU heterogeneous system that is currently the 5th fastest supercomputer in the world. The experimental results show that the proposed framework is well suited for developing large-scale scientific computing applications on peta-scale heterogeneous CPU/GPU systems.
Keywords: computing systems, scientific applications, heterogeneous systems, peta-scale, programming models, parallel systems, supercomputers, domain experts