Acceleration Techniques for the Parallel Method of Moments on Heterogeneous Platforms (Cited by: 1)

Acceleration for the Parallel MoM Using GPU

Abstract: This paper studies acceleration techniques for the parallel method of moments (MoM) on CPU/GPU heterogeneous clusters. An MPI/CUDA software architecture is designed that solves the problem of running parallel LU factorization across nodes on such clusters. The architecture is based on a two-dimensional block-cyclic distribution of the matrix: MPI handles communication between compute nodes, while the GPU accelerates the matrix update step. To overcome the limit that GPU memory places on the size of the matrix that can be factorized, a "GPU memory/host memory" out-of-core algorithm is further studied. To optimize performance, a design based on CUDA streams and asynchronous communication is proposed that overlaps GPU communication with computation, effectively hiding the GPU communication time and yielding a clear speedup.
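The two-dimensional block-cyclic distribution mentioned in the abstract can be illustrated with a minimal sketch. The abstract does not give the block size or the process-grid shape, so the names `nb`, `p_rows`, and `p_cols` and the values used below are illustrative assumptions, not the authors' settings. The mapping shown is the conventional block-cyclic rule, which may differ in detail from the paper's implementation.

```python
def block_owner(i, j, nb, p_rows, p_cols):
    """Return the (process_row, process_col) that owns global entry (i, j).

    nb is the (assumed) square block size; p_rows x p_cols is the
    (assumed) process grid. Blocks are dealt out cyclically in both
    dimensions.
    """
    return (i // nb) % p_rows, (j // nb) % p_cols


def local_block_counts(n, nb, p_rows, p_cols):
    """Count the nb-by-nb blocks of an n-by-n matrix held by each process."""
    n_blocks = -(-n // nb)  # ceiling division: blocks per matrix dimension
    counts = {(pr, pc): 0 for pr in range(p_rows) for pc in range(p_cols)}
    for bi in range(n_blocks):
        for bj in range(n_blocks):
            counts[(bi % p_rows, bj % p_cols)] += 1
    return counts


if __name__ == "__main__":
    # Hypothetical example: 12x12 matrix, 2x2 blocks, 2x3 process grid.
    print(block_owner(2, 4, nb=2, p_rows=2, p_cols=3))
    print(local_block_counts(12, nb=2, p_rows=2, p_cols=3))
```

Because consecutive blocks go to different processes, every process keeps a share of the trailing submatrix as the LU factorization advances, which is why this layout is commonly chosen for load balance in distributed dense factorizations.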

Authors: 陈岩 (Chen Yan), 张光辉 (Zhang Guanghui), 林中朝 (Lin Zhongchao), 张玉 (Zhang Yu), 赵勋旺 (Zhao Xunwang)

Affiliation: School of Electronic Engineering, Xidian University

Source: Journal of Microwaves (《微波学报》), CSCD, Peking University Core Journal, 2014, Issue S1, pp. 51-54 (4 pages)

Keywords: method of moments (MoM); heterogeneous (hybrid) platform; GPU acceleration; parallel; out-of-core; communication hiding

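Classification: TP332 [Automation and Computer Technology: Computer System Architecture]; TP338.6 [Automation and Computer Technology: Computer System Architecture]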


