Accelerating Iterative Methods in Solving Linear Systems on GPUs (线性系统求解中迭代算法的GPU加速方法)

Cited by: 4

Abstract: Iterative methods are a fundamental approach to solving linear systems, and using them efficiently becomes especially important when the coefficient matrix is large and sparse. After analyzing the characteristics common to iterative methods, this paper presents general approaches for accelerating them on GPUs, which offer high compute throughput and memory bandwidth. Using these approaches, we implement a classic iterative method, PQMRCGSTAB, on two mainstream GPU platforms and propose optimizations tailored to each platform's architecture. Compared with a parallel program running on the four cores of a quad-core AMD Opteron 2380 processor (2.4 GHz), the double-precision version of PQMRCGSTAB achieves a 31x speedup on an NVIDIA Tesla S1070 and a 9x speedup on an AMD Radeon HD 4870 X2.

Authors: Ge Zhen (葛振), Yang Canqun (杨灿群), Wu Qiang (吴强), Chen Juan (陈娟)

Affiliation: College of Computer, National University of Defense Technology (国防科技大学计算机学院)

Source: Computer Engineering & Science (《计算机工程与科学》), indexed in CSCD and the Peking University Core Journals list, 2009, Issue A01, pp. 179-182 (4 pages)

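Funding: National High Technology Research and Development Program of China (863 Program; 2008AA01Z110); National Natural Science Foundation of China (60673150, 60903044)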

Keywords: GPU; iterative method acceleration; PQMRCGSTAB algorithm

Classification: TP391 [Automation and Computer Technology: Computer Application Technology]
