
On GPU-based acceleration of block subspace iterative methods
Abstract: Block Krylov subspace methods are combined with the GPU (graphics processing unit) to accelerate the solution of systems of linear equations. GPU computation offers a high degree of parallelism, which improves computational efficiency. Numerical examples show that the block algorithm runs more efficiently on the GPU than the non-block algorithm does on the CPU. Furthermore, for the block algorithm, a careful choice of block size is also very important for the overall solution speed.
Source: Communication on Applied Mathematics and Computation (《应用数学与计算数学学报》), 2016, No. 1, pp. 138-147 (10 pages)
Keywords: block subspace iterative methods; GPU-based acceleration; large-scale sparse systems of linear algebraic equations
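One reason a block method can run efficiently, and why its block size matters, is that a sparse matrix applied to a block of s vectors can be read from memory once rather than s times. The sketch below is only an illustration of that idea in plain Python, not the algorithm or code from the paper. The CSR helpers `csr_spmv`, `csr_spmm`, and `tridiag_csr`, the test matrix, and the read counter are all assumptions introduced here.

```python
# Illustrative sketch only; not the paper's implementation.
# Compares matrix-entry reads for one block product A @ X
# against s separate products A @ x_c on a CSR matrix.

def csr_spmv(indptr, indices, data, x):
    """Return (A @ x, number of stored matrix entries read)."""
    n = len(indptr) - 1
    y = [0.0] * n
    reads = 0
    for i in range(n):
        acc = 0.0
        for k in range(indptr[i], indptr[i + 1]):
            acc += data[k] * x[indices[k]]
            reads += 1
        y[i] = acc
    return y, reads


def csr_spmm(indptr, indices, data, X):
    """Return (A @ X, reads) where X is a list of n rows of length s.

    Each stored entry of A is read once and reused for all s columns.
    """
    n = len(indptr) - 1
    s = len(X[0])
    Y = [[0.0] * s for _ in range(n)]
    reads = 0
    for i in range(n):
        row = Y[i]
        for k in range(indptr[i], indptr[i + 1]):
            a = data[k]
            xj = X[indices[k]]
            reads += 1
            for c in range(s):
                row[c] += a * xj[c]
    return Y, reads


def tridiag_csr(n):
    """Hypothetical test matrix: tridiag(-1, 2, -1) in CSR form."""
    indptr, indices, data = [0], [], []
    for i in range(n):
        for j, v in ((i - 1, -1.0), (i, 2.0), (i + 1, -1.0)):
            if 0 <= j < n:
                indices.append(j)
                data.append(v)
        indptr.append(len(indices))
    return indptr, indices, data


n, s = 6, 3  # matrix size and block size (both arbitrary here)
indptr, indices, data = tridiag_csr(n)
X = [[float((i + 1) * (c + 1)) for c in range(s)] for i in range(n)]

# One block product: the matrix is traversed once.
Y_block, block_reads = csr_spmm(indptr, indices, data, X)

# s single-vector products: the matrix is traversed s times.
single_reads = 0
Y_cols = []
for c in range(s):
    y, r = csr_spmv(indptr, indices, data, [X[i][c] for i in range(n)])
    Y_cols.append(y)
    single_reads += r

print(block_reads, single_reads)
```

In this toy setting the block product reads the matrix a factor of s fewer times for the same result. A larger block also raises the work and storage per iteration, which is consistent with the abstract's remark that block size must be chosen with care.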
