期刊文献+

GPU加速不完全Cholesky分解预条件共轭梯度法 被引量:3

GPU-Accelerated Incomplete Cholesky Factorization Preconditioned Conjugate Gradient Method
下载PDF
导出
摘要 不完全Cholesky分解预条件共轭梯度(incomplete Cholesky factorization preconditioned conjugate gradient,ICCG)法是求解大规模稀疏对称正定线性方程组的有效方法.然而ICCG法要求在每次迭代中求解2个稀疏三角方程组,稀疏三角方程组求解固有的串行性成为了ICCG法在GPU上并行求解的瓶颈.针对稀疏三角方程组求解,给出了一种利用GPU加速的有效方法.为了增加稀疏三角方程组求解在GPU上的多线程并行性,提出了对不完全Cholesky分解产生的稀疏三角矩阵进行分层调度(level scheduling)的方法.为了进一步提高稀疏三角方程组求解的并行性能,提出了在分层调度前通过近似最小度(approximate minimum degree,AMD)算法对系数矩阵进行重排序、在分层调度后对稀疏三角矩阵进行层排序的方法,降低了分层调度过程中产生的层数,优化了稀疏三角方程组求解的GPU内存访问模式.数值实验表明,与利用NVIDIA CUSPARSE实现的ICCG法相比,采用上述方法性能可以获得平均1倍以上的提升. Incomplete Cholesky factorization preconditioned conjugate gradient (ICCG ) method is effective to solve large sparse symmetric positive definite linear systems . However ,ICCG method requires solving two sparse triangular linear systems during each iteration .The inherent serialism of solving sparse triangular becomes a bottleneck which prevents high efficient parallelization of ICCG method on GPU platform .In this paper ,an effective method to accelerate solving sparse triangular on GPU platform is proposed . In order to increase the multi‐thread parallelism of solving sparse triangular on GPU platform ,level scheduling is exploited for the sparse triangular matrixes which incomplete Cholesky factorization generates .For further improving the parallel performance of solving sparse triangular ,approximate minimum degree (AMD) algorithm is used to reorder the coefficient matrix before level scheduling .Moreover ,a novel method ,taking advantage of the level information to reorder the sparse triangular matrices after level scheduling ,is applied .These two methods can decrease the number of levels during level scheduling and optimize GPU memory access pattern to utilize memory coalescing in solving sparse triangular ,respectively .Numerical experiments indicate that compared with ICCG method implemented with NVIDIA CUSPARSE , applying the above methods can obtain more than 100% performance improvement on average .
出处 《计算机研究与发展》 EI CSCD 北大核心 2015年第4期843-850,共8页 Journal of Computer Research and Development
基金 国家自然科学基金项目(60873113) 国家自然科学基金重大研究计划项目(91430214) 国家"九七三"重点基础研究发展计划基金项目(2011CB309702) 国家"八六三"高技术研究发展计划基金项目(2012AA01A309) 数学工程与先进计算国家重点实验室开放基金项目(2014A03)
关键词 不完全Cholesky分解 预条件 共轭梯度法 重排序 图形处理器 incomplete Cholesky factorization preeonditioner conjugate gradient method reordering graphic processing unit (GPU)
  • 相关文献

参考文献15

  • 1Ament M, Knittel G, Weskopf D, et al. A parallel preconditioned conjugate gradient solver for the poisson problem on a muhi-gpu platform [C] //Proe of the 18th Euromicro Conf on Parallel, Distributed and Network-Based Processing. Piscataway, N J: IEEE, 2010:583-592.
  • 2Benzi M, Tuma M. A comparative approximate inverse preconditioners [J]. Mathematics, 1999, 30(2): 305-340.
  • 3Helfenstcin R, Koko J. Parallel precon gradient algorithm on GPU [J]. Journal and Applied Mathematics, 2012, 236(15).
  • 4Li R, Saad Y, GPU-accelerated precondiiioned iterative linear solvers I-J3. The Journal of Supereomputing, 2013, 63 (2) : 443-466.
  • 5Naumov M. Incomplete-LU and Cholesky preconditioned iterative methods using CUSPARSE and CUBLAS [R]. Santa Clara, CA; NVIDIA Corporation, 2011.
  • 6Sudan H, Klie H, Li R, et al. High performance manyeore solvers for reservoir simulation [C] /]Proe of the 12th European Conf on the Mathematics of Oil Recovery. Berlin Springer, 2010 [2013-09-20]. http://www-users, es. umn. edu/saad/PDF/A044, pdf.
  • 7Gupta R. A GPU implementation of a bubbly flow solver [D]. Delft, Holland: Delft University of Technology, 2009.
  • 8张健飞,沈德飞.基于GPU的稀疏线性系统的预条件共轭梯度法[J].计算机应用,2013,33(3):825-829. 被引量:10
  • 9Amestoy P R, Davis T A, Duff I S. An approximate minimum degree ordering algorithm [J]. SIAM Journal on Matrix Analysis and Applieations, 1996, 17(4): 886-905.
  • 10George A, 1.iu J W H. The evolution of the minimum degree ordering algorithm [J]. SIAM Review, 1989, 31 (1) : 1-19.

二级参考文献14

  • 1李晓梅,吴建平.Krylov子空间方法及其并行计算[J].计算机科学,2005,32(1):19-20. 被引量:20
  • 2李爱芹.线性方程组的迭代解法[J].科学技术与工程,2007,7(14):3357-3364. 被引量:16
  • 3曾攀.工程中的有限元方法[M]北京:清华大学出版社,2006.
  • 4Nvidia. NVIDIA CUDA C programming guide[EB/OL].http://developer.download.nvidia.com/compute/DevZone/docs/html/C/doc/CUDA_C_Programming_Guide.pdf,2012.
  • 5KRUGER T,WESTERMANN R. Linear algebra operators for GPU implementation of numerical algorithms[J].ACM Transactions on Graphics,2003,(03):908-916.doi:10.1145/882262.882363.
  • 6BOLZ J,FARMER I,GRISPUN E. Sparse matrix solvers on the GPU:conjugate gradients and multigrid[J].ACM Transactions on Graphics,2003,(03):917-924.doi:10.1145/882262.882364.
  • 7NATHAN B,MICHAEL G. Efficient sparse matrix-vector multiplication on CUDA[R].Santa Clara,California:NVIDIA,2008.
  • 8AIL C,AKIRA N,SATOSHI M. Fast conjugate gradients with multiple GPUs[A].Berlin:Springer-Verlag,2009.893-903.
  • 9MUTHU M B,RAJESH B. Optimizing sparse matrix-vector multiplication on GPUs[R].Armonk,NY:IBM,2009.
  • 10YOUSEF S. Iterative methods for sparse linear systems[M].Philadelphia:Society for Industrial and Applied Mathematics,2003.

共引文献9

同被引文献9

引证文献3

二级引证文献9

相关作者

内容加载中请稍等...

相关机构

内容加载中请稍等...

相关主题

内容加载中请稍等...

浏览历史

内容加载中请稍等...
;
使用帮助 返回顶部