Approximate Inverse Preconditioned CG Parallel Solver on GPU Cluster

Cited by: 1

Abstract: This paper studies parallel algorithms for two approximate inverse preconditioners on GPU cluster systems: the factorized approximate inverse (AINV) and the symmetric successive over-relaxation approximate inverse (SSOR-AI). Using multilevel k-way graph partitioning, together with a method for identifying the interior and boundary vertices of each subgraph and a sparse matrix permutation, it proposes a parallel method for transforming a sparse matrix into block arrow form. Based on the resulting block arrow matrix, the approximate inverse of the sparse matrix within each block is computed sequentially while different blocks are processed in parallel. This yields parallel AINV and SSOR-AI algorithms and addresses the difficulty of parallelizing the AINV preconditioner. Using CPU-GPU collaborative computing, page-locked host memory, and overlap of computation and communication on the device, the parallel approximate inverse preconditioners are combined with the conjugate gradient (CG) algorithm to obtain a hybrid parallel solver for linear systems. Numerical experiments show that, for both AINV and SSOR-AI, the proposed methods achieve good scalability and speedup on multiple GPUs.
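The interior/boundary identification and permutation step mentioned in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the partition vector `part` is simply assumed as given (the paper obtains it from multilevel k-way graph partitioning), and the function names `block_arrow_permutation`, `permute_symmetric`, `tridiagonal`, and `adjacency` are hypothetical helpers written in plain Python for illustration only.

```python
def block_arrow_permutation(adj, part):
    """Return (perm, sizes) for a block arrow ordering.

    adj[v]  : set of neighbours of vertex v (symmetric sparsity pattern)
    part[v] : subdomain id of vertex v (assumed given by a graph partitioner)
    perm    : old vertex indices listed in their new order
    sizes   : sizes of the interior blocks, followed by the boundary block
    """
    n = len(part)
    nparts = max(part) + 1
    interior = [[] for _ in range(nparts)]
    boundary = []
    for v in range(n):
        # A vertex is a boundary vertex if any neighbour lies in another part.
        if any(part[u] != part[v] for u in adj[v]):
            boundary.append(v)
        else:
            interior[part[v]].append(v)
    # Interior vertices first, grouped by part; all boundary vertices last.
    perm = [v for block in interior for v in block] + boundary
    sizes = [len(block) for block in interior] + [len(boundary)]
    return perm, sizes


def permute_symmetric(A, perm):
    """Apply the same permutation to rows and columns of a dense matrix A."""
    return [[A[i][j] for j in perm] for i in perm]


def tridiagonal(n):
    """Dense 1D Laplacian, used here only as a small test matrix."""
    A = [[0.0] * n for _ in range(n)]
    for i in range(n):
        A[i][i] = 2.0
        if i + 1 < n:
            A[i][i + 1] = A[i + 1][i] = -1.0
    return A


def adjacency(A):
    """Adjacency sets of the graph of A (off-diagonal nonzeros)."""
    n = len(A)
    return [{j for j in range(n) if j != i and A[i][j] != 0.0} for i in range(n)]


# Small example: 8 unknowns split into two parts of 4 vertices each.
A = tridiagonal(8)
part = [0, 0, 0, 0, 1, 1, 1, 1]
perm, sizes = block_arrow_permutation(adjacency(A), part)
B = permute_symmetric(A, perm)  # interior blocks on the diagonal, boundary block last
```

Because an interior vertex has neighbours only inside its own part, the interior blocks of different parts are decoupled in `B`; only the trailing boundary rows and columns connect them. Under that assumption, each interior block could be assigned to a separate process or GPU, which is consistent with the serial-within-block, parallel-across-blocks strategy the abstract describes.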

Authors: Zhao Lian (赵莲), Zhao Yonghua (赵永华), Chen Yao (陈尧), Zhao Wei (赵慰)

Affiliations: Computer Network Information Center, Chinese Academy of Sciences; University of Chinese Academy of Sciences

Source: Journal of Frontiers of Computer Science and Technology (《计算机科学与探索》), CSCD, Peking University Core Journal, 2015, No. 9, pp. 1084-1092 (9 pages)

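Funding: National Basic Research Program of China (973 Program), No. 2011CB309702; Open Fund of the State Key Laboratory of Mathematical Engineering and Advanced Computing, No. 2014A03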

Keywords: approximate inverse preconditioner; iterative method; heterogeneous parallel computing; GPU cluster

Classification: TP302 [Automation and Computer Technology: Computer System Architecture]



