
Study on Efficient Storage Format of Sparse Matrix Based on GPU (基于GPU的高效稀疏矩阵存储格式研究). Cited by: 8
Abstract: To solve large-scale sparse systems of linear equations on a Graphics Processing Unit (GPU), this paper proposes a sparse matrix storage format, HEC (Hybrid ELL and CSR), and uses it on the Compute Unified Device Architecture (CUDA) platform to implement the Incomplete LU factorization preconditioned Conjugate Gradient (ILUCG) method. HEC is a mix of the ELL and CSR (Compressed Sparse Row) formats. The ILUCG method is implemented by calling GPU kernels on matrices stored in HEC and is applied to large sparse linear systems, which improves the storage efficiency of the sparse matrix and reduces the time spent on Sparse Matrix-Vector multiplication (SpMV). Experimental results show that, compared with the widely used implementations that store the matrix in CSR or HYB (Hybrid) format and call CUSPARSE library functions, this implementation achieves a speedup of up to 10.4% and has good SpMV performance.
Authors: CHENG Kai (程凯), TIAN Jin (田瑾), MA Ruilin (马瑞琳) (College of Electronic and Electrical Engineering, Shanghai University of Engineering Science, Shanghai 201620, China)
Source: Computer Engineering (《计算机工程》), indexed in CAS, CSCD, and the Peking University Core Journals list; 2018, No. 8, pp. 54-60 (7 pages)
Funding: Natural Science Foundation of Shanghai (15ZR1418900)
Keywords: Graphics Processing Unit (GPU); CUSPARSE library; HEC (Hybrid ELL and CSR) storage format; Sparse Matrix-Vector multiplication (SpMV); incomplete LU factorization; preconditioned conjugate gradient method
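The abstract describes HEC only as a mix of the ELL and CSR formats and does not give its exact layout. The following is a minimal CPU-side sketch of one way such a hybrid could work, not the authors' implementation. It assumes that the first `k` nonzeros of each row are stored in a padded ELL block and that any remaining nonzeros of that row spill into a CSR tail. The function names `build_hybrid_ell_csr` and `spmv_hybrid`, the width parameter `k`, and the example matrix are all hypothetical.

```python
# Illustrative sketch only: the abstract does not specify HEC's exact layout.
# Assumption: the first `k` nonzeros of each row go into a padded ELL block,
# and any remaining nonzeros of that row go into a CSR tail.

def build_hybrid_ell_csr(dense, k):
    """Split a dense matrix (list of rows) into an ELL part of width k and a CSR tail."""
    n = len(dense)
    ell_cols = [[-1] * k for _ in range(n)]   # -1 marks a padding slot
    ell_vals = [[0.0] * k for _ in range(n)]
    csr_ptr, csr_cols, csr_vals = [0], [], []
    for i, row in enumerate(dense):
        nonzeros = [(j, v) for j, v in enumerate(row) if v != 0]
        # The first k nonzeros fill the fixed-width ELL slots of row i.
        for slot, (j, v) in enumerate(nonzeros[:k]):
            ell_cols[i][slot] = j
            ell_vals[i][slot] = float(v)
        # Overflow entries are appended to the CSR arrays.
        for j, v in nonzeros[k:]:
            csr_cols.append(j)
            csr_vals.append(float(v))
        csr_ptr.append(len(csr_cols))
    return ell_cols, ell_vals, csr_ptr, csr_cols, csr_vals


def spmv_hybrid(fmt, x):
    """Compute y = A x from the hybrid representation."""
    ell_cols, ell_vals, csr_ptr, csr_cols, csr_vals = fmt
    y = [0.0] * len(ell_cols)
    for i in range(len(ell_cols)):
        acc = 0.0
        # ELL part: skip padding slots.
        for j, v in zip(ell_cols[i], ell_vals[i]):
            if j >= 0:
                acc += v * x[j]
        # CSR part: entries of row i lie in csr_ptr[i] .. csr_ptr[i + 1] - 1.
        for p in range(csr_ptr[i], csr_ptr[i + 1]):
            acc += csr_vals[p] * x[csr_cols[p]]
        y[i] = acc
    return y


A = [
    [4, 0, 0, 1],
    [0, 3, 0, 0],
    [1, 2, 5, 6],
    [0, 0, 0, 2],
]
x = [1.0, 2.0, 3.0, 4.0]
fmt = build_hybrid_ell_csr(A, k=2)
y = spmv_hybrid(fmt, x)
print(y)
```

In this example only the third row has more than two nonzeros, so it is the only row that contributes to the CSR tail. The ELL block stays narrow and regular, and padding is limited to the short rows.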
