
A Hybrid Compression Algorithm for Quasi-diagonal Matrices and Its Matrix-Vector Multiplication Implementation on GPU (cited 5 times)

Quasi-diagonal Matrix Hybrid Compression Algorithm and Implementation for SpMV on GPU
Abstract: Sparse matrix-vector multiplication (SpMV) is a fundamental operation in scientific computing and engineering practice, and its high-performance implementation and optimization is an active research topic in computational science. Much effort has been put into accelerating SpMV, and several parallel solutions have been proposed. This paper focuses on a special case: sparse quasi-diagonal matrix-vector multiplication (SQDMV). Large sparse quasi-diagonal matrices arise in the solution of many differential equations, yet little research has addressed this class directly. We discuss data structures and algorithms for SQDMV that are efficiently implemented on the CUDA platform for the fine-grained parallel architecture of the GPU. We present a new hybrid storage format, HDC, which combines diagonal storage (DIA) with compressed sparse row storage (CSR): the extracted diagonal band is stored in DIA, exploiting the matrix's diagonal structure, while the discrete non-zero elements outside the band are stored in CSR. This overcomes DIA's rapid growth in stored columns on irregular matrices, and at the same time reduces CSR's imbalance in the number of non-zeros per row. Furthermore, HDC can adjust the stored diagonal bandwidth to adapt to the different discrete patterns of quasi-diagonal matrices, achieving a higher compression ratio than either DIA or CSR and reducing the data volume of the computation. Experiments on a GPU using the CUDA platform show that HDC achieves a higher speedup than DIA and CSR, especially for matrices with discrete points scattered outside the main diagonal. In addition, we combine the different parts of HDC into a unified kernel to obtain a better compression ratio and a higher speedup on the GPU.
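The hybrid scheme the abstract describes can be sketched in a few lines: the banded part of the matrix is multiplied in DIA form, the scattered remainder in CSR form, and the full product is the sum of the two partial products. The following is a minimal illustrative sketch of that idea, not the paper's CUDA implementation; the function names (`hdc_spmv` etc.) and the storage conventions in the comments are assumptions for illustration.

```python
def dia_spmv(offsets, data, x):
    """DIA part: data[k][j] holds the entry A[i, j] lying on the
    diagonal with column-minus-row offset offsets[k] (j == i + offsets[k])."""
    n = len(x)
    y = [0.0] * n
    for k, off in enumerate(offsets):
        for i in range(n):
            j = i + off
            if 0 <= j < n:
                y[i] += data[k][j] * x[j]
    return y

def csr_spmv(indptr, indices, values, x):
    """CSR part: standard row-compressed product for the discrete
    non-zeros left outside the extracted diagonal band."""
    n = len(indptr) - 1
    y = [0.0] * n
    for i in range(n):
        for p in range(indptr[i], indptr[i + 1]):
            y[i] += values[p] * x[indices[p]]
    return y

def hdc_spmv(offsets, dia_data, indptr, indices, values, x):
    # Hybrid product: since the DIA and CSR parts partition the
    # non-zeros of A, y = A*x is simply the sum of the two parts.
    y_dia = dia_spmv(offsets, dia_data, x)
    y_csr = csr_spmv(indptr, indices, values, x)
    return [a + b for a, b in zip(y_dia, y_csr)]
```

On the GPU, the paper fuses these two parts into a single unified kernel; the sketch above keeps them separate only to make the storage split visible.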
Source: Computer Science (《计算机科学》), CSCD, Peking University Core Journal, 2014, Issue 7, pp. 290-296 (7 pages).
Funding: Key Program of the National Natural Science Foundation of China (61133005); National Natural Science Foundation of China (61070057); National Key Technology R&D Program (2012BAH09B02); Cultivation Fund of the Key Scientific and Technical Innovation Project, Ministry of Education (708066); Doctoral Fund of the Ministry of Education (20100161110019); Program for New Century Excellent Talents in University (NCET-08-0177); Key Research Project of the Hunan Provincial Department of Education (13A011).
Keywords: GPU, sparse matrix, sparse matrix-vector multiplication (SpMV), CUDA


