
Implementation and Optimization of Quasi-Diagonal Matrix-Vector Multiplication on a CPU+GPU Heterogeneous Cluster (Cited by: 2)
Abstract: Sparse matrix-vector multiplication (SpMV) is an important problem in scientific computing and engineering applications and is well suited to parallel computation; its implementation and optimization on GPUs is currently an active research topic. This paper focuses on a special case of SpMV, quasi-diagonal sparse matrix-vector multiplication, whose nonzero elements have an irregular distribution. A hybrid storage format combining DIA and CSR is used for the SpMV computation, achieving a higher compression ratio than either DIA or CSR alone. To exploit the parallel computing power of multi-core CPUs, a CPU+GPU hybrid computing model is adopted: the data held in the different parts of the hybrid format are split between the CPU and the GPU, so that both processors' computing resources are fully used and overall resource utilization improves. In addition, based on an analysis of the characteristics of the CPU+GPU heterogeneous computing model, several optimization strategies are proposed that improve the performance of quasi-diagonal matrix-vector multiplication in a heterogeneous computing environment.
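The hybrid DIA+CSR splitting described in the abstract can be illustrated with a short sketch. The following Python/SciPy code is a minimal illustration under assumptions, not the paper's implementation: the diagonal-occupancy threshold `min_fill`, the helper `split_dia_csr`, and the toy matrix are all invented for this example, and both partial products run on the CPU here, whereas in the paper's scheme the DIA portion would be computed on the GPU and the CSR remainder on the CPU.

```python
# Minimal sketch (assumed example, not the paper's code): split a quasi-diagonal
# sparse matrix into a DIA part (densely filled diagonals) and a CSR part
# (irregular leftover entries), then form y = A @ x as the sum of the two
# partial products.
import numpy as np
import scipy.sparse as sp

def split_dia_csr(A, min_fill=0.5):
    """Assign a nonzero to the DIA part if its diagonal is at least
    `min_fill` occupied; the rest goes to the CSR part.
    `min_fill` is an illustrative threshold, not taken from the paper."""
    A = A.tocoo()
    n = A.shape[0]
    offsets = A.col - A.row                      # diagonal index of each nonzero
    counts = {}
    for k in offsets:
        counts[k] = counts.get(k, 0) + 1
    dense = {k for k, c in counts.items() if c >= min_fill * (n - abs(k))}
    in_dia = np.array([k in dense for k in offsets])

    def part(mask):
        return sp.coo_matrix((A.data[mask], (A.row[mask], A.col[mask])),
                             shape=A.shape)
    return part(in_dia).todia(), part(~in_dia).tocsr()

# Toy quasi-diagonal matrix: a full tridiagonal band plus two stray entries.
n = 6
band = sp.diags([np.ones(n - 1), 2.0 * np.ones(n), np.ones(n - 1)], [-1, 0, 1])
stray = sp.coo_matrix(([9.0, 7.0], ([0, 5], [3, 2])), shape=(n, n))
A = (band + stray).tocsr()

A_dia, A_csr = split_dia_csr(A)
x = np.arange(1.0, n + 1.0)

# On the real system the DIA product would run on the GPU and the CSR
# product on the CPU; here both run on the CPU and are simply summed.
y = A_dia @ x + A_csr @ x
assert np.allclose(y, A @ x)
print(y)
```

In this sketch the band diagonals end up in the DIA part and the stray entries in the CSR part, mirroring the idea of keeping the regular portion in a GPU-friendly format while the CPU handles the irregular remainder.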
Source: Journal of Chinese Computer Systems (《小型微型计算机系统》), CSCD, Peking University Core Journal, 2015, Issue 7, pp. 1659-1664 (6 pages).
Funding: Supported by the Key Program of the National Natural Science Foundation of China (61432005), the National Natural Science Foundation of China (61472124), and the Key Research Project of the Hunan Provincial Department of Education (13A011).
Keywords: GPU; sparse matrix; sparse matrix-vector multiplication (SpMV); heterogeneous computing
