
Implementation and Optimization of Quasi-Diagonal Matrix-Vector Multiplication on a CPU+GPU Heterogeneous Cluster (Cited by: 2)
Abstract: Sparse matrix-vector multiplication (SpMV) is an important problem in scientific computing and engineering applications and is well suited to parallel computation; its implementation and optimization on GPUs is currently an active research topic. This paper focuses on a special case of SpMV, quasi-diagonal sparse matrix-vector multiplication, whose nonzero elements have an irregular distribution. A hybrid storage format combining DIA and CSR is used for the SpMV computation, achieving a higher compression ratio than either DIA or CSR alone. To exploit the parallel computing power of multi-core CPUs, a CPU+GPU hybrid computing model is adopted: the data held in the different parts of the hybrid format are split between the CPU and the GPU, so that both processors' computing resources are fully used and overall resource utilization improves. In addition, based on an analysis of the characteristics of the CPU+GPU heterogeneous computing model, several optimization strategies are proposed that improve the performance of quasi-diagonal matrix-vector multiplication in a heterogeneous computing environment.
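The hybrid DIA+CSR splitting described in the abstract can be illustrated with a short sketch. The following Python/SciPy code is a minimal illustration under assumptions, not the paper's implementation: the diagonal-occupancy threshold `min_fill`, the helper `split_dia_csr`, and the toy matrix are all invented for this example, and both partial products run on the CPU here, whereas in the paper's scheme the DIA portion would be computed on the GPU and the CSR remainder on the CPU.

```python
# Minimal sketch (assumed example, not the paper's code): split a quasi-diagonal
# sparse matrix into a DIA part (densely filled diagonals) and a CSR part
# (irregular leftover entries), then form y = A @ x as the sum of the two
# partial products.
import numpy as np
import scipy.sparse as sp

def split_dia_csr(A, min_fill=0.5):
    """Assign a nonzero to the DIA part if its diagonal is at least
    `min_fill` occupied; the rest goes to the CSR part.
    `min_fill` is an illustrative threshold, not taken from the paper."""
    A = A.tocoo()
    n = A.shape[0]
    offsets = A.col - A.row                      # diagonal index of each nonzero
    counts = {}
    for k in offsets:
        counts[k] = counts.get(k, 0) + 1
    dense = {k for k, c in counts.items() if c >= min_fill * (n - abs(k))}
    in_dia = np.array([k in dense for k in offsets])

    def part(mask):
        return sp.coo_matrix((A.data[mask], (A.row[mask], A.col[mask])),
                             shape=A.shape)
    return part(in_dia).todia(), part(~in_dia).tocsr()

# Toy quasi-diagonal matrix: a full tridiagonal band plus two stray entries.
n = 6
band = sp.diags([np.ones(n - 1), 2.0 * np.ones(n), np.ones(n - 1)], [-1, 0, 1])
stray = sp.coo_matrix(([9.0, 7.0], ([0, 5], [3, 2])), shape=(n, n))
A = (band + stray).tocsr()

A_dia, A_csr = split_dia_csr(A)
x = np.arange(1.0, n + 1.0)

# On the real system the DIA product would run on the GPU and the CSR
# product on the CPU; here both run on the CPU and are simply summed.
y = A_dia @ x + A_csr @ x
assert np.allclose(y, A @ x)
print(y)
```

In this sketch the band diagonals end up in the DIA part and the stray entries in the CSR part, mirroring the idea of keeping the regular portion in a GPU-friendly format while the CPU handles the irregular remainder.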
Source: Journal of Chinese Computer Systems (《小型微型计算机系统》), CSCD, Peking University Core Journal, 2015, Issue 7, pp. 1659-1664 (6 pages).
Funding: Supported by the Key Program of the National Natural Science Foundation of China (61432005), the National Natural Science Foundation of China (61472124), and the Key Research Project of the Hunan Provincial Department of Education (13A011).
Keywords: GPU; sparse matrix; sparse matrix-vector multiplication (SpMV); heterogeneous computing
