Abstract
Sparse matrix-vector multiplication (SpMV) is a fundamental operation in scientific computing and engineering applications, and its high-performance implementation and optimization is one of the research hotspots in computational science. Solving differential equations produces large-scale sparse matrices, a large portion of which are quasi-diagonal. To handle the irregularities of quasi-diagonal matrices, a hybrid of diagonal storage (DIA) and compressed sparse row storage (CSR) is proposed for SpMV: the scattered non-zero elements outside the extracted diagonal band are stored in CSR, which avoids the rapid growth in stored columns that DIA suffers on irregular matrices, while the diagonal band itself is stored in DIA to fully exploit the diagonal structure of the matrix and to reduce the imbalance in the number of non-zeros per CSR row. Moreover, the bandwidth of the stored diagonal band can be adjusted to match the different degrees of scattering found in quasi-diagonal matrices, yielding a higher compression ratio than either DIA or CSR and reducing the amount of data involved in the computation. Experiments on a GPU using the CUDA platform show that the proposed method achieves a higher speedup than both DIA and CSR.
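The paper does not reproduce its exact data layout here; as a minimal sketch, assuming a zero-padded DIA block of adjustable bandwidth plus a standard CSR triple for the leftover scattered non-zeros, the hybrid container might look like the following (all field names are illustrative, not the authors' definitions):

// Hypothetical layout of the hybrid DIA + CSR (HDC) format described in the abstract.
struct HDCMatrix {
    int n;            // matrix dimension (n x n)

    // DIA part: the extracted diagonal band
    int num_diags;    // number of stored diagonals (the adjustable bandwidth)
    int *dia_offsets; // offset of each stored diagonal from the main diagonal
    float *dia_vals;  // column-major n x num_diags array, zero-padded

    // CSR part: scattered non-zeros outside the stored band
    int *csr_row_ptr; // length n + 1
    int *csr_col_idx; // column index of each remaining non-zero
    float *csr_vals;  // value of each remaining non-zero
};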
Sparse matrix-vector multiplication (SpMV) is of singular importance in sparse linear algebra, which is an important issue in scientific computing and engineering practice. Much effort has been put into accelerating SpMV and a few parallel solutions have been proposed. In this paper we focused on a special SpMV, sparse quasi-diagonal matrix-vector multiplication (SQDMV). Sparse quasi-diagonal matrices are the key to solving many differential equations, yet little research has been done in this field. We discussed data structures and algorithms for SQDMV that were efficiently implemented on the CUDA platform for the fine-grained parallel architecture of the GPU. We presented a new hybrid diagonal storage format, HDC, which overcomes the inefficiency of DIA in storing irregular matrices and the imbalance of CSR in storing non-zero elements. Further, HDC can adjust the storage bandwidth of the diagonal band to adapt to different degrees of scattering in the sparse matrix, so as to obtain a higher compression ratio than DIA and CSR and reduce the computational complexity. Our implementation on the GPU shows that the performance of HDC is better than that of the other formats, especially for matrices with scattered points outside the main diagonal. In addition, we combined the different parts of HDC into a unified kernel to obtain a better compression ratio and a higher speedup on the GPU.
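As a rough illustration of fusing the DIA and CSR parts into one unified kernel, the CUDA sketch below assigns one thread per row and accumulates both contributions before a single write to y. It assumes the hypothetical layout sketched above and is not the authors' published kernel:

// Minimal sketch of a unified SpMV kernel over the hybrid DIA + CSR layout.
__global__ void spmv_hdc(int n, int num_diags,
                         const int *dia_offsets, const float *dia_vals,
                         const int *csr_row_ptr, const int *csr_col_idx,
                         const float *csr_vals,
                         const float *x, float *y)
{
    int row = blockIdx.x * blockDim.x + threadIdx.x;
    if (row >= n) return;

    float sum = 0.0f;

    // DIA part: walk the stored diagonals of this row.
    for (int d = 0; d < num_diags; ++d) {
        int col = row + dia_offsets[d];
        if (col >= 0 && col < n)
            sum += dia_vals[d * n + row] * x[col];   // column-major, zero-padded
    }

    // CSR part: scattered non-zeros outside the diagonal band.
    for (int j = csr_row_ptr[row]; j < csr_row_ptr[row + 1]; ++j)
        sum += csr_vals[j] * x[csr_col_idx[j]];

    y[row] = sum;
}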
Source
《计算机科学》
CSCD
Peking University Core Journals
2014, No. 7, pp. 290-296 (7 pages)
Computer Science
Funding
Key Program of the National Natural Science Foundation of China (61133005)
National Natural Science Foundation of China (61070057)
National Key Technology R&D Program of China (2012BAH09B02)
Cultivation Fund of the Key Scientific and Technical Innovation Project, Ministry of Education of China (708066)
Doctoral Fund of the Ministry of Education of China (20100161110019)
Program for New Century Excellent Talents in University, Ministry of Education of China (NCET-08-0177)
Key Scientific Research Project of the Education Department of Hunan Province (13A011)