摘要
在图形处理器(GPU)上实现对角稀疏矩阵向量乘法(SpMV)可以充分利用GPU的并行计算能力,并加速矩阵向量乘法;然而,相关主流算法存在零元填充数据多、计算效率低的问题。针对上述问题,提出一种对角SpMV算法DIA-Dynamic(DIAgonal-Dynamic)。首先,设计一种全新的动态划分策略,根据矩阵的不同特征进行分块,在保证GPU高计算效率的同时大幅减少零元填充,去除冗余计算量;其次,提出一种对角稀疏矩阵存储格式BDIA(Block DIAgonal)存储分块数据,并调整数据布局,提高GPU上的访存性能;最后,基于GPU的底层进行条件分支优化,以减少分支判断,并使用动态共享内存解决向量的不规则访问问题。DIA-Dynamic与前沿Tile SpMV算法相比,平均加速比达到了1.88;与前沿BRCSD(Diagonal Compressed Storage based on Row-Blocks)-Ⅱ算法相比,平均零元填充减少了43%,平均加速比达到了1.70。实验结果表明,DIA-Dynamic能够有效提高GPU上对角SpMV的计算效率,缩短计算时间,提升程序性能。
Implementing diagonal Sparse Matrix Vector multiplication(SpMV)on Graphics Processing Unit(GPU)can make full use of the parallel computing capabilities of GPU and accelerate matrix vector multiplication.However,related mainstream algorithms have problems such as a large amount of zero-element filling data and low computational efficiency.In response to the above problems,a diagonal SpMV algorithm DIA-Dynamic(Diagonal-Dynamic)was proposed.Firstly,a new dynamic partition strategy was designed to divide the matrix into blocks according to different characteristics,which greatly reduced the zero-element filling while ensuring high computational efficiency of GPU,thereby removing redundant calculations.Then,a diagonal sparse matrix storage format BDIA(Block DIAgonal)was proposed to store block data,and the data layout was adjusted to improve memory access performance on GPU.Finally,based on the bottom of GPU,the conditional branch optimization was performed to reduce branch judgments,and dynamic shared memory was used to solve the problem of irregular access of vectors.Compared with the state-of-the-art Tile SpMV algorithm,DIA-Dynamic has the average acceleration ratio of 1.88;compared with the cutting-edge BRCSD(Diagonal Compressed Storage based on Row-Blocks)-Ⅱalgorithm,DIA-Dynamic has the average zero-element filling reduced by 43%,and the average acceleration ratio reaches 1.70.Experimental results show that DIA-Dynamic can effectively improve the computational efficiency of diagonal SpMV on GPU,shorten the computing time,improving the program performance.
作者
涂进兴
李志雄
黄建强
TU Jinxing;LI Zhixiong;HUANG Jianqiang(School of Computer Science and Technology,Qinghai University,Xining Qinghai 810016,China;Intelligent Computing and Application Laboratory of Qinghai Province(Qinghai University),Xining Qinghai 810016,China)
出处
《计算机应用》
CSCD
北大核心
2024年第11期3521-3529,共9页
journal of Computer Applications
基金
国家自然科学基金资助项目(62062059)
青海省科技计划项目(2022-ZJ-701)。
关键词
图形处理器
对角稀疏矩阵
稀疏矩阵向量乘法
动态划分
共享内存
Graphics Processing Unit(GPU)
diagonal sparse matrix
Sparse Matrix Vector multiplication(SpMV)
dynamic partition
shared memory