Sparse matrix-vector multiplication (SpMV) is one of the most essential operations in linear algebra, and predicting its performance on GPUs has attracted growing attention in recent years. In 2012, Guo and Wang put forward a new approach to predicting the performance of SpMV on GPUs. However, their model does not fully account for matrix structure, so the execution time it predicts tends to be inaccurate for general sparse matrices. To address this problem, we propose two new, similar models that take the structure of the matrices into account and make performance prediction more accurate. In addition, we use the new models to predict the execution time of SpMV for the CSR-V, CSR-S, ELL, and JAD sparse matrix storage formats on the CUDA platform. Our experimental results show that, for most general matrices, the prediction accuracy of our models is on average 1.69 times better than that of Guo and Wang's model.
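To make the role of matrix structure concrete, the following is a minimal CPU-side sketch in Python of SpMV in the CSR and ELL formats. It is not the prediction model from the abstract; the function names, the small 4x4 example matrix, and the padding-ratio measure are all assumptions chosen for illustration. It only shows one way structure matters: ELL pads every row to the length of the longest row, so an uneven row-length distribution inflates the amount of stored (and processed) data.

```python
def csr_spmv(row_ptr, col_idx, vals, x):
    """y = A @ x for A in CSR form (one row at a time, as in a scalar kernel)."""
    y = []
    for i in range(len(row_ptr) - 1):
        acc = 0.0
        for k in range(row_ptr[i], row_ptr[i + 1]):
            acc += vals[k] * x[col_idx[k]]
        y.append(acc)
    return y


def csr_to_ell(row_ptr, col_idx, vals):
    """Convert CSR to ELL: every row is padded to the longest row's length."""
    n = len(row_ptr) - 1
    width = max(row_ptr[i + 1] - row_ptr[i] for i in range(n))
    ell_cols = [[0] * width for _ in range(n)]    # padded slots point at column 0
    ell_vals = [[0.0] * width for _ in range(n)]  # padded slots hold 0.0
    for i in range(n):
        for j, k in enumerate(range(row_ptr[i], row_ptr[i + 1])):
            ell_cols[i][j] = col_idx[k]
            ell_vals[i][j] = vals[k]
    return ell_cols, ell_vals


def ell_spmv(ell_cols, ell_vals, x):
    """y = A @ x for A in ELL form; padded slots contribute 0.0."""
    return [sum(v * x[c] for c, v in zip(cols, vs))
            for cols, vs in zip(ell_cols, ell_vals)]


def ell_padding_ratio(ell_vals, nnz):
    """Fraction of ELL slots that are padding rather than real nonzeros."""
    slots = len(ell_vals) * len(ell_vals[0])
    return 1.0 - nnz / slots


# Hypothetical example matrix (row lengths 2, 1, 4, 1):
# [[1, 0, 2, 0],
#  [0, 3, 0, 0],
#  [4, 5, 6, 7],
#  [0, 0, 0, 8]]
row_ptr = [0, 2, 3, 7, 8]
col_idx = [0, 2, 1, 0, 1, 2, 3, 3]
vals = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
x = [1.0, 2.0, 3.0, 4.0]

ell_cols, ell_vals = csr_to_ell(row_ptr, col_idx, vals)
y_csr = csr_spmv(row_ptr, col_idx, vals, x)
y_ell = ell_spmv(ell_cols, ell_vals, x)
padding = ell_padding_ratio(ell_vals, len(vals))
```

In this example both formats give the same product, but half of the ELL slots are padding because one row is four times longer than two of the others. A model that looks only at the number of nonzeros cannot see that difference.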
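Implementing diagonal sparse matrix-vector multiplication (SpMV) on a graphics processing unit (GPU) can make full use of the GPU's parallel computing capability and accelerate matrix-vector multiplication; however, the mainstream algorithms suffer from large amounts of zero-padding data and low computational efficiency. To address these problems, a diagonal SpMV algorithm, DIA-Dynamic (DIAgonal-Dynamic), is proposed. First, a new dynamic partitioning strategy is designed that splits the matrix into blocks according to its characteristics, greatly reducing zero padding and removing redundant computation while maintaining high computational efficiency on the GPU. Second, a diagonal sparse matrix storage format, BDIA (Block DIAgonal), is proposed to store the partitioned data, and the data layout is adjusted to improve memory access performance on the GPU. Finally, conditional branches are optimized at the low level of the GPU to reduce branch evaluation, and dynamic shared memory is used to resolve irregular accesses to the vector. Compared with the state-of-the-art Tile SpMV algorithm, DIA-Dynamic achieves an average speedup of 1.88; compared with the state-of-the-art BRCSD (Diagonal Compressed Storage based on Row-Blocks)-Ⅱ algorithm, it reduces zero padding by 43% on average and achieves an average speedup of 1.70. The experimental results show that DIA-Dynamic effectively improves the computational efficiency of diagonal SpMV on GPUs, shortens computation time, and improves program performance.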
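The zero-padding problem can be illustrated with a small Python sketch of the classic DIA format. This is not DIA-Dynamic or BDIA: the fixed-size row-block split below is a deliberately simple stand-in for the dynamic partitioning described in the abstract, and all function names and the example matrix are assumptions. It assumes a square matrix given as a dense list of lists.

```python
def dense_to_dia(a):
    """Store a square matrix by diagonals: data[d][i] holds a[i][i + offsets[d]]."""
    n = len(a)
    offsets = sorted({j - i for i in range(n) for j in range(n) if a[i][j] != 0})
    data = [[a[i][i + off] if 0 <= i + off < n else 0.0 for i in range(n)]
            for off in offsets]
    return offsets, data


def dia_spmv(offsets, data, x):
    """y = A @ x for A in DIA form; slots outside the matrix are skipped."""
    n = len(x)
    y = [0.0] * n
    for off, diag in zip(offsets, data):
        for i in range(n):
            j = i + off
            if 0 <= j < n:
                y[i] += diag[i] * x[j]
    return y


def zero_fill(offsets, data):
    """Number of stored DIA slots that do not hold a nonzero."""
    stored = sum(len(diag) for diag in data)
    nnz = sum(1 for diag in data for v in diag if v != 0)
    return stored - nnz


def blocked_zero_fill(a, block_rows):
    """Zero fill when each block of rows keeps only its own nonempty diagonals.

    Fixed-size blocks are an illustrative assumption, not the paper's strategy.
    """
    n = len(a)
    fill = 0
    for start in range(0, n, block_rows):
        rows = range(start, min(start + block_rows, n))
        offsets = {j - i for i in rows for j in range(n) if a[i][j] != 0}
        nnz = sum(1 for i in rows for j in range(n) if a[i][j] != 0)
        fill += len(offsets) * len(rows) - nnz
    return fill


# Hypothetical example: a tridiagonal matrix plus one stray entry at (0, 3).
a = [[2, 1, 0, 5],
     [1, 2, 1, 0],
     [0, 1, 2, 1],
     [0, 0, 1, 2]]
x = [1.0, 2.0, 3.0, 4.0]

offsets, data = dense_to_dia(a)
y = dia_spmv(offsets, data, x)
fill_whole = zero_fill(offsets, data)
fill_blocked = blocked_zero_fill(a, 2)
```

The single entry at (0, 3) forces the whole-matrix DIA layout to store an entire extra diagonal. Splitting the rows into two blocks confines that diagonal to the block that needs it, which lowers the zero fill in this example from 5 slots to 3.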
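Principal eigenvector computing (PEC) for matrices is an important problem in scientific and engineering computing. With the rise of general-purpose computing on graphics processing units (GPGPU), using GPUs to accelerate this computation for large-scale sparse matrices has received wide attention. This work analyzes the performance bottlenecks of PEC from two perspectives, application characteristics and GPU architecture characteristics. It proposes a GPU-oriented sparse matrix storage format, GPU-ELL, together with an optimized thread mapping strategy for GPUs, and designs a corresponding optimized PEC execution algorithm. Experimental results on an ATI Radeon HD 5850 show that the scheme achieves a speedup of up to about 200 times over a traditional CPU and a speedup of 2 times over existing GPU implementations.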
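As a rough illustration of why SpMV dominates PEC, here is a minimal Python sketch of power iteration over a matrix stored in plain ELL form. The abstract does not name the iteration it uses, so power iteration is an assumption here, and the layout below is ordinary ELL rather than the GPU-ELL format; the function names and the 3x3 example matrix are likewise hypothetical. The sketch assumes the matrix has a single dominant eigenvalue.

```python
import math


def ell_spmv(cols, vals, x):
    """y = A @ x for A in ELL form; padded slots hold 0.0 and contribute nothing."""
    return [sum(v * x[c] for c, v in zip(row_cols, row_vals))
            for row_cols, row_vals in zip(cols, vals)]


def power_iteration(cols, vals, iters=200):
    """Estimate the dominant eigenpair; each step is one SpMV plus a normalization."""
    n = len(vals)
    x = [1.0 / math.sqrt(n)] * n
    for _ in range(iters):
        y = ell_spmv(cols, vals, x)
        norm = math.sqrt(sum(v * v for v in y))
        if norm == 0.0:
            raise ValueError("iterate was mapped to the zero vector")
        x = [v / norm for v in y]
    # Rayleigh quotient of the unit-length iterate gives the eigenvalue estimate.
    ax = ell_spmv(cols, vals, x)
    eigenvalue = sum(p * q for p, q in zip(x, ax))
    return eigenvalue, x


# Hypothetical example matrix, stored in ELL with width 2:
# [[4, 1, 0],
#  [1, 3, 0],
#  [0, 0, 1]]
cols = [[0, 1], [0, 1], [2, 0]]
vals = [[4.0, 1.0], [1.0, 3.0], [1.0, 0.0]]  # last slot of row 2 is padding

eigenvalue, vector = power_iteration(cols, vals)
residual = max(abs(av - eigenvalue * v)
               for av, v in zip(ell_spmv(cols, vals, vector), vector))
```

Every iteration costs one SpMV, so the storage format and the mapping of rows to threads largely determine the run time on a GPU.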