Abstract
Sparse matrix-vector multiplication (SpMV) is an important computational kernel in scientific and engineering applications. On current platforms built on a memory hierarchy, SpMV with the traditional CSR (Compressed Sparse Row) format performs poorly, often running at well under 10% of the hardware's peak floating-point rate. Modern processors generally rely on SIMD vectorization for acceleration, but CSR-based SpMV cannot use it directly because of its irregular memory access pattern. To exploit SIMD, a new storage format, CSRL (Compressed Sparse Row with Local information), is proposed for sparse matrices whose nonzero structure exhibits locality. The format reduces the number of memory accesses during SpMV and allows SIMD instructions to be used for both loads and arithmetic, which improves SpMV performance. Experiments show that, compared with the implementation in the commercial Intel MKL library (version 10.3), SpMV based on the CSRL format achieves an average performance improvement of 29.5% and a maximum of 89%.
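The abstract does not spell out the CSRL layout, so the following is only a minimal sketch of the general idea it describes, under the assumption that "local information" means runs of nonzeros in consecutive columns of a row. The array names `seg_row_ptr`, `seg_col`, and `seg_start` are hypothetical and are not taken from the paper. The baseline CSR kernel loads one column index per nonzero and gathers from `x`; the run-based variant loads one column index per run and then walks `val` and `x` with unit stride, which is the access pattern SIMD loads need.

```python
def spmv_csr(row_ptr, col_idx, val, x):
    """Baseline CSR SpMV: one index load and one indirect x access per nonzero."""
    y = [0.0] * (len(row_ptr) - 1)
    for i in range(len(y)):
        for k in range(row_ptr[i], row_ptr[i + 1]):
            y[i] += val[k] * x[col_idx[k]]
    return y


def csr_to_segments(row_ptr, col_idx):
    """Group each row's nonzeros into runs of consecutive columns.

    Hypothetical layout (an assumption, not the paper's definition):
      seg_row_ptr[i] .. seg_row_ptr[i + 1]  -> runs belonging to row i
      seg_col[s]                            -> first column of run s
      seg_start[s] .. seg_start[s + 1]      -> slice of val holding run s
    """
    seg_row_ptr, seg_col, seg_start = [0], [], []
    for i in range(len(row_ptr) - 1):
        for k in range(row_ptr[i], row_ptr[i + 1]):
            # A run starts at the first nonzero of a row or after a column gap.
            if k == row_ptr[i] or col_idx[k] != col_idx[k - 1] + 1:
                seg_col.append(col_idx[k])
                seg_start.append(k)
        seg_row_ptr.append(len(seg_col))
    seg_start.append(row_ptr[-1])  # sentinel closing the last run
    return seg_row_ptr, seg_col, seg_start


def spmv_segments(seg_row_ptr, seg_col, seg_start, val, x):
    """Run-based SpMV: one index load per run, unit-stride access inside it."""
    y = [0.0] * (len(seg_row_ptr) - 1)
    for i in range(len(y)):
        acc = 0.0
        for s in range(seg_row_ptr[i], seg_row_ptr[i + 1]):
            lo, hi = seg_start[s], seg_start[s + 1]
            c = seg_col[s]
            # Contiguous slices of val and x: a dense dot product.
            acc += sum(v * xj for v, xj in zip(val[lo:hi], x[c:c + hi - lo]))
        y[i] = acc
    return y


# Small 4x6 example; row 2 is empty, row 1 has two runs.
row_ptr = [0, 3, 6, 6, 9]
col_idx = [0, 1, 2, 1, 2, 5, 3, 4, 5]
val = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0]
x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]

segs = csr_to_segments(row_ptr, col_idx)
y_csr = spmv_csr(row_ptr, col_idx, val, x)
y_seg = spmv_segments(*segs, val, x)
```

In this example the nine column indices of CSR shrink to four run start columns, which illustrates how such a layout could cut index traffic when nonzeros cluster. In a compiled implementation the inner dot product over each run is the part a SIMD unit would execute; the Python `sum` here only stands in for it.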
Source
Journal on Numerical Methods and Computer Applications (《数值计算与计算机应用》)
CSCD
2014, No. 4, pp. 269-276 (8 pages)
Funding
Supported by the National Natural Science Foundation of China (61170075, 91130023), the National 973 Program (2011CB309701), and the National 863 Program (2012AA010903).