摘要
稀疏矩阵向量乘(Sp MV)是线性求解系统等科学计算中重要的计算核心.鉴于传统的稀疏矩阵向量乘算法在Intel Xeon Phi众核集成架构上存在SIM D利用率低,不规则访存开销高及负载不均衡的问题,难以发挥其运算能力.本文针对Intel Xeon Phi的体系结构特点,提出了一种通用的分块压缩存储表示的稀疏矩阵向量乘并行算法:(1)在ELLPACK存储格式基础上按列分块及压缩矩阵,增加非零元的密度,提高SIMD利用率;(2)通过精心的数据重排,保留矩阵非零元本身的局部性,从而提高数据重用率,降低访存开销;(3)将矩阵压缩后划分成近似等大的矩阵块并静态等量分配到不同核上计算,使各核负载均衡.实验结果表明,与Intel Xeon Phi上已有的MKL数学库中的CSR算法相比,本算法获得了更高的计算访存比,性能比M KL的CSR算法平均快2.05倍.
Sparse matrix-vector multiplication( Sp M V) plays a key role in many scientific computing applications such as linear solver systems.Traditional Sp M V algorithms meet lowSIM D utilization,high overhead of irregular memory access and load imbalance on Intel Xeon Phi.This paper introduces a newparallel Sp M V algorithm with a newblocked compressing format,which focused on main architecture features of Intel Xeon Phi.These methods include(1) Column blocking and compressing based on ELLPACK format to improve the density of non-zeros,to improve SIM D utilization;(2) With carefully data re-organization,retain the locality of matrix non-zeros,and improve data reuse ratio,to decrease memory access overhead;(3) After compressing,partition the matrix into equal size of matrix blocks,and distribute them to different cores to make them load balance.Furthermore,we demonstrate that compared with the CSR kernel of Intel M KL,our implementation on Intel Xeon Phi achieves higher ratio of the number of floating point operations to the number of memory accesses,and our implementation is 2.05 faster than the CSR kernel of M KL on average.
出处
《小型微型计算机系统》
CSCD
北大核心
2016年第4期818-823,共6页
Journal of Chinese Computer Systems
基金
国家"八六三"高技术研究发展计划项目(2012AA010901
2012AA010902)资助