期刊文献+

Cache performance optimization of irregular sparse matrix multiplication on modern multi-core CPU and GPU

Cache performance optimization of irregular sparse matrix multiplication on modern multi-core CPU and GPU
下载PDF
导出
摘要 This paper focuses on how to optimize the cache performance of sparse matrix-matrix multiplication(SpGEMM).It classifies the cache misses into two categories;one is caused by the irregular distribution pattern of the multiplier-matrix,and the other is caused by the multiplicand.For each of them,the paper puts forward an optimization method respectively.The first hash based method removes cache misses of the 1 st category effectively,and improves the performance by a factor of 6 on an Intel 8-core CPU for the best cases.For cache misses of the 2nd category,it proposes a new cache replacement algorithm,which achieves a cache hit rate much higher than other historical knowledge based algorithms,and the algorithm is applicable on CELL and GPU.To further verify the effectiveness of our methods,we implement our algorithm on GPU,and the performance perfectly scales with the size of on-chip storage. This paper focuses on how to optimize the cache performance of sparse matrix-matrix multiplication (SpGEMM).It classifies the cache misses into two categories:one is caused by the irregular distribution pattern of the multiplier-matrix,and the other is caused by the multiplicand.For each of them,the paper puts forward an optimization method respectively.The first hash based method removes cache misses of the 1 st category effectively,and improves the performance by a factor of 6 on an Intel 8-core CPU for the best cases.For cache misses of the 2nd category,it proposes a new cache replacement algorithm,which achieves a cache hit rate much higher than other historical knowledge based algorithms,and the algorithm is applicable on CELL and GPU.To further verify the effectiveness of our methods,we implement our algorithm on GPU,and the performance perfectly scales with the size of on-chip storage.
出处 《High Technology Letters》 EI CAS 2013年第4期339-345,共7页 高技术通讯(英文版)
基金 Supported by the National High Technology Research and Development Programme of China(No.2010AA012302,2009AA01 A134) Tsinghua National Laboratory for Information Science and Technology(TNList)Cross-discipline Foundation
关键词 高速缓存 性能优化 矩阵乘法 稀疏矩阵 GPU CPU 缓存替换算法 多核心 sparse matrix multiplication cache miss scalability multi-core CPU GPU
  • 相关文献

参考文献12

  • 1Cullum J K, Willoughby R A. Lanczos Algorithms forLarge Symmetric Eigenvalue Computations. Boston: Birkh auser, 1985. 252-268.
  • 2Rabin M O, Vazirani V V. Maximum matchings in gener- al graphs through randomization. Journal of Algorithms, 1989, 10(4) :557-567.
  • 3Yuster R, Zwick U. Detecting short directed cycles using rectangular matrix multiplication and dynamic program- ming. In: Proceedings of the 15th annual ACM-SIAM symposium on discrete algorithms, Philadelphia, USA, 2004. 254-260.
  • 4Zwick U. All pairs shortest paths using bridging sets and rectangular matrix multiplication. Journal of the ACM, 2002, 49(3) :289-317.
  • 5Gustavson F G. Two fast algorithms for sparse matrices: multiplication and permuted transposition. ACM Transac- tions on Mathematical Software ( TOMS ), 1978, 4 ( 3 ) : 250-269.
  • 6Yuster R, Zwick U. Fast sparse matrix multiplication. ACM Transactions on Algorithms ( TALG) , 2005, 1 ( 1 ) : 2-13.
  • 7Gilbert J R, Moler C, Schreiber R. Sparse matrices in Matlab: design and implementation. SlAM Journal on Matrix Analysis and Applications, 1992, 13( 1 ) :333-356.
  • 8Buluc A, Gilbert J R. On the Representation and multi- plication of hypersparse matrices. In: 2008 IEEE Inter-national Symposium on Parallel and Distributed Process- ing. Miami, USA, 2008. 1-11.
  • 9Buluc A, Gilbert J R. Challenges and advances in paral- lel sparse matrix-matrix multiplication. In: Proceedings of the 37th International Conference on Parallel Process- ing, Washington DC, USA, 2008. 503-510.
  • 10Ryoo S, Rodrigues C I, Baghsorkhi S S, et al. Optimiza- tion principles and application performance evaluation of a multithreaded GPU using CUDA. In: Proceedings of the 13th ACM SIGPLAN Symposium on Principles and practice of parallel programming. New York, USA, 2008.73-82.

相关作者

内容加载中请稍等...

相关机构

内容加载中请稍等...

相关主题

内容加载中请稍等...

浏览历史

内容加载中请稍等...
;
使用帮助 返回顶部