Design and Verification of DMA for Sparse Matrix-Vector Multiplication
Abstract: Sparse matrix-vector multiplication (SpMV) is the core kernel of iterative methods for solving large systems of linear equations and is widely used in scientific research and engineering. The High Performance Conjugate Gradient (HPCG) benchmark is one of the test programs used to evaluate the performance of high-performance computing systems, and it calls SpMV repeatedly during its iterations. However, SpMV involves a large number of irregular memory accesses, which reduces system computing performance. Based on the X-DSP project, a dedicated data channel for SpMV is designed in the DMA to handle these irregular memory accesses and speed up the HPCG algorithm. Verification and synthesis results for the design code show that the intended functionality is implemented correctly and that the design meets the project's timing, area, and power requirements.
Authors: CAO Yasong; LIU Sheng (School of Computer Science, National University of Defense Technology, Changsha 410073)
Source: Computer & Digital Engineering (《计算机与数字工程》), 2019, No. 11, pp. 2686-2690 (5 pages)
Keywords: Sparse Matrix-Vector Multiplication (SpMV); Direct Memory Access (DMA); Compressed Sparse Row (CSR)
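
To illustrate the irregular memory accesses the abstract refers to, the following is a minimal sketch of the textbook SpMV kernel over a CSR matrix. It is not the paper's DMA design; the function name `spmv_csr` and the array names `row_ptr`, `col_idx`, and `values` are illustrative assumptions that follow common CSR conventions.

```python
def spmv_csr(row_ptr, col_idx, values, x):
    """Compute y = A @ x for a matrix A stored in CSR form (illustrative sketch)."""
    n_rows = len(row_ptr) - 1
    y = [0.0] * n_rows
    for i in range(n_rows):
        acc = 0.0
        # Nonzeros of row i occupy values[row_ptr[i]:row_ptr[i + 1]].
        for k in range(row_ptr[i], row_ptr[i + 1]):
            # Indirect (gather) access: the index into x is read from col_idx,
            # so consecutive reads of x are generally not contiguous in memory.
            acc += values[k] * x[col_idx[k]]
        y[i] = acc
    return y


# Small 3x3 example matrix (assumed data, for illustration only):
# [[4, 0, 1],
#  [0, 3, 0],
#  [2, 0, 5]]
row_ptr = [0, 2, 3, 5]
col_idx = [0, 2, 1, 0, 2]
values = [4.0, 1.0, 3.0, 2.0, 5.0]
x = [1.0, 2.0, 3.0]
y = spmv_csr(row_ptr, col_idx, values, x)
```

The reads of `values` and `col_idx` are sequential, while the reads of `x` depend on the stored column indices. That data-dependent gather is the access pattern a dedicated DMA channel for SpMV would presumably target.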