
A Two-Tier Parallel Blocked Jacobi SVD Algorithm Based on GPU Architecture (cited by: 1)

A PARALLEL TWO-TIER BLOCKED JACOBI SVD ALGORITHM ON GPU
Abstract: Singular value decomposition (SVD) is widely used in image processing, face recognition, signal denoising, and other fields. Building on the one-sided Jacobi SVD algorithm, this paper presents a blocked Jacobi SVD algorithm for GPUs, together with an efficient implementation, that is parallel at two levels: an inter-block level and an intra-block level. At the inter-block level, shared memory is too small to hold the matrix column blocks (panels), so the algorithm stores the inner products between column blocks in shared memory instead. The inter-block level also uses blocked matrix operations to improve data reuse, and data prefetching to overlap data access with computation. At the intra-block level, the algorithm updates the inner products between columns directly, in parallel. This replaces the usual sequence of updating the columns and then recomputing their inner products with a reduction, and it increases GPU thread utilization. Data that is accessed many times is kept in shared memory or registers, which reduces global memory accesses and improves the performance of the implementation. Numerical experiments on an NVIDIA Tesla V100 GPU show that the implementation is 1.8× faster than the cuSOLVER library and up to 2.5× faster than the fastest algorithm in the MAGMA library.
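The idea of updating inner products directly, rather than recomputing them after each column update, can be illustrated with a small serial sketch. This is not the authors' GPU implementation: the function name `jacobi_svd_gram`, its parameters, and the choice to keep the full Gram matrix in a Python list are assumptions made for illustration only. The sketch runs a cyclic one-sided Jacobi iteration in which every rotation angle is read from a stored Gram matrix `g`, and each rotation is applied to `g` as well as to the columns, so no dot product is recomputed inside the sweeps.

```python
import math


def jacobi_svd_gram(a, tol=1e-12, max_sweeps=60):
    """Illustrative one-sided Jacobi SVD that keeps the Gram matrix up to date.

    `a` is a list of rows. Returns (sigmas, cols): the singular values in
    column order and the rotated columns, which approximate U * Sigma.
    """
    m, n = len(a), len(a[0])
    # Store the matrix by columns so a rotation touches two lists.
    cols = [[a[i][j] for i in range(m)] for j in range(n)]
    # Inner products g[p][q] = <col_p, col_q>, computed once up front.
    g = [[sum(x * y for x, y in zip(cols[p], cols[q])) for q in range(n)]
         for p in range(n)]

    for _ in range(max_sweeps):
        rotated = False
        for p in range(n - 1):
            for q in range(p + 1, n):
                app, aqq, apq = g[p][p], g[q][q], g[p][q]
                # Skip pairs that are already numerically orthogonal.
                if abs(apq) <= tol * math.sqrt(max(app * aqq, 0.0)):
                    continue
                rotated = True
                # Rotation (c, s) that zeroes the inner product of columns p, q.
                zeta = (aqq - app) / (2.0 * apq)
                t = math.copysign(1.0, zeta) / (abs(zeta) + math.sqrt(1.0 + zeta * zeta))
                c = 1.0 / math.sqrt(1.0 + t * t)
                s = c * t
                # Apply the rotation to the two columns: A <- A J.
                for i in range(m):
                    x, y = cols[p][i], cols[q][i]
                    cols[p][i] = c * x - s * y
                    cols[q][i] = s * x + c * y
                # Apply the same rotation to the stored inner products,
                # G <- J^T G J, instead of recomputing any dot product.
                for k in range(n):
                    gkp, gkq = g[k][p], g[k][q]
                    g[k][p] = c * gkp - s * gkq
                    g[k][q] = s * gkp + c * gkq
                for k in range(n):
                    gpk, gqk = g[p][k], g[q][k]
                    g[p][k] = c * gpk - s * gqk
                    g[q][k] = s * gpk + c * gqk
        if not rotated:
            break

    sigmas = [math.sqrt(max(g[j][j], 0.0)) for j in range(n)]
    return sigmas, cols


sigmas, cols = jacobi_svd_gram([[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]])
print(sorted(sigmas, reverse=True))
```

One trade-off of this sketch is worth noting: reading singular values from the Gram matrix works with squared quantities, so very small singular values are resolved less accurately than when column norms are recomputed from the columns themselves. How the paper handles accuracy, blocking, and thread mapping is not stated in the abstract.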
Authors: Huang Rongfeng; Zhao Yonghua; Yu Tianyu; Liu Shifang (Computer Network Information Center, Chinese Academy of Sciences, Beijing 100190, China; University of Chinese Academy of Sciences, Beijing 100049, China)
Source: Journal on Numerical Methods and Computer Applications, 2022, Issue 4, pp. 380-399 (20 pages)
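Funding: National Key Research and Development Program of China (2017YFB0202202); Strategic Priority Research Program of the Chinese Academy of Sciences (XDC05000000)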
Keywords: singular value decomposition; blocked Jacobi algorithm; parallel algorithm; GPU; data prefetching