Matrix-vector Multiplication Parallel Algorithm and Implementation for OpenCL Architecture (cited by: 2)
Abstract: Matrix-vector multiplication has high time complexity, and traditional computing methods struggle to guarantee real-time performance and cross-platform portability. This paper presents a parallel matrix-vector multiplication algorithm based on the Open Computing Language (OpenCL), in which the multiplication is decomposed into subtasks of different granularity. According to the corresponding degree of parallelism, each work-group computes the product of a row block of the matrix and the column vector, and each work-item computes the product of one row vector in the row block and the column vector; these tasks are assigned to compute units and processing elements, respectively. Experimental results show that, compared with a CPU-based serial algorithm, an OpenMP-based parallel algorithm, and a parallel algorithm based on the Compute Unified Device Architecture (CUDA), the proposed algorithm running under OpenCL on an NVIDIA graphics processing unit (GPU) platform achieves speedups of 20.86×, 6.39×, and 1.49×, respectively. These results verify the effectiveness and performance portability of the proposed parallel optimization method.
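The decomposition described in the abstract can be illustrated with a minimal sketch. This is not the authors' OpenCL kernel: it only emulates the mapping in plain Python, with an outer loop standing in for work-groups (one per row block) and an inner loop standing in for work-items (one per row). The function name `matvec_rowblock` and the parameter `local_size` are hypothetical and chosen for illustration only.

```python
# Sketch only: emulates the work-group / work-item decomposition described in
# the abstract with plain Python loops. It is not the authors' OpenCL kernel.

def matvec_rowblock(matrix, vector, local_size=4):
    """Compute y = A x with one emulated work-item per matrix row.

    local_size (hypothetical parameter) is the number of rows per row block,
    i.e. the number of work-items in one emulated work-group.
    """
    n_rows = len(matrix)
    result = [0.0] * n_rows
    # Round up so a partial final row block still gets a work-group.
    n_groups = (n_rows + local_size - 1) // local_size

    for group_id in range(n_groups):        # one work-group per row block
        for local_id in range(local_size):  # one work-item per row in the block
            # Global row index; plays the role of get_global_id(0) in OpenCL C.
            row = group_id * local_size + local_id
            if row >= n_rows:               # skip padded work-items
                continue
            acc = 0.0
            for col, x in enumerate(vector):  # dot product of row and vector
                acc += matrix[row][col] * x
            result[row] = acc
    return result


A = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]
x = [1.0, 1.0]
y = matvec_rowblock(A, x, local_size=2)
```

On a real device the two outer loops would run concurrently across compute units and processing elements rather than sequentially. The bounds check matters whenever the row count is not a multiple of the work-group size, as with the three-row matrix above.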
Authors: XIAO Han (肖汉); ZHOU Qing-lei (周清雷); YAO Peng-zi (姚鹏姿) (School of Information Science and Technology, Zhengzhou Normal University, Zhengzhou 450044, China; School of Information Engineering, Zhengzhou University, Zhengzhou 450001, China)
Source: Journal of Chinese Computer Systems (《小型微型计算机系统》), CSCD, Peking University Core Journal, 2019, No. 1, pp. 26-30 (5 pages)
Funding: Supported by the National Natural Science Foundation of China (61572444, 61250007)
Keywords: matrix-vector multiplication; GPU; OpenCL; parallel algorithm
