Optimizations on Sparse Matrix-Vector Multiplication Based on CUDA (original title: 基于CUDA的稀疏矩阵与矢量乘法的优化). Cited by: 6
Abstract: With advances in VLSI technology, integrating multiple processor cores on a single chip has become a reality, and the modern GPU is a typical multi-core processor device. Driven by the rapid growth of compute-intensive applications, current GPUs have also acquired strong general-purpose computing capability. This paper first introduces CUDA and sparse matrices. Based on the CSR representation of sparse matrices, it then proposes three program optimization methods under the CUDA model, and analyzes and implements each of them. Experiments on a GeForce 9600GT show a maximum speedup of about 4x over CPU computation.
Source: Computer Measurement & Control (《计算机测量与控制》), CSCD, Peking University Core Journal, 2010, No. 8, pp. 1906-1908, 1912 (4 pages)
Funding: National "863" Program of China (2009AA01Z110)
Keywords: CUDA; GPGPU; CSR; parallel computation; sparse matrix-vector multiplication
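
The abstract names the CSR format but does not describe the three optimizations, so the following is only a minimal sketch of the baseline CSR sparse matrix-vector product that such optimizations would start from. It is plain Python rather than CUDA, and every name in it (`csr_spmv`, `ROW_PTR`, `COL_IDX`, `VALUES`) is hypothetical and not taken from the paper.

```python
def csr_spmv(row_ptr, col_idx, values, x):
    """Compute y = A * x for a sparse matrix A stored in CSR form.

    row_ptr[i]..row_ptr[i + 1] delimits the nonzeros of row i inside
    col_idx (column indices) and values (the matching nonzero entries).
    """
    n_rows = len(row_ptr) - 1
    y = [0.0] * n_rows
    # Each outer iteration is independent of the others. In a simple
    # one-thread-per-row GPU mapping (an assumption here, not the paper's
    # stated method), one thread would execute one such iteration.
    for row in range(n_rows):
        acc = 0.0
        for k in range(row_ptr[row], row_ptr[row + 1]):
            acc += values[k] * x[col_idx[k]]
        y[row] = acc
    return y


# Illustrative 4x4 matrix (made up for this example):
# [[10, 0, 0, 2],
#  [ 0, 3, 0, 0],
#  [ 0, 0, 0, 0],
#  [ 1, 0, 4, 5]]
ROW_PTR = [0, 2, 3, 3, 6]
COL_IDX = [0, 3, 1, 0, 2, 3]
VALUES = [10.0, 2.0, 3.0, 1.0, 4.0, 5.0]

y = csr_spmv(ROW_PTR, COL_IDX, VALUES, [1.0, 2.0, 3.0, 4.0])
print(y)  # [18.0, 6.0, 0.0, 33.0]
```

The empty third row shows why `row_ptr` has one more entry than there are rows: two equal consecutive offsets encode a row with no nonzeros.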

References (6)

  • 1 Yang Zhiyi, Zhu Yating, Pu Yong. Research on parallel image processing based on Compute Unified Device Architecture technology [J]. Computer Measurement & Control, 2009, 17(4): 734-737. Cited by: 7
  • 2 Garland M. Sparse Matrix Computations on Manycore GPU's [R]. DAC, June 2008.
  • 3 Bell N, Garland M. Efficient Sparse Matrix-Vector Multiplication on CUDA [R]. NVIDIA Technical Report NVR-2008-004, Dec. 2008.
  • 4 NVIDIA CUDA Programming Guide Version 2.1 [EB]. http://www.nvidia.com, 2008.
  • 5 Baskaran M M, Bordawekar R. Optimizing Sparse Matrix-Vector Multiplication on GPUs [R]. IBM Research Report, April 2009.
  • 6 NVIDIA CUDA C Programming Best Practices Guide [EB]. http://www.nvidia.com, July 2009.
