Parallel Computation Performance Analysis Model Based on GPU (cited by: 3)
Abstract GPU parallel computing lacks both accurate performance analysis models and targeted performance optimization methods. To address this, we propose a quantitative performance analysis model for GPU-based parallel computing. The model characterizes the performance of three major GPU components (the instruction pipeline, shared memory access, and global memory access) so that parallel programs can be analyzed quantitatively, helping programmers locate runtime bottlenecks and optimize performance effectively. In the experiments, performance analyses of three representative real-world applications (dense matrix multiplication, a tridiagonal linear system solver, and sparse matrix-vector multiplication) demonstrate the usefulness of the model and lead to effective optimization of the algorithms.
Source Computer Science (《计算机科学》), CSCD, Peking University Core Journal, 2014, No. 1, pp. 31-38 (8 pages)
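Funding Supported by the Guangzhou Science and Technology Project (2012Y2-0031), the Postdoctoral Fund (2013M531825), and the National Natural Science Foundation of China (U1201251)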
Keywords GPU, quantitative performance model, instruction pipeline, shared memory access time, global memory access time
