GPU-Based Parallel Optimization Techniques (cited by: 23)

Parallel optimize technology based on GPU
Abstract: Standard parallel algorithms are difficult to run efficiently on graphics processing units (GPUs). Taking the reduction (cumulative sum) algorithm as an example, this paper introduces four optimization methods for NVIDIA GPUs that support the Compute Unified Device Architecture (CUDA): instruction optimization, shared memory conflict avoidance, loop unrolling, and thread overload optimization. Experimental results show that parallel optimization effectively improves the execution efficiency of the algorithm on the GPU. The optimized reduction algorithm runs about 34 times faster than the standard parallel algorithm and about 70 times faster than the serial CPU implementation.
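The shared memory conflict avoidance and thread overload ideas named in the abstract can be illustrated with a small CPU-side model. The sketch below is not the paper's code. It is written in Python rather than CUDA so that it runs without a GPU, and every function name in it is hypothetical. It assumes 16 shared memory banks accessed per 16-thread half-warp (the layout of early CUDA hardware), and it assumes that "thread overload" means each thread accumulates several input elements before the tree reduction starts.

```python
NUM_BANKS = 16  # assumption: 16 shared-memory banks, accessed per 16-thread half-warp


def conflict_degree(addresses, num_banks=NUM_BANKS):
    """Largest number of distinct addresses that fall into a single bank."""
    per_bank = {}
    for addr in set(addresses):
        bank = addr % num_banks
        per_bank[bank] = per_bank.get(bank, 0) + 1
    return max(per_bank.values())


def strided_steps(n):
    """Interleaved addressing: thread t adds element 2*s*t + s into element 2*s*t."""
    s = 1
    while s < n:
        yield [(t, 2 * s * t, 2 * s * t + s) for t in range(n // (2 * s))]
        s *= 2


def sequential_steps(n):
    """Sequential addressing: thread t adds element t + s into element t."""
    s = n // 2
    while s > 0:
        yield [(t, t, t + s) for t in range(s)]
        s //= 2


def tree_reduce(values, make_steps, num_banks=NUM_BANKS):
    """Return (sum, worst conflict degree); len(values) must be a power of two."""
    data = list(values)
    worst = 1
    for step in make_steps(len(data)):
        # Group the active threads into half-warps and inspect their accesses.
        half_warps = {}
        for tid, dst, src in step:
            half_warps.setdefault(tid // num_banks, []).append((dst, src))
        for pairs in half_warps.values():
            worst = max(worst,
                        conflict_degree([d for d, _ in pairs], num_banks),
                        conflict_degree([s for _, s in pairs], num_banks))
        # Destinations and sources never overlap within one step,
        # so applying the updates one after another matches a parallel step.
        for _, dst, src in step:
            data[dst] += data[src]
    return data[0], worst


def preload(values, threads):
    """Modeled thread overload: thread t first sums every `threads`-th element."""
    return [sum(values[t::threads]) for t in range(threads)]


if __name__ == "__main__":
    block = list(range(256))
    print(tree_reduce(block, strided_steps))
    print(tree_reduce(block, sequential_steps))
    print(tree_reduce(preload(list(range(1024)), 256), sequential_steps))
```

In this model, a 256-element block reduced with the strided pattern reaches a worst-case 16-way bank conflict, while the sequential pattern stays conflict-free and produces the same sum. These numbers describe the model only and are not measurements from the paper.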
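Source: Application Research of Computers (《计算机应用研究》), indexed in CSCD and the Peking University Core Journals list, 2009, No. 11, pp. 4115-4118 (4 pages)
Funding: National "863" High-Tech Program project (classified)
Keywords: graphics processing unit (GPU); parallel optimization; reduction (cumulative sum); compute unified device architecture (CUDA)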
